Class KeyValueOperator

  • All Implemented Interfaces:
    LogicalOperator
    Direct Known Subclasses:
    ReadHBase, WriteHBase

    public abstract class KeyValueOperator
    extends KeyOperator
    Specifies data field mapping when accessing HBase.

    A DataRush field can be mapped to a selected HBase cell, or to all cells in a family as a sub-table:

    1. mapCell(java.lang.String, java.lang.String, java.lang.String) - Map a selected cell within a column family as a field. Cells within a family can be of heterogeneous type, and only the mapped cells are accessed. All mapped cell fields appear together in a single record. Optional row key and time key fields can be specified to uniquely identify each DataRush record.

    2. mapFamily(java.lang.String, java.lang.String) - Map all cells within a column family as a sub-table of fields. All cells within a family are of homogeneous type, and all cells are accessed. Each cell within a family is contained in an individual record. Optional row key, qualifier key, and time key fields can be specified to uniquely identify each DataRush record.
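
    The difference between the two record shapes can be sketched with a small in-memory model. This is not DataRush code: the class MappingShapes, its methods, and the "rowKey" and "qualifier" key names are hypothetical, and a column family is modeled as a plain qualifier-to-value map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one HBase row's column family: qualifier -> value.
// None of these names come from the DataRush API.
class MappingShapes {

    // mapCell-style: each mapped qualifier becomes one field of a single record.
    static Map<String, Object> cellsAsFields(String rowKey,
                                             Map<String, Object> family,
                                             Map<String, String> qualifierToField) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("rowKey", rowKey);
        for (Map.Entry<String, String> e : qualifierToField.entrySet()) {
            // Only mapped cells are read; unmapped qualifiers are ignored.
            record.put(e.getValue(), family.get(e.getKey()));
        }
        return record;
    }

    // mapFamily-style: every cell in the family becomes its own record.
    static List<Map<String, Object>> familyAsSubTable(String rowKey,
                                                      Map<String, Object> family,
                                                      String fieldName) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (Map.Entry<String, Object> cell : family.entrySet()) {
            Map<String, Object> record = new LinkedHashMap<>();
            record.put("rowKey", rowKey);
            record.put("qualifier", cell.getKey());
            record.put(fieldName, cell.getValue());
            records.add(record);
        }
        return records;
    }
}
```

    In this model, a family with three cells yields one record under the first mapping (holding only the mapped cells) and three records under the second.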

    When each cell is mapped to a single field, the following DataRush data types are serialized and deserialized using common HBase formats:

    • TokenTypeConstant.INT - org.apache.hadoop.hbase.util.Bytes.toBytes(int)
    • TokenTypeConstant.LONG - org.apache.hadoop.hbase.util.Bytes.toBytes(long)
    • TokenTypeConstant.FLOAT - org.apache.hadoop.hbase.util.Bytes.toBytes(float)
    • TokenTypeConstant.DOUBLE - org.apache.hadoop.hbase.util.Bytes.toBytes(double)
    • TokenTypeConstant.NUMERIC - org.apache.hadoop.hbase.util.Bytes.toBytes(BigDecimal)
    • TokenTypeConstant.STRING - org.apache.hadoop.hbase.util.Bytes.toBytes(String)
    • TokenTypeConstant.BINARY - store/retrieve byte array as is
    • All other data types will be serialized/deserialized using the default DataRush formats.
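
    To illustrate the byte layouts involved, the sketch below reproduces the encodings for a few of these types using only the JDK. It assumes that the HBase Bytes utility writes fixed-width big-endian values and UTF-8 strings, which matches the default byte order of java.nio.ByteBuffer. The class CellBytes and its methods are hypothetical stand-ins, not part of HBase or DataRush.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only stand-in for the fixed-width encodings listed above.
class CellBytes {

    // 4 bytes, big-endian (ByteBuffer's default byte order).
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // 8 bytes, big-endian.
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // 8 bytes holding the IEEE 754 bit pattern, big-endian.
    static byte[] encodeDouble(double v) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(v).array();
    }

    // UTF-8 bytes with no length prefix.
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static int decodeInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    static double decodeDouble(byte[] b) {
        return ByteBuffer.wrap(b).getDouble();
    }
}
```

    Because these layouts carry no type tag, a reader needs to know the type of a cell from elsewhere in order to decode it.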

    An HBase cell can also be mapped as a record of fields:

    1. mapCellRecord(java.lang.String, java.lang.String, java.util.Map<java.lang.String, java.lang.String>) - Map a selected cell within a column family as a record. Cells within a family can be of heterogeneous type, and only the mapped cells are accessed. All mapped cell fields appear together in a single record. Optional row key and time key fields can be specified to uniquely identify each DataRush record.

    2. mapFamilyRecord(java.lang.String, java.util.Map<java.lang.String, java.lang.String>) - Map all cells within a column family as a sub-table of records. All cells within a family are of homogeneous type, and all cells are accessed. Each cell within a family is contained in an individual record. Optional row key, qualifier key, and time key fields can be specified to uniquely identify each DataRush record.

    Mapping a single HBase cell as a record of fields allows multiple fields to be packed into a single cell, which can greatly increase I/O performance at the expense of version granularity. All fields packed into a single cell are versioned together, so all fields must be present when writing. The default DataRush serialization is used exclusively in this case.
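
    The trade-off can be sketched as follows. The format here (length-prefixed UTF-8 strings in schema order) is a hypothetical stand-in chosen for brevity and is not the DataRush serialization; the class PackedCell and its methods are likewise assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical packing of several string fields into one cell value.
class PackedCell {

    // Packs every schema field, in order; a missing field is rejected because
    // the whole cell is written (and versioned) as a single unit.
    static byte[] pack(List<String> schema, Map<String, String> values) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String field : schema) {
            String value = values.get(field);
            if (value == null) {
                throw new IllegalArgumentException("missing field: " + field);
            }
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            total += Integer.BYTES + bytes.length;
        }
        ByteBuffer cell = ByteBuffer.allocate(total);
        for (byte[] bytes : encoded) {
            cell.putInt(bytes.length).put(bytes);
        }
        return cell.array();
    }

    // Reads the fields back in schema order.
    static Map<String, String> unpack(List<String> schema, byte[] cell) {
        ByteBuffer in = ByteBuffer.wrap(cell);
        Map<String, String> values = new LinkedHashMap<>();
        for (String field : schema) {
            byte[] bytes = new byte[in.getInt()];
            in.get(bytes);
            values.put(field, new String(bytes, StandardCharsets.UTF_8));
        }
        return values;
    }
}
```

    One cell value now carries the whole record, so a single read or write covers every field, but no field can be updated or versioned on its own.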

    A DataRush schema is persisted in HBase for every cell that DataRush writes to.

    If a DataRush schema exists:

    • DataRush read operations will return requested keys and/or fields as the schema data type.
    • DataRush write operations will check for type compatibility.

    If a DataRush schema does not exist:

    • A cell can only be mapped as a single field for reading.
    • DataRush read operations will return the keys and/or fields as TokenTypeConstant.BINARY.
    • DataRush write operations will create a schema upon first write.
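
    The rules above can be summarized in a small model. It is a sketch under simplifying assumptions: the schema store is an in-memory map keyed by a hypothetical "family:qualifier" string, types are plain strings, and type compatibility is reduced to equality. The class SchemaRules is not part of DataRush.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the persisted-schema rules; not DataRush code.
class SchemaRules {

    // cell id -> persisted type name
    private final Map<String, String> persisted = new HashMap<>();

    // Reads use the schema type when one exists, otherwise BINARY.
    String readType(String cell) {
        return persisted.getOrDefault(cell, "BINARY");
    }

    // The first write creates the schema; later writes are type-checked.
    void write(String cell, String type) {
        String existing = persisted.putIfAbsent(cell, type);
        if (existing != null && !existing.equals(type)) {
            throw new IllegalStateException(
                "incompatible type for " + cell + ": " + existing + " vs " + type);
        }
    }
}
```

    In this model a cell that has never been written reads as BINARY, and the type used by the first write governs every later read and write.
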
    See Also:
    WriteHBase, ReadHBase, DeleteHBase
    • Constructor Detail

      • KeyValueOperator

        public KeyValueOperator()
    • Method Detail

      • getFamilyFieldMap

        public Map<String,Map<String,String>> getFamilyFieldMap()
        Get the column family to field mapping.
        Returns:
        column family to field mapping
      • setFamilyFieldMap

        public void setFamilyFieldMap(Map<String,Map<String,String>> familyFieldMap)
        Set the column family to field mapping.
        Parameters:
        familyFieldMap - column family field mapping
      • setCellFieldMap

        public void setCellFieldMap(Map<String,Map<String,Map<String,String>>> cellFieldMap)
        Set the HBase cell to field mapping.
        Parameters:
        cellFieldMap - cell to field mapping
      • mapCellRecord

        public void mapCellRecord(String familyName,
                                  String qualifier,
                                  Map<String,String> fieldMap)
        Map a cell as a record. This method maps a selected cell as a record. Any previous cell mapping for the specified family and qualifier is replaced.
        Parameters:
        familyName - the name of the HBase column family.
        qualifier - the HBase column qualifier identifying the cell to be mapped as a Record.
        fieldMap - cell record schema name to DataRush field name map.
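        The replacement behavior can be pictured with a nested map shaped like the parameter of setCellFieldMap. Reading the three levels as family, then qualifier, then schema name to field name is an assumption, as are the class CellRecordMappings and its lookup helper; this is a sketch, not the operator's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder mirroring the Map<String, Map<String, Map<String, String>>> shape.
class CellRecordMappings {

    // family -> qualifier -> (cell record schema name -> field name)
    private final Map<String, Map<String, Map<String, String>>> cellFieldMap =
        new LinkedHashMap<>();

    // Replaces any earlier mapping for the same family and qualifier.
    void mapCellRecord(String family, String qualifier, Map<String, String> fieldMap) {
        cellFieldMap.computeIfAbsent(family, f -> new LinkedHashMap<>())
                    .put(qualifier, new LinkedHashMap<>(fieldMap));
    }

    // Returns the current mapping, or null if the cell is unmapped.
    Map<String, String> lookup(String family, String qualifier) {
        Map<String, Map<String, String>> qualifiers = cellFieldMap.get(family);
        return qualifiers == null ? null : qualifiers.get(qualifier);
    }
}
```

        Mapping the same family and qualifier twice leaves only the second field map in place, while mappings for other qualifiers in that family are untouched.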
      • mapCell

        public void mapCell(String familyName,
                            String qualifier,
                            String fieldName)
        Map a cell as a field. This method maps a selected cell as a field. Any previous cell mapping for the specified family and qualifier is replaced.
        Parameters:
        familyName - the name of the HBase column family.
        qualifier - the HBase column qualifier identifying the cell to be mapped as a Field.
        fieldName - the field name to be mapped to the specified cell qualifier.
      • mapFamilyRecord

        public void mapFamilyRecord(String familyName,
                                    Map<String,String> fieldMap)
        Map family as a sub-table of records. This method is used to map all cells within a family as a sub-table of records. Any previous mapping for the specified family is replaced.
        Parameters:
        familyName - the name of the HBase column family.
        fieldMap - cell record schema name to DataRush field name map.
      • mapFamily

        public void mapFamily(String familyName,
                              String fieldName)
        Map family as a sub-table of fields. This method is used to map all cells within a family as a sub-table of fields. Any previous mapping for the specified family is replaced.
        Parameters:
        familyName - the name of the HBase column family.
        fieldName - the field name to be mapped to the specified family.
      • getHCatalogDatabase

        public String getHCatalogDatabase()
        Get the HCatalog database from which to retrieve a schema and mapping.
        Returns:
        the HCatalog database
      • setHCatalogDatabase

        public void setHCatalogDatabase(String database)
        Set the HCatalog database from which to retrieve a schema and mapping.
        Parameters:
        database - the HCatalog database
      • getHCatalogTable

        public String getHCatalogTable()
        Get the HCatalog table from which to retrieve a schema and mapping.
        Returns:
        the HCatalog table
      • setHCatalogTable

        public void setHCatalogTable(String table)
        Set the HCatalog table from which to retrieve a schema and mapping.
        Parameters:
        table - the HCatalog table
      • getHCatalogFields

        public List<String> getHCatalogFields()
        Get the HCatalog fields to read or write.
        Returns:
        a list of field names, or null if not set
      • setHCatalogFields

        public void setHCatalogFields(List<String> fields)
        Set the HCatalog fields to read or write. If not set, all defined fields will be read or written.
        Parameters:
        fields - a list of field names
      • setHCatalogFields

        public void setHCatalogFields(String... fields)
        Set the HCatalog fields to read or write. If not set, all defined fields will be read or written.
        Parameters:
        fields - a list of field names
      • mapFromHCatalog

        protected com.pervasive.datarush.hadoop.shims.hbase.TableSchema mapFromHCatalog(MetadataContext ctx)
        Load a mapping from an existing HCatalog table.
        Parameters:
        ctx - the metadata context
      • mapToHCatalog

        protected void mapToHCatalog(MetadataContext ctx,
                                     RecordPort input)
        Write a mapping to a new HCatalog table.
        Parameters:
        ctx - the metadata context
      • tableExistsInHCatalog

        protected boolean tableExistsInHCatalog(MetadataContext ctx)
        Determines if the currently selected HCatalog table already exists.
        Parameters:
        ctx - the metadata context
        Returns:
        true if the table exists, false otherwise
      • schemaSupportedByHCatalog

        protected boolean schemaSupportedByHCatalog(MetadataContext ctx)
        Determines if the current schema can be written to HCatalog.

        A schema is supported by HCatalog if its only mappings are cells to fields.

        Parameters:
        ctx - the metadata context
        Returns:
        true if the current schema can be written to HCatalog, false otherwise
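
        The support rule stated above can be expressed as a simple predicate. The enum MappingKind and the class HCatalogSupport are hypothetical names introduced for illustration; treating a schema with no mappings as supported is also an assumption.

```java
import java.util.List;

// Hypothetical predicate for the "only cell-to-field mappings" rule.
class HCatalogSupport {

    // One kind per mapping method described in this class.
    enum MappingKind { CELL_FIELD, CELL_RECORD, FAMILY_FIELD, FAMILY_RECORD }

    // Supported only when every mapping maps a cell to a field.
    static boolean supported(List<MappingKind> mappings) {
        for (MappingKind kind : mappings) {
            if (kind != MappingKind.CELL_FIELD) {
                return false;
            }
        }
        return true;
    }
}
```

        Under this model, a single family or record mapping is enough to make the schema unsupported.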