Class KeyOperator

  • All Implemented Interfaces:
    LogicalOperator
    Direct Known Subclasses:
    DeleteHBase, KeyValueOperator

    public abstract class KeyOperator
    extends CompositeOperator
    Specifies key field mapping when accessing HBase.

    HBase stores data in a table as variable-length rows of individual cells, each of which can be independently versioned/changed over time. All versions of a cell's history are retained until a maximum cell age or a maximum version count is reached. The rows are partitioned into regions across multiple server nodes. Each region covers a range of row keys, and the partitioning guarantees that all cells associated with a particular row are held in a single region.

    A cell is uniquely identified by an index key {row, column, version}:

    The row portion of the index key is the most significant portion of the cell index key. HBase treats the row id as a series of bytes.

    DataRush field(s) can be mapped to the row id via the following methods:

    1. mapRow(java.lang.String) - Map the row id as a single field. A single field mapped as the row id is serialized/deserialized using common HBase formats for the following DataRush data types:

    • TokenTypeConstant.INT - org.apache.hadoop.hbase.util.Bytes.toBytes(int)
    • TokenTypeConstant.LONG - org.apache.hadoop.hbase.util.Bytes.toBytes(long)
    • TokenTypeConstant.FLOAT - org.apache.hadoop.hbase.util.Bytes.toBytes(float)
    • TokenTypeConstant.DOUBLE - org.apache.hadoop.hbase.util.Bytes.toBytes(double)
    • TokenTypeConstant.NUMERIC - org.apache.hadoop.hbase.util.Bytes.toBytes(BigDecimal)
    • TokenTypeConstant.STRING - org.apache.hadoop.hbase.util.Bytes.toBytes(String)
    • TokenTypeConstant.BINARY - store/retrieve byte array as is
    • All other data types will be serialized/deserialized using the default DataRush formats.

    2. mapRowRecord(java.util.LinkedHashMap<java.lang.String, java.lang.String>) - Map the row id as a record of fields. Mapping multiple fields will serialize/deserialize the row id as a record of fields using default DataRush serialization exclusively.
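To make the single-field formats above concrete, here is a minimal sketch of the byte layout that org.apache.hadoop.hbase.util.Bytes.toBytes produces for int, long and String values. It uses only the JDK so that it runs without HBase on the class path. The class and method names (RowIdEncodingSketch, encodeInt, compareRowIds) are hypothetical and are not part of DataRush or HBase. The comparison helper assumes HBase's usual ordering of row ids as unsigned bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: reproduces the fixed-width, big-endian layout with JDK classes.
final class RowIdEncodingSketch {

    // 4 bytes, most significant byte first (ByteBuffer is big-endian by default).
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // 8 bytes, most significant byte first.
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // UTF-8 bytes of the string.
    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Assumption: row ids are compared byte by byte, treating each byte as unsigned.
    static int compareRowIds(byte[] left, byte[] right) {
        int shared = Math.min(left.length, right.length);
        for (int i = 0; i < shared; i++) {
            int cmp = Integer.compare(left[i] & 0xFF, right[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(left.length, right.length);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeInt(1)));                 // [0, 0, 0, 1]
        System.out.println(compareRowIds(encodeInt(1), encodeInt(2)) < 0);  // true
        System.out.println(compareRowIds(encodeInt(-1), encodeInt(1)) > 0); // true
    }
}
```

One consequence worth noting: because the comparison is unsigned, a negative INT or LONG row id sorts after every non-negative one, which matters when scanning by row key range.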

    The column portion of the index key consists of 2 parts: a column family name, and a column qualifier. The column family name identifies one of multiple families created at table creation time. Column families provide a way to logically and physically group cells such that cells associated with a particular family are stored together in the same files on disk. The column qualifier uniquely identifies a cell (and previous versions) within a column family.

    The version portion of the index key is the timestamp when the cell was created/changed.
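The three-part index key described above can be modelled as a small value class. The sketch below is illustrative only: CellKeySketch is a hypothetical name, row ids are held as String for readability even though HBase treats them as bytes, and the newest-first ordering of versions is an assumption about HBase's usual sort order rather than something stated on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative model of a cell index key {row, column, version}.
final class CellKeySketch implements Comparable<CellKeySketch> {
    final String row;        // most significant portion of the key
    final String family;     // column family name
    final String qualifier;  // column qualifier within the family
    final long version;      // timestamp at which the cell was created/changed

    CellKeySketch(String row, String family, String qualifier, long version) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.version = version;
    }

    @Override
    public int compareTo(CellKeySketch other) {
        int cmp = row.compareTo(other.row);
        if (cmp == 0) {
            cmp = family.compareTo(other.family);
        }
        if (cmp == 0) {
            cmp = qualifier.compareTo(other.qualifier);
        }
        if (cmp == 0) {
            // Assumption: newer versions sort ahead of older ones.
            cmp = Long.compare(other.version, version);
        }
        return cmp;
    }

    @Override
    public String toString() {
        return "{" + row + ", " + family + ":" + qualifier + ", " + version + "}";
    }

    public static void main(String[] args) {
        List<CellKeySketch> keys = new ArrayList<>();
        keys.add(new CellKeySketch("row2", "cf", "a", 100L));
        keys.add(new CellKeySketch("row1", "cf", "a", 100L));
        keys.add(new CellKeySketch("row1", "cf", "a", 200L));
        Collections.sort(keys);
        // Both versions of row1/cf:a come first, newest version leading.
        System.out.println(keys);
    }
}
```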

    DataRush schema is persisted in HBase describing row key and family-qualifier key types for any table that DataRush writes to. All DataRush schema information is stored in an HBase table named: catalogTableName.

    See Also:
    WriteHBase, ReadHBase, DeleteHBase
    • Field Detail

      • catalogTableName

        public static final String catalogTableName
        The name of the HBase table containing DataRush schema for all tables. The row id for this table is the table name, so the table contains one row for each table written to by DataRush. Its families are named by the constants cellSchemaFamilyName, keySchemaFamilyName, and statsFamilyName.
        See Also:
        Constant Field Values
      • cellSchemaFamilyName

        public static final String cellSchemaFamilyName
        The name of the family containing cell/family mapped schemas for all tables. The qualifier for this family is the family name for a family mapped schema. For cell mapped schemas the qualifier is the family name concatenated with the qualifier, separated by a colon ("family:qualifier").

        Schema is persisted in the cell as a JSON string format of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from JSON string format:

        • com.pervasive.datarush.tokens.types.TypeUtil.fromJSON(TokenType)
        • com.pervasive.datarush.tokens.types.TypeUtil.toJSON(TokenType)
        See Also:
        Constant Field Values
      • keySchemaFamilyName

        public static final String keySchemaFamilyName
        The name of the family containing row key and family-qualifier key schemas for all tables. The qualifier for this family is the family name for a family mapped schema. The row key schema is stored with an empty string "" qualifier.

        Schema is persisted in the cell as a JSON string format of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from JSON string format:

        • com.pervasive.datarush.tokens.types.TypeUtil.fromJSON(TokenType)
        • com.pervasive.datarush.tokens.types.TypeUtil.toJSON(TokenType)
        See Also:
        Constant Field Values
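The qualifier conventions described for cellSchemaFamilyName and keySchemaFamilyName can be summarized in a few lines. This is a sketch of the naming rules only; CatalogQualifierSketch and its members are hypothetical helpers, not DataRush API, and the sketch does not read or write the catalog table.

```java
// Sketch of the qualifier naming rules used inside the catalog table.
final class CatalogQualifierSketch {

    // The row key schema is stored under the empty-string qualifier.
    static final String ROW_KEY_SCHEMA_QUALIFIER = "";

    // Family mapped schema: the qualifier is the family name itself.
    static String familySchemaQualifier(String family) {
        return family;
    }

    // Cell mapped schema: the family name and qualifier joined by a colon.
    static String cellSchemaQualifier(String family, String qualifier) {
        return family + ":" + qualifier;
    }

    public static void main(String[] args) {
        System.out.println(familySchemaQualifier("metrics"));        // metrics
        System.out.println(cellSchemaQualifier("metrics", "count")); // metrics:count
    }
}
```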
      • statsFamilyName

        public static final String statsFamilyName
        The name of the family containing table information for all tables.
        See Also:
        Constant Field Values
    • Constructor Detail

      • KeyOperator

        public KeyOperator()
    • Method Detail

      • getTableName

        public String getTableName()
        Get HBase table name.
        Returns:
        table name.
      • setTableName

        public void setTableName(String tableName)
        Set HBase table name. Table name is required.
        Parameters:
        tableName - table name.
      • mapRowRecord

        public void mapRowRecord(LinkedHashMap<String,String> fieldMap)
        Map the row key as a record of fields. Any previous mapping for the row key is replaced.
        Parameters:
        fieldMap - map from field names in the row key record schema to DataRush field names.
      • mapRow

        public void mapRow(String fieldName)
        Map the row key as a single field. Any previous mapping for the row key is replaced.
        Parameters:
        fieldName - DataRush field name for row key.
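Both mapRow and mapRowRecord state that any previous row key mapping is replaced. The stand-in below models that behaviour so it can be run without DataRush. RowKeyMappingSketch is a hypothetical class, not the real KeyOperator, and the field names used in main are invented for illustration. In the record map, the key is the name within the row key record schema and the value is the DataRush field name.

```java
import java.util.LinkedHashMap;

// Hypothetical stand-in that mimics the documented "last mapping wins" behaviour.
final class RowKeyMappingSketch {
    private String rowField;                          // set by mapRow
    private LinkedHashMap<String, String> rowRecord;  // set by mapRowRecord

    // Map the row key as a single field, discarding any earlier mapping.
    void mapRow(String fieldName) {
        rowField = fieldName;
        rowRecord = null;
    }

    // Map the row key as a record of fields, discarding any earlier mapping.
    void mapRowRecord(LinkedHashMap<String, String> fieldMap) {
        rowRecord = new LinkedHashMap<>(fieldMap);
        rowField = null;
    }

    String describe() {
        if (rowField != null) {
            return "field:" + rowField;
        }
        if (rowRecord != null) {
            return "record:" + rowRecord;
        }
        return "unmapped";
    }

    public static void main(String[] args) {
        RowKeyMappingSketch mapping = new RowKeyMappingSketch();
        mapping.mapRow("customerId");

        LinkedHashMap<String, String> fields = new LinkedHashMap<>();
        fields.put("region", "regionField"); // schema name -> DataRush field name
        fields.put("day", "dayField");
        mapping.mapRowRecord(fields);        // replaces the single-field mapping

        System.out.println(mapping.describe());
    }
}
```

A LinkedHashMap is used because the order of the entries determines the order of the fields within the row key record.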
      • mapQualifier

        public void mapQualifier(String fieldName)
        Map the qualifier key as a field. Any previous mapping for the qualifier key is replaced.
        Parameters:
        fieldName - DataRush field name for qualifier key.
      • getTimeFieldName

        public String getTimeFieldName()
        Get version timestamp field name.
        Returns:
        name of field containing version timestamp.
      • setTimeFieldName

        public void setTimeFieldName(String timeFieldName)
        Set version timestamp field name. Optional; designates the field that supplies the version timestamp.
        Parameters:
        timeFieldName - name of field containing the version timestamp.
      • getConfiguration

        public HadoopConfiguration getConfiguration()
        Get the configuration.
        Returns:
        configuration.
      • setConfiguration

        public void setConfiguration(HadoopConfiguration configuration)
        Set the configuration used to locate HBase tables, HDFS resources, etc.

        Optional property. Defaults to configuration found on class path.

        Parameters:
        configuration - configuration.
      • effectiveConfiguration

        public HadoopConfiguration effectiveConfiguration()
        Get the effective configuration including override properties.
        Returns:
        configuration.
      • addFamily

        public void addFamily(String familyName)
        Add family.
        Parameters:
        familyName - family name
      • getFamilies

        public Set<String> getFamilies()
        Get families.
        Returns:
        set of mapped family names
      • setFamilies

        public void setFamilies(Set<String> families)
        Set families.
        Parameters:
        families - set of mapped family names
      • getFilesystem

        public String getFilesystem()
        Get default filesystem.
        Returns:
        default filesystem
      • setFilesystem

        public void setFilesystem(String filesystem)
        Set default filesystem. This method allows for overriding the default filesystem property in the HBase configuration. Example: "hdfs://namenode:port"
        Parameters:
        filesystem - default filesystem
      • getZookeeperQuorum

        public String getZookeeperQuorum()
        Get Zookeeper quorum.
        Returns:
        Zookeeper quorum
      • setZookeeperQuorum

        public void setZookeeperQuorum(String quorum)
        Set Zookeeper quorum. This method allows for overriding the HBase configuration property "hbase.zookeeper.quorum". Example: "node1,node2,..."
        Parameters:
        quorum - Zookeeper quorum
      • getZookeeperPort

        public String getZookeeperPort()
        Get Zookeeper client port.
        Returns:
        Zookeeper client port
      • setZookeeperPort

        public void setZookeeperPort(String port)
        Set Zookeeper client port. This method allows for overriding the HBase configuration property "hbase.zookeeper.property.clientPort". Example: "2181"
        Parameters:
        port - Zookeeper client port
      • getRootDirectory

        public String getRootDirectory()
        Get HBase root directory.
        Returns:
        HBase root directory
      • setRootDirectory

        public void setRootDirectory(String directory)
        Set HBase root directory. This method allows for overriding the HBase configuration property "hbase.rootdir". Example: "hdfs://namenode:port/path"
        Parameters:
        directory - HBase root directory
      • getHiveMetastore

        public String getHiveMetastore()
        Get Hive metastore.
        Returns:
        Hive metastore
      • setHiveMetastore

        public void setHiveMetastore(String metastore)
        Set Hive metastore. This method allows for overriding the configuration property "hive.metastore.uris". Example: "thrift://namenode:port"
        Parameters:
        metastore - Hive metastore
      • getZookeeperParentZNode

        public String getZookeeperParentZNode()
        Get Zookeeper parent znode.
        Returns:
        Zookeeper parent znode
      • setZookeeperParentZNode

        public void setZookeeperParentZNode(String znode)
        Set Zookeeper parent znode. This method allows for overriding the HBase configuration property "zookeeper.znode.parent". Example: "/hbase"
        Parameters:
        znode - Zookeeper parent znode
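The setters above each override one property, and effectiveConfiguration() is described as returning the configuration including those overrides. How DataRush performs the merge is not documented here, so the following is only a sketch of one plausible reading: overrides that were set replace the matching entries of the base configuration, and everything else passes through. ConfigOverrideSketch and its effective method are hypothetical; the property names are the ones quoted in the method descriptions, while the host names are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of "base configuration plus overrides"; not the DataRush implementation.
final class ConfigOverrideSketch {

    // Returns a new map; neither input is modified. Null override values are skipped,
    // modelling a setter that was never called.
    static Map<String, String> effective(Map<String, String> base, Map<String, String> overrides) {
        Map<String, String> result = new LinkedHashMap<>(base);
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            if (entry.getValue() != null) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("hbase.zookeeper.quorum", "localhost");
        base.put("hbase.zookeeper.property.clientPort", "2181");

        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("hbase.zookeeper.quorum", "node1,node2");
        overrides.put("zookeeper.znode.parent", null); // not set, so not applied

        // The quorum is overridden; the client port keeps its base value.
        System.out.println(effective(base, overrides));
    }
}
```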