Class KeyOperator

All Implemented Interfaces:
LogicalOperator
Direct Known Subclasses:
DeleteHBase, KeyValueOperator

public abstract class KeyOperator extends CompositeOperator
Specifies key field mapping when accessing HBase.

HBase stores data in a table as variable-length rows of individual cells, each of which can be independently versioned and changed over time. All versions of a cell's history are retained until a maximum cell age or a maximum version count is reached. The rows are partitioned into regions across multiple server nodes. Each region covers a range of row keys, so all cells associated with a particular row reside in a single region.
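
As a rough illustration of row-key range partitioning, the sketch below keeps region start keys in a TreeMap and finds the region for a row with floorEntry. The class RegionLookupSketch, the start keys, and the server names are hypothetical. Keys are compared as strings for brevity, whereas HBase compares raw bytes, and this is not a description of how HBase locates regions internally.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: regions modeled as contiguous row-key ranges, keyed by start key.
final class RegionLookupSketch {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    // Register a region that begins at startKey (inclusive) on the given server.
    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // The region holding a row is the one with the greatest start key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "serverA");  // first region starts at the empty key
        table.addRegion("g", "serverB");
        table.addRegion("p", "serverC");
        System.out.println(table.serverFor("apple"));
        System.out.println(table.serverFor("grape"));
        System.out.println(table.serverFor("zebra"));
    }
}
```

Because every row key falls into exactly one range, all cells of that row are served by the same region.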

A cell is uniquely identified by an index key {row, column, version}:

The row portion of the index key is the most significant portion of the cell index key. HBase treats the row id as a series of bytes.

DataRush field(s) can be mapped to the row id via the following methods:

1. mapRow(java.lang.String) - Map the row id as a single field. When a single field is mapped as the row id, the following DataRush data types are serialized/deserialized using common HBase formats:

  • TokenTypeConstant.INT - org.apache.hadoop.hbase.util.Bytes.toBytes(int)
  • TokenTypeConstant.LONG - org.apache.hadoop.hbase.util.Bytes.toBytes(long)
  • TokenTypeConstant.FLOAT - org.apache.hadoop.hbase.util.Bytes.toBytes(float)
  • TokenTypeConstant.DOUBLE - org.apache.hadoop.hbase.util.Bytes.toBytes(double)
  • TokenTypeConstant.NUMERIC - org.apache.hadoop.hbase.util.Bytes.toBytes(BigDecimal)
  • TokenTypeConstant.STRING - org.apache.hadoop.hbase.util.Bytes.toBytes(String)
  • TokenTypeConstant.BINARY - store/retrieve byte array as is
  • All other data types will be serialized/deserialized using the default DataRush formats.

2. mapRowRecord(java.util.LinkedHashMap<java.lang.String, java.lang.String>) - Map the row id as a record of fields. Mapping multiple fields will serialize/deserialize the row id as a record of fields using default DataRush serialization exclusively.
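
To make the single-field formats more concrete, the sketch below uses plain JDK calls to reproduce the byte layouts that org.apache.hadoop.hbase.util.Bytes is understood to produce for int, long, and String values (fixed-width big-endian integers and UTF-8 text). RowIdBytesSketch is a hypothetical stand-in so that the example runs without HBase on the class path; real code would call Bytes.toBytes directly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in mirroring the assumed layouts of Bytes.toBytes(int/long/String).
final class RowIdBytesSketch {
    // 4 bytes, most significant byte first (ByteBuffer is big-endian by default).
    static byte[] intRowId(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // 8 bytes, most significant byte first.
    static byte[] longRowId(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // UTF-8 encoding of the string, with no length prefix.
    static byte[] stringRowId(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(intRowId(258)));      // prints [0, 0, 1, 2]
        System.out.println(longRowId(258L).length);              // prints 8
        System.out.println(Arrays.toString(stringRowId("ab")));  // prints [97, 98]
    }
}
```

Since HBase treats the row id as a series of bytes, the choice of field type affects row ordering: with this layout, non-negative integers sort numerically, while negative values have the high bit set and sort after positive ones.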

The column portion of the index key consists of 2 parts: a column family name, and a column qualifier. The column family name identifies one of multiple families created at table creation time. Column families provide a way to logically and physically group cells such that cells associated with a particular family are stored together in the same files on disk. The column qualifier uniquely identifies a cell (and previous versions) within a column family.

The version portion of the index key is the timestamp when the cell was created/changed.
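
The {row, column, version} key and the version-count limit can be pictured with a small in-memory model. VersionedCellSketch below is hypothetical and not an HBase or DataRush API. It holds the values of a single cell sorted newest first and drops the oldest value once an assumed maximum version count is exceeded.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical model of one cell (fixed row and column) with timestamped versions.
final class VersionedCellSketch {
    private final int maxVersions;
    // timestamp -> value, ordered with the newest timestamp first
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

    VersionedCellSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Store a value under the given version timestamp, evicting the oldest if over the limit.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // the last entry holds the oldest timestamp
        }
    }

    // Most recent value, or null if the cell has never been written.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Value stored at exactly this timestamp, or null if absent or evicted.
    String at(long timestamp) {
        return versions.get(timestamp);
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch(2);
        cell.put(100L, "a");
        cell.put(200L, "b");
        cell.put(300L, "c");
        System.out.println(cell.latest());        // prints c
        System.out.println(cell.versionCount());  // prints 2
    }
}
```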

For any table that DataRush writes to, DataRush persists schema in HBase describing the row key and family-qualifier key types. All DataRush schema information is stored in the HBase table named by catalogTableName.

  • Field Details

    • catalogTableName

      public static final String catalogTableName
      The name of the HBase table containing DataRush schema for all tables. The row id for this table is the table name, so this table contains one row for each table written to by DataRush. This HBase table defines the families named by cellSchemaFamilyName, keySchemaFamilyName, and statsFamilyName.
    • cellSchemaFamilyName

      public static final String cellSchemaFamilyName
      The name of the family containing cell/family mapped schemas for all tables. The qualifier for this family is the family name for a family mapped schema. For cell mapped schemas the qualifier is the family name concatenated with the qualifier, separated by a colon ("family:qualifier").

      Schema is persisted in the cell as a JSON string format of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from JSON string format:

      • com.pervasive.datarush.tokens.types.TypeUtil.fromJSON(TokenType)
      • com.pervasive.datarush.tokens.types.TypeUtil.toJSON(TokenType)
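
      The qualifier convention described above can be sketched as follows. CellSchemaQualifierSketch and its method names are hypothetical helpers, not part of DataRush; only the "family:qualifier" format comes from this page.

```java
// Hypothetical helper that builds catalog qualifiers for the cell schema family.
final class CellSchemaQualifierSketch {
    // Family mapped schema: the qualifier is just the family name.
    static String familyMapped(String family) {
        return family;
    }

    // Cell mapped schema: family name and qualifier joined by a colon.
    static String cellMapped(String family, String qualifier) {
        return family + ":" + qualifier;
    }

    public static void main(String[] args) {
        System.out.println(familyMapped("metrics"));       // prints metrics
        System.out.println(cellMapped("metrics", "cpu"));  // prints metrics:cpu
    }
}
```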
    • keySchemaFamilyName

      public static final String keySchemaFamilyName
      The name of the family containing row key and family-qualifier key schemas for all tables. The qualifier for this family is the family name for a family mapped schema. The row key schema is stored with an empty string "" qualifier.

      Schema is persisted in the cell as a JSON string format of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from JSON string format:

      • com.pervasive.datarush.tokens.types.TypeUtil.fromJSON(TokenType)
      • com.pervasive.datarush.tokens.types.TypeUtil.toJSON(TokenType)
    • statsFamilyName

      public static final String statsFamilyName
      The name of the family containing table information for all tables.
  • Constructor Details

    • KeyOperator

      public KeyOperator()
  • Method Details

    • getRowFieldMap

      public LinkedHashMap<String,String> getRowFieldMap()
    • setRowFieldMap

      public void setRowFieldMap(LinkedHashMap<String,String> rowFieldMap)
    • getQualifierFieldMap

      public LinkedHashMap<String,String> getQualifierFieldMap()
    • setQualifierFieldMap

      public void setQualifierFieldMap(LinkedHashMap<String,String> qualifierFieldMap)
    • getTableName

      public String getTableName()
      Get HBase table name.
      Returns:
      table name.
    • setTableName

      public void setTableName(String tableName)
      Set HBase table name. Table name is required.
      Parameters:
      tableName - table name.
    • mapRowRecord

      public void mapRowRecord(LinkedHashMap<String,String> fieldMap)
      Map Row key as a record. This method is used to map the row key as a Record. Any previous mapping for the row key is replaced.
      Parameters:
      fieldMap - row key record schema name to DataRush field name map.
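
      A minimal sketch of preparing the fieldMap argument is shown below. The schema names ("region", "id") and DataRush field names ("customerRegion", "customerId") are hypothetical, and the assumption that insertion order determines the order of fields in the row key record is inferred from the LinkedHashMap parameter type rather than stated on this page.

```java
import java.util.LinkedHashMap;

// Hypothetical example of building a row key record mapping.
final class RowRecordMapSketch {
    // Keys are row key record schema names; values are DataRush field names.
    static LinkedHashMap<String, String> rowFieldMap() {
        LinkedHashMap<String, String> fieldMap = new LinkedHashMap<>();
        fieldMap.put("region", "customerRegion");
        fieldMap.put("id", "customerId");
        return fieldMap;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> fieldMap = rowFieldMap();
        // LinkedHashMap iterates in insertion order: region first, then id.
        System.out.println(fieldMap.keySet());
        // With a concrete subclass instance named operator (not shown here):
        // operator.mapRowRecord(fieldMap);
    }
}
```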
    • mapRow

      public void mapRow(String fieldName)
      Map Row key as a field. This method is used to map the row key as a field. Any previous mapping for the row key is replaced.
      Parameters:
      fieldName - DataRush field name for row key.
    • mapQualifier

      public void mapQualifier(String fieldName)
      Map Qualifier key as a field. This method is used to map the qualifier key as a field. Any previous mapping for the qualifier key is replaced.
      Parameters:
      fieldName - DataRush field name for qualifier key.
    • getTimeFieldName

      public String getTimeFieldName()
      Get version timestamp field name.
      Returns:
      name of field containing version timestamp.
    • setTimeFieldName

      public void setTimeFieldName(String timeFieldName)
      Set version timestamp field name. This method is optionally used to designate a version timestamp field.
      Parameters:
      timeFieldName - name of field containing the version timestamp.
    • getConfiguration

      public HadoopConfiguration getConfiguration()
      Get the configuration.
      Returns:
      configuration.
    • setConfiguration

      public void setConfiguration(HadoopConfiguration configuration)
      Set the configuration. Set the configuration to be used to locate HBase tables, HDFS resources, etc.

      Optional property. Defaults to configuration found on class path.

      Parameters:
      configuration - configuration.
    • effectiveConfiguration

      public HadoopConfiguration effectiveConfiguration()
      Get the effective configuration including override properties.
      Returns:
      configuration.
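
      One way to picture an effective configuration is as a base set of properties with override properties layered on top. The sketch below models that with plain maps. EffectiveConfigSketch is hypothetical, and the precedence rule (overrides win over base values) is an assumption, since this page does not describe how the merge is performed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: base properties plus overrides, with overrides taking precedence.
final class EffectiveConfigSketch {
    static Map<String, String> effective(Map<String, String> base, Map<String, String> overrides) {
        Map<String, String> result = new LinkedHashMap<>(base); // copy so base is not modified
        result.putAll(overrides);                               // override values replace base values
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("hbase.zookeeper.quorum", "localhost");
        base.put("hbase.zookeeper.property.clientPort", "2181");

        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("hbase.zookeeper.quorum", "node1,node2");

        System.out.println(effective(base, overrides));
    }
}
```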
    • addFamily

      public void addFamily(String familyName)
      Add family.
      Parameters:
      familyName - family name
    • getFamilies

      public Set<String> getFamilies()
      Get families.
      Returns:
      set of mapped family names
    • setFamilies

      public void setFamilies(Set<String> families)
      Set families.
      Parameters:
      families - set of mapped family names
    • getFilesystem

      public String getFilesystem()
      Get default filesystem.
      Returns:
      default filesystem
    • setFilesystem

      public void setFilesystem(String filesystem)
      Set default filesystem. This method overrides the default filesystem setting of the HBase configuration. Example: "hdfs://namenode:port"
      Parameters:
      filesystem - default filesystem
    • getZookeeperQuorum

      public String getZookeeperQuorum()
      Get Zookeeper quorum.
      Returns:
      Zookeeper quorum
    • setZookeeperQuorum

      public void setZookeeperQuorum(String quorum)
      Set Zookeeper quorum. This method overrides the HBase configuration property "hbase.zookeeper.quorum". Example: "node1,node2,..."
      Parameters:
      quorum - Zookeeper quorum
    • getZookeeperPort

      public String getZookeeperPort()
      Get Zookeeper client port.
      Returns:
      Zookeeper client port
    • setZookeeperPort

      public void setZookeeperPort(String port)
      Set Zookeeper client port. This method overrides the HBase configuration property "hbase.zookeeper.property.clientPort". Example: "2181"
      Parameters:
      port - Zookeeper client port
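
      Both Zookeeper setters take strings, so it can help to build and check the values before passing them in. The helper below is a hypothetical sketch: ZookeeperSettingsSketch, its method names, and the host names are illustrative and not part of DataRush.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for preparing Zookeeper quorum and client port strings.
final class ZookeeperSettingsSketch {
    // Join host names into the comma-separated form, e.g. "node1,node2".
    static String quorum(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one Zookeeper host is required");
        }
        return String.join(",", hosts);
    }

    // Check the port range and return it as a string, e.g. "2181".
    static String clientPort(int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return Integer.toString(port);
    }

    public static void main(String[] args) {
        System.out.println(quorum(Arrays.asList("node1", "node2")));  // prints node1,node2
        System.out.println(clientPort(2181));                         // prints 2181
        // With a concrete subclass instance named operator (not shown here):
        // operator.setZookeeperQuorum(quorum(Arrays.asList("node1", "node2")));
        // operator.setZookeeperPort(clientPort(2181));
    }
}
```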
    • getRootDirectory

      public String getRootDirectory()
      Get HBase root directory.
      Returns:
      HBase root directory
    • setRootDirectory

      public void setRootDirectory(String directory)
      Set HBase root directory. This method overrides the HBase configuration property "hbase.rootdir". Example: "hdfs://namenode:port/path"
      Parameters:
      directory - HBase root directory
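
      The example values on this page suggest that the root directory is the default filesystem URI followed by a path. The sketch below uses java.net.URI to assemble a root directory and to recover its filesystem prefix. RootDirectorySketch, the host name, port 8020, and the "/hbase" path are hypothetical illustration values.

```java
import java.net.URI;

// Hypothetical helper relating an HDFS root directory to its filesystem prefix.
final class RootDirectorySketch {
    // Build a root directory such as "hdfs://namenode:8020/hbase".
    static String rootDirectory(String host, int port, String path) {
        return URI.create("hdfs://" + host + ":" + port + path).toString();
    }

    // Extract the "scheme://host:port" prefix from a root directory URI.
    static String filesystemOf(String rootDirectory) {
        URI uri = URI.create(rootDirectory);
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    public static void main(String[] args) {
        String root = rootDirectory("namenode", 8020, "/hbase");
        System.out.println(root);                // prints hdfs://namenode:8020/hbase
        System.out.println(filesystemOf(root));  // prints hdfs://namenode:8020
    }
}
```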
    • getHiveMetastore

      public String getHiveMetastore()
      Get Hive metastore.
      Returns:
      Hive metastore
    • setHiveMetastore

      public void setHiveMetastore(String metastore)
      Set Hive metastore. This method overrides the configuration property "hive.metastore.uris". Example: "thrift://namenode:port"
      Parameters:
      metastore - Hive metastore
    • getZookeeperParentZNode

      public String getZookeeperParentZNode()
      Get Zookeeper parent znode.
      Returns:
      Zookeeper parent znode
    • setZookeeperParentZNode

      public void setZookeeperParentZNode(String znode)
      Set Zookeeper parent znode. This method overrides the HBase configuration property "zookeeper.znode.parent". Example: "/hbase"
      Parameters:
      znode - Zookeeper parent znode