- All Implemented Interfaces:
LogicalOperator
- Direct Known Subclasses:
DeleteHBase, KeyValueOperator
HBase stores data in a table as variable-length rows of individual cells, each of which can be independently versioned and changed over time. All versions of a cell's history are retained until a maximum cell age or a maximum version count is reached. The rows are partitioned into regions across multiple server nodes. Each region covers a range of row keys, and regions are partitioned so that all cells associated with a particular row reside in a single region.
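The retention rule described above can be illustrated with a toy in-memory model. This is only a sketch: the class name `VersionedCell` and its `maxVersions` and `maxAgeMillis` parameters are hypothetical, it uses no HBase classes, and it trims versions eagerly on every write, which is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Toy model of a single versioned cell; illustrative only, not HBase code. */
final class VersionedCell {
    // Versions keyed by timestamp, ordered newest first.
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());
    private final int maxVersions;    // hypothetical maximum version count
    private final long maxAgeMillis;  // hypothetical maximum cell age

    VersionedCell(int maxVersions, long maxAgeMillis) {
        this.maxVersions = maxVersions;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Stores a new version, then applies both retention limits. */
    void put(long timestamp, String value, long now) {
        versions.put(timestamp, value);
        // Drop versions older than the maximum age.
        versions.keySet().removeIf(ts -> now - ts > maxAgeMillis);
        // Drop the oldest versions beyond the maximum count.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    /** Returns the retained timestamps, newest first. */
    List<Long> timestamps() {
        return new ArrayList<>(versions.keySet());
    }

    /** Returns the newest retained value, or null if nothing is retained. */
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }
}
```

With a limit of two versions, writing three versions of the cell leaves only the two newest; a later write whose timestamp is far enough ahead also ages out the older ones.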
A cell is uniquely identified by an index key {row, column, version}:
The row portion of the index key is the most significant portion of the cell index key. HBase treats the row id as a series of bytes.
DataRush field(s) can be mapped to the row id via the following methods:
1. mapRow(java.lang.String) - Map the row id as a single field. A single field mapped as the row id is serialized/deserialized using common HBase formats for the following DataRush data types:
- TokenTypeConstant.INT - org.apache.hadoop.hbase.util.Bytes.toBytes(int)
- TokenTypeConstant.LONG - org.apache.hadoop.hbase.util.Bytes.toBytes(long)
- TokenTypeConstant.FLOAT - org.apache.hadoop.hbase.util.Bytes.toBytes(float)
- TokenTypeConstant.DOUBLE - org.apache.hadoop.hbase.util.Bytes.toBytes(double)
- TokenTypeConstant.NUMERIC - org.apache.hadoop.hbase.util.Bytes.toBytes(BigDecimal)
- TokenTypeConstant.STRING - org.apache.hadoop.hbase.util.Bytes.toBytes(String)
- TokenTypeConstant.BINARY - store/retrieve byte array as is
- All other data types will be serialized/deserialized using the default DataRush formats.
2. mapRowRecord(java.util.LinkedHashMap<java.lang.String, java.lang.String>) - Map the row id as a record of fields. When multiple fields are mapped, the row id is serialized/deserialized as a record of fields using default DataRush serialization exclusively.
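To make the single-field byte formats concrete, the following sketch reproduces them with the JDK only. It assumes that org.apache.hadoop.hbase.util.Bytes encodes int and long values as fixed-width big-endian bytes and strings as UTF-8; the class `RowKeyBytes` and its method names are hypothetical stand-ins, not part of HBase or DataRush.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** JDK-only stand-in for the common HBase row id encodings; illustrative only. */
final class RowKeyBytes {
    /** Encodes an int as 4 big-endian bytes (ByteBuffer defaults to big-endian). */
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    /** Encodes a long as 8 big-endian bytes. */
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Encodes a string as UTF-8 bytes. */
    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes 4 big-endian bytes back into an int. */
    static int decodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeInt(258))); // prints [0, 0, 1, 2]
        System.out.println(decodeInt(encodeInt(258)));       // prints 258
    }
}
```

Because HBase treats the row id as a series of bytes, the chosen encoding determines how rows sort, which is worth keeping in mind when selecting the row id field type.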
The column portion of the index key consists of two parts: a column family name and a column qualifier. The column family name identifies one of the families defined when the table was created. Column families provide a way to logically and physically group cells, such that cells associated with a particular family are stored together in the same files on disk. The column qualifier uniquely identifies a cell (and its previous versions) within a column family.
The version portion of the index key is the timestamp when the cell was created/changed.
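The three-part index key can be modeled as a small comparable type. The sketch below is an assumption-laden illustration: `CellKey` is a hypothetical class, rows are compared as unsigned bytes, family and qualifier are compared as plain strings for brevity, and newer versions are ordered before older ones, which is how HBase is generally understood to order versions but is not stated in this documentation.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative model of a cell index key {row, column, version}; not HBase code. */
final class CellKey implements Comparable<CellKey> {
    final byte[] row;
    final String family;
    final String qualifier;
    final long version;

    CellKey(byte[] row, String family, String qualifier, long version) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.version = version;
    }

    /** Convenience factory that encodes the row id as UTF-8 bytes. */
    static CellKey of(String row, String family, String qualifier, long version) {
        return new CellKey(row.getBytes(StandardCharsets.UTF_8), family, qualifier, version);
    }

    /** Lexicographic comparison treating each byte as unsigned. */
    private static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    @Override
    public int compareTo(CellKey other) {
        // The row is the most significant portion of the key.
        int c = compareUnsigned(row, other.row);
        if (c != 0) return c;
        c = family.compareTo(other.family);
        if (c != 0) return c;
        c = qualifier.compareTo(other.qualifier);
        if (c != 0) return c;
        // Assumption: newer versions (larger timestamps) sort first.
        return Long.compare(other.version, version);
    }
}
```

Under these assumptions, two versions of the same cell sit next to each other with the newest first, and any key in row "r1" sorts before every key in row "r2".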
For every table that DataRush writes to, DataRush persists schema in HBase describing the row key and family-qualifier key types.
All DataRush schema information is stored in the HBase table named by catalogTableName.
-
Field Summary
Fields
Modifier and Type, Field, Description:
- static final String catalogTableName: The name of the HBase table containing DataRush schema for all tables.
- static final String cellSchemaFamilyName: The name of the family containing cell/family mapped schemas for all tables.
- static final String keySchemaFamilyName: The name of the family containing row key and family-qualifier key schemas for all tables.
- static final String statsFamilyName: The name of the family containing table information for all tables.
-
Constructor Summary
Constructors
- KeyOperator()
-
Method Summary
Modifier and Type, Method, Description:
- void addFamily: Add family.
- effectiveConfiguration: Get the effective configuration including override properties.
- getConfiguration: Get the configuration.
- getFamilies: Get families.
- getFilesystem: Get default filesystem.
- getHiveMetastore: Get Hive metastore.
- getQualifierFieldMap
- getRootDirectory: Get HBase root directory.
- getRowFieldMap
- getTableName: Get HBase table name.
- getTimeFieldName: Get version timestamp field name.
- getZookeeperParentZNode: Get Zookeeper parent znode.
- getZookeeperPort: Get Zookeeper client port.
- getZookeeperQuorum: Get Zookeeper quorum.
- void mapQualifier(String fieldName): Map Qualifier key as a field.
- void mapRow(String fieldName): Map Row key as a field.
- void mapRowRecord(LinkedHashMap<String, String> fieldMap): Map Row key as a record.
- void setConfiguration(HadoopConfiguration configuration): Set the configuration.
- void setFamilies(Set<String> families): Set families.
- void setFilesystem(String filesystem): Set default filesystem.
- void setHiveMetastore(String metastore): Set Hive metastore.
- void setQualifierFieldMap(LinkedHashMap<String, String> qualifierFieldMap)
- void setRootDirectory(String directory): Set HBase root directory.
- void setRowFieldMap(LinkedHashMap<String, String> rowFieldMap)
- void setTableName(String tableName): Set HBase table name.
- void setTimeFieldName(String timeFieldName): Set version timestamp field name.
- void setZookeeperParentZNode(String znode): Set Zookeeper parent znode.
- void setZookeeperPort(String port): Set Zookeeper client port.
- void setZookeeperQuorum(String quorum): Set Zookeeper quorum.
Methods inherited from class com.pervasive.datarush.operators.CompositeOperator
compose
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Field Details
-
catalogTableName
The name of the HBase table containing DataRush schema for all tables. The row id for this table is the table name, so this table contains one row for each table written to by DataRush. This HBase table has the following families defined:
- cellSchemaFamilyName is the name of the family containing cell schemas.
- keySchemaFamilyName is the name of the family containing row key and family-qualifier key schemas.
- statsFamilyName is the name of the family containing table information.
-
cellSchemaFamilyName
The name of the family containing cell/family mapped schemas for all tables. For a family mapped schema, the qualifier for this family is the family name. For a cell mapped schema, the qualifier is the family name concatenated with the qualifier, separated by a colon ("family:qualifier").
The schema is persisted in the cell as the JSON string form of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from the JSON string format:
- com.pervasive.datarush.tokens.types.TypeUtil.fromJSON(TokenType)
- com.pervasive.datarush.tokens.types.TypeUtil.toJSON(TokenType)
-
keySchemaFamilyName
The name of the family containing row key and family-qualifier key schemas for all tables. For a family mapped schema, the qualifier for this family is the family name. The row key schema is stored with an empty string ("") qualifier.
The schema is persisted in the cell as the JSON string form of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from the JSON string format:
- com.pervasive.datarush.tokens.types.TypeUtil.fromJSON(TokenType)
- com.pervasive.datarush.tokens.types.TypeUtil.toJSON(TokenType)
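The qualifier naming conventions used in the two schema families can be summarized in a few helper methods. This is a minimal sketch: the class `SchemaQualifiers` and its method names are hypothetical and do not exist in DataRush, and the UTF-8 encoding stands in for Bytes.toBytes(String) under the assumption that it encodes strings as UTF-8.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers naming catalog qualifiers; illustrative only. */
final class SchemaQualifiers {
    /** Qualifier for a family mapped schema: the family name itself. */
    static String familyMapped(String family) {
        return family;
    }

    /** Qualifier for a cell mapped schema: "family:qualifier". */
    static String cellMapped(String family, String qualifier) {
        return family + ":" + qualifier;
    }

    /** Qualifier under which the row key schema is stored: the empty string. */
    static String rowKey() {
        return "";
    }

    /** Encodes a qualifier as UTF-8 bytes (assumed equivalent of Bytes.toBytes(String)). */
    static byte[] toBytes(String qualifier) {
        return qualifier.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, a cell mapped schema for qualifier "name" in family "info" (both made-up names) would be stored under the qualifier "info:name".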
-
statsFamilyName
The name of the family containing table information for all tables.
-
-
Constructor Details
-
KeyOperator
public KeyOperator()
-
-
Method Details
-
getRowFieldMap
-
setRowFieldMap
-
getQualifierFieldMap
-
setQualifierFieldMap
-
getTableName
Get HBase table name.
- Returns:
table name.
-
setTableName
Set HBase table name. Table name is required.
- Parameters:
tableName - table name.
-
mapRowRecord
Map Row key as a record. This method is used to map the row key as a Record. Any previous mapping for the row key is replaced.
- Parameters:
fieldMap - row key record schema name to DataRush field name map.
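A field map for a record row key might be built as follows. This sketch uses only the JDK; the schema names ("region", "id") and DataRush field names ("customerRegion", "customerId") are made up for illustration. Since the parameter type is a LinkedHashMap, the example assumes that insertion order is significant.

```java
import java.util.LinkedHashMap;

/** Builds an example row key record mapping; all names are hypothetical. */
final class RowRecordMapping {
    /** Returns a map from row key record schema name to DataRush field name. */
    static LinkedHashMap<String, String> build() {
        LinkedHashMap<String, String> fieldMap = new LinkedHashMap<>();
        fieldMap.put("region", "customerRegion"); // first field of the row key record
        fieldMap.put("id", "customerId");         // second field of the row key record
        return fieldMap;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> fieldMap = build();
        // With the DataRush library on the class path, this map would be
        // passed to the operator: operator.mapRowRecord(fieldMap);
        fieldMap.forEach((schemaName, fieldName) ->
                System.out.println(schemaName + " <- " + fieldName));
    }
}
```

A LinkedHashMap iterates in insertion order, so the entries are visited as "region" then "id".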
-
mapRow
Map Row key as a field. This method is used to map the row key as a field. Any previous mapping for the row key is replaced.
- Parameters:
fieldName - DataRush field name for row key.
-
mapQualifier
Map Qualifier key as a field. This method is used to map the qualifier key as a field. Any previous mapping for the qualifier key is replaced.
- Parameters:
fieldName - DataRush field name for qualifier key.
-
getTimeFieldName
Get version timestamp field name.
- Returns:
name of field containing version timestamp.
-
setTimeFieldName
Set version timestamp field name. This method is optionally used to designate a version timestamp field.
- Parameters:
timeFieldName - name of field containing the version timestamp.
-
getConfiguration
Get the configuration.
- Returns:
configuration.
-
setConfiguration
Set the configuration. Set the configuration to be used to locate HBase tables, HDFS resources, etc. Optional property. Defaults to the configuration found on the class path.
- Parameters:
configuration - configuration.
-
effectiveConfiguration
Get the effective configuration including override properties.
- Returns:
configuration.
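One way to picture an effective configuration is as a base configuration with explicitly set overrides layered on top. The sketch below models that with plain maps; it is not the operator's implementation, and the class `EffectiveConfig`, its method, and the host names are hypothetical. The property names are the ones this class documents for its override setters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative overlay of override properties on a base configuration. */
final class EffectiveConfig {
    /** Returns a copy of base with every non-null override applied on top. */
    static Map<String, String> effective(Map<String, String> base,
                                         Map<String, String> overrides) {
        Map<String, String> result = new LinkedHashMap<>(base);
        overrides.forEach((key, value) -> {
            if (value != null) { // an unset override leaves the base value alone
                result.put(key, value);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("hbase.zookeeper.quorum", "localhost");
        base.put("hbase.zookeeper.property.clientPort", "2181");

        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("hbase.zookeeper.quorum", "node1,node2"); // hypothetical hosts

        // The quorum comes from the override; the port falls through from base.
        System.out.println(effective(base, overrides));
    }
}
```

The base map is left unmodified, so the same base configuration can be reused with different sets of overrides.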
-
addFamily
Add family.
- Parameters:
familyName - family name
-
getFamilies
Get families.
- Returns:
set of mapped family names
-
setFamilies
Set families.
- Parameters:
families - set of mapped family names
-
getFilesystem
Get default filesystem.
- Returns:
default filesystem
-
setFilesystem
Set default filesystem. This method allows for overriding the default filesystem property in the HBase configuration. Example: "hdfs://namenode:port"
- Parameters:
filesystem - default filesystem
-
getZookeeperQuorum
Get Zookeeper quorum.
- Returns:
Zookeeper quorum
-
setZookeeperQuorum
Set Zookeeper quorum. This method allows for overriding the HBase configuration property: "hbase.zookeeper.quorum". Example: "node1,node2,..."
- Parameters:
quorum - Zookeeper quorum
-
getZookeeperPort
Get Zookeeper client port.
- Returns:
Zookeeper client port
-
setZookeeperPort
Set Zookeeper client port. This method allows for overriding the HBase configuration property: "hbase.zookeeper.property.clientPort". Example: "2181"
- Parameters:
port - Zookeeper client port
-
getRootDirectory
Get HBase root directory.
- Returns:
HBase root directory
-
setRootDirectory
Set HBase root directory. This method allows for overriding the HBase configuration property: "hbase.rootdir". Example: "hdfs://namenode:port/path"
- Parameters:
directory - HBase root directory
-
getHiveMetastore
Get Hive metastore.
- Returns:
Hive metastore
-
setHiveMetastore
Set Hive metastore. This method allows for overriding the HBase configuration property: "hive.metastore.uris". Example: "thrift://namenode:port"
- Parameters:
metastore - Hive metastore
-
getZookeeperParentZNode
Get Zookeeper parent znode.
- Returns:
Zookeeper parent znode
-
setZookeeperParentZNode
Set Zookeeper parent znode. This method allows for overriding the HBase configuration property: "zookeeper.znode.parent". Example: "/hbase"
- Parameters:
znode - Zookeeper parent znode
-