- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.CompositeOperator
-
- com.pervasive.datarush.hbase.KeyOperator
-
- All Implemented Interfaces:
LogicalOperator
- Direct Known Subclasses:
DeleteHBase, KeyValueOperator
public abstract class KeyOperator extends CompositeOperator
Specifies key field mapping when accessing HBase.
HBase stores data in a table as variable-length rows of individual cells, each of which can be independently versioned/changed over time. All versions of a cell's history are retained until a maximum cell age or a maximum version count is reached. The rows are partitioned into regions across multiple server nodes. Each region covers a range of row keys, and rows are partitioned such that all cells associated with a particular row reside in a single region.
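To make the retention rule concrete, the following is a minimal sketch of a single cell that keeps a bounded number of timestamped versions. `VersionedCell` and its methods are hypothetical names invented for illustration and are not part of DataRush or HBase. The model prunes eagerly on every write, which a real HBase server need not do, and it covers only the maximum version count, not the maximum cell age.

```java
import java.util.Comparator;
import java.util.TreeMap;

/** Toy model (hypothetical) of one cell that keeps at most maxVersions timestamped values. */
class VersionedCell {
    // Newest timestamp first, so the first entry is always the latest version.
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());
    private final int maxVersions;

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** Stores a value under the given timestamp and drops the oldest versions beyond the limit. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = oldest timestamp under reverse ordering
        }
    }

    /** Returns the value with the newest timestamp, or null if the cell is empty. */
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(2);
        cell.put(100L, "a");
        cell.put(200L, "b");
        cell.put(300L, "c"); // evicts the version written at timestamp 100
        System.out.println(cell.latest() + " " + cell.versionCount());
    }
}
```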
A cell is uniquely identified by an index key {row, column, version}:
The row portion of the index key is the most significant portion of the cell index key. HBase treats the row id as a series of bytes.
DataRush field(s) can be mapped to the row id via the following methods:
1. mapRow(java.lang.String) - Map the row id as a single field. A single field mapped as the row id will serialize/deserialize the following DataRush data types using common HBase formats:
- TokenTypeConstant.INT - org.apache.hadoop.hbase.util.Bytes.toBytes(int)
- TokenTypeConstant.LONG - org.apache.hadoop.hbase.util.Bytes.toBytes(long)
- TokenTypeConstant.FLOAT - org.apache.hadoop.hbase.util.Bytes.toBytes(float)
- TokenTypeConstant.DOUBLE - org.apache.hadoop.hbase.util.Bytes.toBytes(double)
- TokenTypeConstant.NUMERIC - org.apache.hadoop.hbase.util.Bytes.toBytes(BigDecimal)
- TokenTypeConstant.STRING - org.apache.hadoop.hbase.util.Bytes.toBytes(String)
- TokenTypeConstant.BINARY - store/retrieve byte array as is
- All other data types will be serialized/deserialized using the default DataRush formats.
2. mapRowRecord(java.util.LinkedHashMap<java.lang.String, java.lang.String>) - Map the row id as a record of fields. Mapping multiple fields will serialize/deserialize the row id as a record of fields using default DataRush serialization exclusively.
The column portion of the index key consists of 2 parts: a column family name, and a column qualifier. The column family name identifies one of multiple families created at table creation time. Column families provide a way to logically and physically group cells such that cells associated with a particular family are stored together in the same files on disk. The column qualifier uniquely identifies a cell (and previous versions) within a column family.
The version portion of the index key is the timestamp when the cell was created/changed.
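The three portions of the index key can be pictured as a composite sort key in which the row is compared first. The sketch below is a hypothetical illustration only: `CellKey` is not a DataRush or HBase class, and it uses Java strings for readability where HBase compares raw bytes. Placing newer versions first reflects how HBase is generally understood to order timestamps and is an assumption not stated in this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of the {row, column, version} index key. */
class CellKey implements Comparable<CellKey> {
    final String row;
    final String family;
    final String qualifier;
    final long version;

    CellKey(String row, String family, String qualifier, long version) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.version = version;
    }

    @Override
    public int compareTo(CellKey other) {
        // The row is the most significant portion of the key.
        int c = row.compareTo(other.row);
        if (c != 0) return c;
        // The column portion: family name, then qualifier.
        c = family.compareTo(other.family);
        if (c != 0) return c;
        c = qualifier.compareTo(other.qualifier);
        if (c != 0) return c;
        // Assumption: newer timestamps sort before older ones.
        return Long.compare(other.version, version);
    }

    @Override
    public String toString() {
        return row + "/" + family + ":" + qualifier + "@" + version;
    }

    public static void main(String[] args) {
        List<CellKey> keys = new ArrayList<>();
        keys.add(new CellKey("row2", "cf", "a", 10L));
        keys.add(new CellKey("row1", "cf", "b", 10L));
        keys.add(new CellKey("row1", "cf", "a", 10L));
        keys.add(new CellKey("row1", "cf", "a", 20L));
        Collections.sort(keys);
        System.out.println(keys);
    }
}
```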
DataRush schema is persisted in HBase, describing the row key and family-qualifier key types for any table that DataRush writes to. All DataRush schema information is stored in the HBase table named by catalogTableName.
- See Also:
WriteHBase, ReadHBase, DeleteHBase
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
- static String catalogTableName - The name of the HBase table containing DataRush schema for all tables.
- static String cellSchemaFamilyName - The name of the family containing cell/family mapped schemas for all tables.
- static String keySchemaFamilyName - The name of the family containing row key and family-qualifier key schemas for all tables.
- static String statsFamilyName - The name of the family containing table information for all tables.
-
Constructor Summary
Constructors
- KeyOperator()
-
Method Summary
Methods (Modifier and Type, Method, Description)
- void addFamily(String familyName) - Add family.
- HadoopConfiguration effectiveConfiguration() - Get the effective configuration including override properties.
- HadoopConfiguration getConfiguration() - Get the configuration.
- Set<String> getFamilies() - Get families.
- String getFilesystem() - Get default filesystem.
- String getHiveMetastore() - Get Hive metastore.
- LinkedHashMap<String,String> getQualifierFieldMap()
- String getRootDirectory() - Get HBase root directory.
- LinkedHashMap<String,String> getRowFieldMap()
- String getTableName() - Get HBase table name.
- String getTimeFieldName() - Get version timestamp field name.
- String getZookeeperParentZNode() - Get Zookeeper parent znode.
- String getZookeeperPort() - Get Zookeeper client port.
- String getZookeeperQuorum() - Get Zookeeper quorum.
- void mapQualifier(String fieldName) - Map Qualifier key as a field.
- void mapRow(String fieldName) - Map Row key as a field.
- void mapRowRecord(LinkedHashMap<String,String> fieldMap) - Map Row key as a record.
- void setConfiguration(HadoopConfiguration configuration) - Set the configuration.
- void setFamilies(Set<String> families) - Set families.
- void setFilesystem(String filesystem) - Set default filesystem.
- void setHiveMetastore(String metastore) - Set Hive metastore.
- void setQualifierFieldMap(LinkedHashMap<String,String> qualifierFieldMap)
- void setRootDirectory(String directory) - Set HBase root directory.
- void setRowFieldMap(LinkedHashMap<String,String> rowFieldMap)
- void setTableName(String tableName) - Set HBase table name.
- void setTimeFieldName(String timeFieldName) - Set version timestamp field name.
- void setZookeeperParentZNode(String znode) - Set Zookeeper parent znode.
- void setZookeeperPort(String port) - Set Zookeeper client port.
- void setZookeeperQuorum(String quorum) - Set Zookeeper quorum.
Methods inherited from class com.pervasive.datarush.operators.CompositeOperator
compose
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
-
-
-
Field Detail
-
catalogTableName
public static final String catalogTableName
The name of the HBase table containing DataRush schema for all tables. The row id for this table is the table name; thus this table contains a row for each table written to by DataRush. This HBase table has the following families defined:
- cellSchemaFamilyName is the name of the family containing cell schemas.
- keySchemaFamilyName is the name of the family containing row key and family-qualifier key schemas.
- statsFamilyName is the name of the family containing table information.
- See Also:
- Constant Field Values
-
cellSchemaFamilyName
public static final String cellSchemaFamilyName
The name of the family containing cell/family mapped schemas for all tables. The qualifier for this family is the family name for a family mapped schema. For cell mapped schemas the qualifier is the family name concatenated with the qualifier, separated by a colon ("family:qualifier").
Schema is persisted in the cell as the JSON string form of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from JSON string format:
- com.pervasive.datarush.tokens.types.TypeUtil.fromJSON(TokenType)
- com.pervasive.datarush.tokens.types.TypeUtil.toJSON(TokenType)
- See Also:
- Constant Field Values
-
keySchemaFamilyName
public static final String keySchemaFamilyName
The name of the family containing row key and family-qualifier key schemas for all tables. The qualifier for this family is the family name for a family mapped schema. The row key schema is stored with an empty string "" qualifier.
Schema is persisted in the cell as the JSON string form of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from JSON string format:
- com.pervasive.datarush.tokens.types.TypeUtil.fromJSON(TokenType)
- com.pervasive.datarush.tokens.types.TypeUtil.toJSON(TokenType)
- See Also:
- Constant Field Values
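The qualifier naming rules for the cell schema and key schema families can be summarized in a few lines. The helper below is a hypothetical sketch and not part of DataRush; the class and method names are invented for illustration. It encodes qualifiers as UTF-8 with the standard library, which is the encoding Bytes.toBytes(String) is understood to use.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper composing the catalog qualifiers described in the field documentation. */
class CatalogQualifiers {
    /** The row key schema is stored under an empty-string qualifier in the key schema family. */
    static final String ROW_KEY_QUALIFIER = "";

    /** Family mapped schema: the qualifier is the family name itself. */
    static String familySchemaQualifier(String family) {
        return family;
    }

    /** Cell mapped schema: the qualifier is "family:qualifier". */
    static String cellSchemaQualifier(String family, String qualifier) {
        return family + ":" + qualifier;
    }

    /** UTF-8 bytes of a qualifier, standing in for Bytes.toBytes(String). */
    static byte[] utf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "metrics" and "cpu" are made-up family and qualifier names.
        System.out.println(familySchemaQualifier("metrics"));
        System.out.println(cellSchemaQualifier("metrics", "cpu"));
        System.out.println(utf8(ROW_KEY_QUALIFIER).length);
    }
}
```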
-
statsFamilyName
public static final String statsFamilyName
The name of the family containing table information for all tables.
- See Also:
- Constant Field Values
-
-
Method Detail
-
getRowFieldMap
public LinkedHashMap<String,String> getRowFieldMap()
-
setRowFieldMap
public void setRowFieldMap(LinkedHashMap<String,String> rowFieldMap)
-
getQualifierFieldMap
public LinkedHashMap<String,String> getQualifierFieldMap()
-
setQualifierFieldMap
public void setQualifierFieldMap(LinkedHashMap<String,String> qualifierFieldMap)
-
getTableName
public String getTableName()
Get HBase table name.
- Returns:
- table name.
-
setTableName
public void setTableName(String tableName)
Set HBase table name. Table name is required.
- Parameters:
tableName
- table name.
-
mapRowRecord
public void mapRowRecord(LinkedHashMap<String,String> fieldMap)
Map Row key as a record. Any previous mapping for the row key is replaced.
- Parameters:
fieldMap
- row key record schema name to DataRush field name map.
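As a rough illustration of the fieldMap argument, the sketch below builds a map from row key record schema names to DataRush field names. The schema and field names are made up for the example. A LinkedHashMap iterates in insertion order, so the order of the put calls is the order in which the entries are seen; that this order also determines the layout of the row key record is an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

/** Hypothetical sketch of preparing the argument for mapRowRecord. */
class RowRecordMapping {
    /** Builds a row key record schema name -> DataRush field name map. */
    static LinkedHashMap<String, String> buildRowFieldMap() {
        LinkedHashMap<String, String> fieldMap = new LinkedHashMap<>();
        fieldMap.put("region", "customerRegion"); // hypothetical names
        fieldMap.put("id", "customerId");         // hypothetical names
        return fieldMap;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> fieldMap = buildRowFieldMap();
        // With DataRush on the classpath, the map would be passed to a concrete
        // KeyOperator subclass, for example: operator.mapRowRecord(fieldMap);
        List<String> schemaNames = new ArrayList<>(fieldMap.keySet());
        System.out.println(schemaNames);
    }
}
```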
-
mapRow
public void mapRow(String fieldName)
Map Row key as a field. Any previous mapping for the row key is replaced.
- Parameters:
fieldName
- DataRush field name for row key.
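Because HBase treats the row id as a series of bytes, the byte layout of a single mapped field affects how rows sort. The sketch below reproduces, with the standard library only, the layouts that Bytes.toBytes(int), Bytes.toBytes(long) and Bytes.toBytes(String) are understood to produce (big-endian integers and UTF-8 strings). `RowKeyBytes` is a hypothetical class, not a DataRush or HBase API, and the unsigned comparison is included only to show why negative integers sort after positive ones in byte order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stdlib-only illustration of single-field row key byte layouts. */
class RowKeyBytes {
    /** 4-byte big-endian encoding of an int (ByteBuffer is big-endian by default). */
    static byte[] intKey(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** 8-byte big-endian encoding of a long. */
    static byte[] longKey(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    /** UTF-8 encoding of a string. */
    static byte[] stringKey(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Lexicographic comparison of byte arrays, treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(intKey(258)));
        // In unsigned byte order, -1 (FF FF FF FF) sorts after 1 (00 00 00 01).
        System.out.println(compareUnsigned(intKey(-1), intKey(1)) > 0);
    }
}
```

If numeric ordering of row keys matters, this suggests checking how negative values are laid out before relying on range scans.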
-
mapQualifier
public void mapQualifier(String fieldName)
Map Qualifier key as a field. Any previous mapping for the qualifier key is replaced.
- Parameters:
fieldName
- DataRush field name for qualifier key.
-
getTimeFieldName
public String getTimeFieldName()
Get version timestamp field name.
- Returns:
- name of field containing version timestamp.
-
setTimeFieldName
public void setTimeFieldName(String timeFieldName)
Set version timestamp field name. This method is optional and designates the field that holds the version timestamp.
- Parameters:
timeFieldName
- name of field containing the version timestamp.
-
getConfiguration
public HadoopConfiguration getConfiguration()
Get the configuration.
- Returns:
- configuration.
-
setConfiguration
public void setConfiguration(HadoopConfiguration configuration)
Set the configuration to be used to locate HBase tables, HDFS resources, etc.
Optional property. Defaults to the configuration found on the class path.
- Parameters:
configuration
- configuration.
-
effectiveConfiguration
public HadoopConfiguration effectiveConfiguration()
Get the effective configuration including override properties.
- Returns:
- configuration.
-
addFamily
public void addFamily(String familyName)
Add family.
- Parameters:
familyName
- family name
-
setFamilies
public void setFamilies(Set<String> families)
Set families.
- Parameters:
families
- set of mapped family names
-
getFilesystem
public String getFilesystem()
Get default filesystem.
- Returns:
- default filesystem
-
setFilesystem
public void setFilesystem(String filesystem)
Set default filesystem. This method allows for overriding the default filesystem property of the HBase configuration. Example: "hdfs://namenode:port"
- Parameters:
filesystem
- default filesystem
-
getZookeeperQuorum
public String getZookeeperQuorum()
Get Zookeeper quorum.
- Returns:
- Zookeeper quorum
-
setZookeeperQuorum
public void setZookeeperQuorum(String quorum)
Set Zookeeper quorum. This method allows for overriding the HBase configuration property "hbase.zookeeper.quorum". Example: "node1,node2,..."
- Parameters:
quorum
- Zookeeper quorum
-
getZookeeperPort
public String getZookeeperPort()
Get Zookeeper client port.
- Returns:
- Zookeeper client port
-
setZookeeperPort
public void setZookeeperPort(String port)
Set Zookeeper client port. This method allows for overriding the HBase configuration property "hbase.zookeeper.property.clientPort". Example: "2181"
- Parameters:
port
- Zookeeper client port
-
getRootDirectory
public String getRootDirectory()
Get HBase root directory.
- Returns:
- HBase root directory
-
setRootDirectory
public void setRootDirectory(String directory)
Set HBase root directory. This method allows for overriding the HBase configuration property "hbase.rootdir". Example: "hdfs://namenode:port/path"
- Parameters:
directory
- HBase root directory
-
getHiveMetastore
public String getHiveMetastore()
Get Hive metastore.
- Returns:
- Hive metastore
-
setHiveMetastore
public void setHiveMetastore(String metastore)
Set Hive metastore. This method allows for overriding the HBase configuration property "hive.metastore.uris". Example: "thrift://namenode:port"
- Parameters:
metastore
- Hive metastore
-
getZookeeperParentZNode
public String getZookeeperParentZNode()
Get Zookeeper parent znode.
- Returns:
- Zookeeper parent znode
-
setZookeeperParentZNode
public void setZookeeperParentZNode(String znode)
Set Zookeeper parent znode. This method allows for overriding the HBase configuration property "zookeeper.znode.parent". Example: "/hbase"
- Parameters:
znode
- Zookeeper parent znode
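The override setters above each target one named configuration property. The sketch below illustrates the general idea of layering such overrides on top of a base configuration using plain maps. It is a hypothetical model only: `HBaseOverrides` is not a DataRush class, the base values are invented, and how effectiveConfiguration() actually combines the base configuration with overrides is not described in this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of base configuration plus per-property overrides. */
class HBaseOverrides {
    /** Returns a copy of base in which every non-null override replaces the base value. */
    static Map<String, String> merge(Map<String, String> base, Map<String, String> overrides) {
        Map<String, String> effective = new LinkedHashMap<>(base);
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            if (entry.getValue() != null) {
                effective.put(entry.getKey(), entry.getValue());
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        // Invented base values, standing in for a configuration found on the class path.
        Map<String, String> base = new LinkedHashMap<>();
        base.put("hbase.zookeeper.quorum", "localhost");
        base.put("hbase.zookeeper.property.clientPort", "2181");

        // Property names are the ones documented for the setters; values are examples.
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("hbase.zookeeper.quorum", "node1,node2");
        overrides.put("zookeeper.znode.parent", "/hbase");

        System.out.println(merge(base, overrides));
    }
}
```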
-
-