public abstract class KeyOperator extends CompositeOperator
HBase stores data in a table as variable-length rows of individual cells, each of which can be independently versioned and changed over time. All versions of a cell's history are retained until a maximum cell age or a maximum version count is reached. The rows are partitioned into regions across multiple server nodes. Each region covers a range of row keys, and regions are partitioned so that all cells of a particular row reside in a single region.
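The row-key-range partitioning described above can be pictured with a sorted map from each region's start key to the node that serves it. This is only a minimal sketch: the class name, the server names, and the use of String keys (real row keys are bytes) are assumptions made for illustration, not part of HBase or DataRush.

```java
import java.util.TreeMap;

public class RegionLookupSketch {
    // Maps each region's start row key to a hypothetical server name.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // A row belongs to the region with the greatest start key <= the row key,
    // so every cell of that row lands in the same region.
    public String serverFor(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "node-a");   // first region starts at the empty key
        table.addRegion("m", "node-b");  // second region starts at "m"
        System.out.println(table.serverFor("apple")); // node-a
        System.out.println(table.serverFor("zebra")); // node-b
    }
}
```

Because the lookup depends only on the row key, a row can never be split between two regions in this model.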
A cell is uniquely identified by an index key {row, column, version}:
The row is the most significant portion of the cell index key. HBase treats the row id as a series of bytes.
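Because the row id is just bytes, rows are ordered byte by byte rather than by the logical value they encode. The sketch below illustrates that consequence with a hand-written unsigned comparison; the class and method names are hypothetical, and the UTF-8 string encoding is only one possible way a row id might be produced.

```java
import java.nio.charset.StandardCharsets;

public class RowIdOrderSketch {
    // Compare two row ids as unsigned bytes, left to right.
    // If one is a prefix of the other, the shorter one sorts first.
    public static int compareRowIds(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // As text, "10" sorts before "9" because '1' < '9'.
        System.out.println(compareRowIds(utf8("10"), utf8("9")) < 0);      // true
        // Zero padding restores numeric order for text-encoded numbers.
        System.out.println(compareRowIds(utf8("0009"), utf8("0010")) < 0); // true
    }
}
```

The practical upshot is that the chosen serialization of the row id determines scan order.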
DataRush field(s) can be mapped to the row id via the following methods:
1. mapRow(java.lang.String) - Map the row id as a single field. A single field mapped as the row id will serialize/deserialize the following DataRush data types using common HBase formats:
2. mapRowRecord(java.util.LinkedHashMap<java.lang.String, java.lang.String>) - Map the row id as a record of fields. Mapping multiple fields will serialize/deserialize the row id as a record of fields using default DataRush serialization exclusively.
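As a sketch of the argument mapRowRecord expects, the following builds a map from row key record schema names to DataRush field names. The field names are hypothetical, and the idea that insertion order determines the order of fields within the row key is an assumption suggested by the LinkedHashMap parameter type, not something this page states. The call on the operator itself is shown only as a comment so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;

public class RowRecordMapSketch {
    // Keys: names in the row key record schema. Values: DataRush field names.
    // All names here are made up for illustration.
    public static LinkedHashMap<String, String> buildRowFieldMap() {
        LinkedHashMap<String, String> fieldMap = new LinkedHashMap<>();
        fieldMap.put("region", "customerRegion");
        fieldMap.put("customerId", "customerId");
        return fieldMap;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> fieldMap = buildRowFieldMap();
        // LinkedHashMap iterates in insertion order.
        System.out.println(new ArrayList<>(fieldMap.keySet())); // [region, customerId]
        // With a concrete KeyOperator subclass instance named operator:
        // operator.mapRowRecord(fieldMap);
    }
}
```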
The column portion of the index key consists of 2 parts: a column family name, and a column qualifier. The column family name identifies one of multiple families created at table creation time. Column families provide a way to logically and physically group cells such that cells associated with a particular family are stored together in the same files on disk. The column qualifier uniquely identifies a cell (and previous versions) within a column family.
The version portion of the index key is the timestamp when the cell was created/changed.
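The three-part index key can be modelled as a small value class with a comparator. This is a simplified sketch, not HBase's own key type: the class names are hypothetical, strings stand in for bytes, and ordering newer versions first is how HBase is generally documented to sort cells rather than something this page specifies.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class CellKeySketch {
    // Simplified stand-in for the {row, column, version} index key.
    public static final class CellKey {
        public final String row;
        public final String family;
        public final String qualifier;
        public final long version; // timestamp of creation/change

        public CellKey(String row, String family, String qualifier, long version) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.version = version;
        }
    }

    // Row is most significant, then family and qualifier, then newest version first.
    public static final Comparator<CellKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = Long.compare(b.version, a.version);
        return c;
    };

    public static void main(String[] args) {
        TreeMap<CellKey, String> cells = new TreeMap<>(ORDER);
        cells.put(new CellKey("r1", "cf", "q", 100L), "old");
        cells.put(new CellKey("r1", "cf", "q", 200L), "new");
        // Both versions are retained; the most recent one sorts first.
        System.out.println(cells.size());                 // 2
        System.out.println(cells.firstEntry().getValue()); // new
    }
}
```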
For any table that DataRush writes to, a DataRush schema describing the row key and family-qualifier key types is persisted in HBase.
All DataRush schema information is stored in the HBase table named by catalogTableName.
See Also: WriteHBase, ReadHBase, DeleteHBase

| Modifier and Type | Field and Description |
|---|---|
| static String | catalogTableName - The name of the HBase table containing DataRush schema for all tables. |
| static String | cellSchemaFamilyName - The name of the family containing cell/family mapped schemas for all tables. |
| static String | keySchemaFamilyName - The name of the family containing row key and family-qualifier key schemas for all tables. |
| static String | statsFamilyName - The name of the family containing table information for all tables. |

| Constructor and Description |
|---|
| KeyOperator() |

| Modifier and Type | Method and Description |
|---|---|
| void | addFamily(String familyName) - Add family. |
| HadoopConfiguration | effectiveConfiguration() - Get the effective configuration including override properties. |
| HadoopConfiguration | getConfiguration() - Get the configuration. |
| Set<String> | getFamilies() - Get families. |
| String | getFilesystem() - Get default filesystem. |
| String | getHiveMetastore() - Get Hive metastore. |
| LinkedHashMap<String,String> | getQualifierFieldMap() |
| String | getRootDirectory() - Get HBase root directory. |
| LinkedHashMap<String,String> | getRowFieldMap() |
| String | getTableName() - Get HBase table name. |
| String | getTimeFieldName() - Get version timestamp field name. |
| String | getZookeeperParentZNode() - Get Zookeeper parent znode. |
| String | getZookeeperPort() - Get Zookeeper client port. |
| String | getZookeeperQuorum() - Get Zookeeper quorum. |
| void | mapQualifier(String fieldName) - Map Qualifier key as a field. |
| void | mapRow(String fieldName) - Map Row key as a field. |
| void | mapRowRecord(LinkedHashMap<String,String> fieldMap) - Map Row key as a record. |
| void | setConfiguration(HadoopConfiguration configuration) - Set the configuration. |
| void | setFamilies(Set<String> families) - Set families. |
| void | setFilesystem(String filesystem) - Set default filesystem. |
| void | setHiveMetastore(String metastore) - Set Hive metastore. |
| void | setQualifierFieldMap(LinkedHashMap<String,String> qualifierFieldMap) |
| void | setRootDirectory(String directory) - Set HBase root directory. |
| void | setRowFieldMap(LinkedHashMap<String,String> rowFieldMap) |
| void | setTableName(String tableName) - Set HBase table name. |
| void | setTimeFieldName(String timeFieldName) - Set version timestamp field name. |
| void | setZookeeperParentZNode(String znode) - Set Zookeeper parent znode. |
| void | setZookeeperPort(String port) - Set Zookeeper client port. |
| void | setZookeeperQuorum(String quorum) - Set Zookeeper quorum. |
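
The ZooKeeper and root-directory setters listed above all take plain strings. As rough orientation, the sketch below collects the standard HBase client property names that such values conventionally correspond to. Whether KeyOperator applies its settings through exactly these properties is an assumption, not something this page states, and the host names and paths are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HBaseConnectionPropsSketch {
    // Collects connection values under the standard HBase property names.
    // The mapping from KeyOperator setters to these keys is assumed.
    public static Map<String, String> connectionProperties(
            String quorum, String port, String parentZNode, String rootDirectory) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hbase.zookeeper.quorum", quorum);             // cf. setZookeeperQuorum
        props.put("hbase.zookeeper.property.clientPort", port);  // cf. setZookeeperPort
        props.put("zookeeper.znode.parent", parentZNode);        // cf. setZookeeperParentZNode
        props.put("hbase.rootdir", rootDirectory);               // cf. setRootDirectory
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = connectionProperties(
                "zk1.example.com,zk2.example.com",          // hypothetical hosts
                "2181",
                "/hbase",
                "hdfs://namenode.example.com:8020/hbase");  // hypothetical path
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Note that the port is passed as a string, matching the String parameter of setZookeeperPort.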

Inherited methods: compose, disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError

public static final String catalogTableName
cellSchemaFamilyName is the name of the family containing cell schemas. keySchemaFamilyName is the name of the family containing row key and family-qualifier key schemas. statsFamilyName is the name of the family containing table information.
public static final String cellSchemaFamilyName
Schema is persisted in the cell as a JSON string format of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from JSON string format:
public static final String keySchemaFamilyName
Schema is persisted in the cell as a JSON string format of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from JSON string format:
public static final String statsFamilyName
public LinkedHashMap<String,String> getRowFieldMap()
public void setRowFieldMap(LinkedHashMap<String,String> rowFieldMap)
public LinkedHashMap<String,String> getQualifierFieldMap()
public void setQualifierFieldMap(LinkedHashMap<String,String> qualifierFieldMap)
public String getTableName()
public void setTableName(String tableName)
tableName - table name.
public void mapRowRecord(LinkedHashMap<String,String> fieldMap)
fieldMap - row key record schema name to DataRush field name map.
public void mapRow(String fieldName)
fieldName - DataRush field name for row key.
public void mapQualifier(String fieldName)
fieldName - DataRush field name for qualifier key.
public String getTimeFieldName()
public void setTimeFieldName(String timeFieldName)
timeFieldName - name of field containing the version timestamp.
public HadoopConfiguration getConfiguration()
public void setConfiguration(HadoopConfiguration configuration)
Optional property. Defaults to configuration found on class path.
configuration - configuration.
public HadoopConfiguration effectiveConfiguration()
public void addFamily(String familyName)
familyName - family name
public void setFamilies(Set<String> families)
families - set of mapped family names
public String getFilesystem()
public void setFilesystem(String filesystem)
filesystem - default filesystem
public String getZookeeperQuorum()
public void setZookeeperQuorum(String quorum)
quorum - Zookeeper quorum
public String getZookeeperPort()
public void setZookeeperPort(String port)
port - Zookeeper client port
public String getRootDirectory()
public void setRootDirectory(String directory)
directory - HBase root directory
public String getHiveMetastore()
public void setHiveMetastore(String metastore)
metastore - Hive metastore
public String getZookeeperParentZNode()
public void setZookeeperParentZNode(String znode)
znode - Zookeeper parent znode

Copyright © 2021 Actian Corporation. All rights reserved.