public abstract class KeyOperator extends CompositeOperator
HBase stores data in a table as variable-length rows of individual cells, each of which can be versioned and changed independently over time. All versions of a cell's history are retained until a maximum cell age or a maximum version count is reached. Rows are partitioned into regions across multiple server nodes. Each region covers a range of row keys, and partitioning guarantees that the cells of a given row never span more than one region.
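As a rough illustration of row key ranges, the sketch below models regions as a sorted map from each region's start key to a server and finds the owning region with a floor lookup. It is a toy model, not HBase or DataRush code: the class name, the server names, and the use of Java String ordering in place of HBase's byte ordering are all assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of row-key-range regions; not HBase or DataRush code. */
final class RegionLookupSketch {
    // Maps each region's inclusive start key to a (hypothetical) server name.
    // String ordering stands in for byte ordering; the two agree for ASCII keys.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    /** Returns the server of the single region whose key range contains rowKey. */
    String serverFor(String rowKey) {
        // The owning region is the one with the greatest start key <= rowKey.
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row key " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch regions = new RegionLookupSketch();
        regions.addRegion("", "node-a");   // first region starts at the empty key
        regions.addRegion("m", "node-b");  // second region starts at "m"
        System.out.println(regions.serverFor("apple"));
        System.out.println(regions.serverFor("zebra"));
    }
}
```

Because every row key resolves to exactly one region, all cells that share a row key land on the same server in this model.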
A cell is uniquely identified by an index key {row, column, version}:
The row portion of the index key is the most significant portion of the cell index key. HBase treats the row id as a series of bytes.
DataRush fields can be mapped to the row id with the following methods:
1. mapRow(java.lang.String) - Map the row id as a single field. A single field mapped as the row id will serialize/deserialize the following DataRush data types using common HBase formats:
2. mapRowRecord(java.util.LinkedHashMap<java.lang.String, java.lang.String>) - Map the row id as a record of fields. Mapping multiple fields will serialize/deserialize the row id as a record of fields using default DataRush serialization exclusively.
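The two mapping styles above might be prepared as in the following sketch. It only builds the LinkedHashMap argument that mapRowRecord expects (row key record schema name to DataRush field name) and shows the operator calls as comments, because constructing a concrete subclass requires the DataRush library. The field names are hypothetical, and the idea that insertion order determines the field order inside the row id is an assumption suggested by the LinkedHashMap parameter type, not something this page states.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;

/** Builds a row key field map for mapRowRecord; field names are hypothetical. */
final class RowRecordMapSketch {
    static LinkedHashMap<String, String> buildRowFieldMap() {
        // Key: name of the field inside the row key record schema.
        // Value: name of the DataRush input field supplying that part.
        LinkedHashMap<String, String> fieldMap = new LinkedHashMap<>();
        fieldMap.put("region", "customerRegion");
        fieldMap.put("id", "customerId");
        return fieldMap;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> fieldMap = buildRowFieldMap();
        // LinkedHashMap iterates in insertion order.
        System.out.println(new ArrayList<>(fieldMap.keySet()));

        // With an instance of a concrete subclass (not constructed here):
        //   operator.mapRowRecord(fieldMap);   // row id as a record of fields
        //   operator.mapRow("customerId");     // or: row id as a single field
    }
}
```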
The column portion of the index key consists of two parts: a column family name and a column qualifier. The column family name identifies one of the families defined at table creation time. Column families provide a way to group cells logically and physically, so that the cells of a particular family are stored together in the same files on disk. The column qualifier uniquely identifies a cell (and its previous versions) within a column family.
The version portion of the index key is the timestamp when the cell was created/changed.
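The {row, column, version} key described above can be pictured as a composite sort key with the row as the most significant part. The sketch below is a plain-Java model, not the HBase or DataRush key classes. The class and method names are hypothetical, and ordering newer versions before older ones reflects general HBase behavior rather than anything stated on this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

/** Toy model of a cell index key {row, family, qualifier, version}. */
final class CellKeySketch {
    final byte[] row;
    final byte[] family;
    final byte[] qualifier;
    final long version; // timestamp of creation or change

    CellKeySketch(byte[] row, byte[] family, byte[] qualifier, long version) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.version = version;
    }

    static CellKeySketch of(String row, String family, String qualifier, long version) {
        return new CellKeySketch(
                row.getBytes(StandardCharsets.UTF_8),
                family.getBytes(StandardCharsets.UTF_8),
                qualifier.getBytes(StandardCharsets.UTF_8),
                version);
    }

    /** Lexicographic comparison that treats each byte as unsigned. */
    static int compareBytes(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    /** Row first, then family, then qualifier, then newest version first. */
    static final Comparator<CellKeySketch> ORDER = (a, b) -> {
        int c = compareBytes(a.row, b.row);
        if (c != 0) return c;
        c = compareBytes(a.family, b.family);
        if (c != 0) return c;
        c = compareBytes(a.qualifier, b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.version, a.version);
    };

    public static void main(String[] args) {
        CellKeySketch older = of("row1", "cf", "q", 100L);
        CellKeySketch newer = of("row1", "cf", "q", 200L);
        System.out.println(ORDER.compare(newer, older) < 0);
    }
}
```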
DataRush persists schema in HBase describing the row key and family-qualifier key types for every table that it writes to.
All DataRush schema information is stored in an HBase table named: catalogTableName.
See Also: WriteHBase, ReadHBase, DeleteHBase
Modifier and Type | Field and Description |
---|---|
static String | catalogTableName: The name of the HBase table containing DataRush schema for all tables. |
static String | cellSchemaFamilyName: The name of the family containing cell/family mapped schemas for all tables. |
static String | keySchemaFamilyName: The name of the family containing row key and family-qualifier key schemas for all tables. |
static String | statsFamilyName: The name of the family containing table information for all tables. |
Constructor and Description |
---|
KeyOperator() |
Modifier and Type | Method and Description |
---|---|
void | addFamily(String familyName): Add family. |
HadoopConfiguration | effectiveConfiguration(): Get the effective configuration including override properties. |
HadoopConfiguration | getConfiguration(): Get the configuration. |
Set<String> | getFamilies(): Get families. |
String | getFilesystem(): Get default filesystem. |
String | getHiveMetastore(): Get Hive metastore. |
LinkedHashMap<String,String> | getQualifierFieldMap() |
String | getRootDirectory(): Get HBase root directory. |
LinkedHashMap<String,String> | getRowFieldMap() |
String | getTableName(): Get HBase table name. |
String | getTimeFieldName(): Get version timestamp field name. |
String | getZookeeperParentZNode(): Get Zookeeper parent znode. |
String | getZookeeperPort(): Get Zookeeper client port. |
String | getZookeeperQuorum(): Get Zookeeper quorum. |
void | mapQualifier(String fieldName): Map Qualifier key as a field. |
void | mapRow(String fieldName): Map Row key as a field. |
void | mapRowRecord(LinkedHashMap<String,String> fieldMap): Map Row key as a record. |
void | setConfiguration(HadoopConfiguration configuration): Set the configuration. |
void | setFamilies(Set<String> families): Set families. |
void | setFilesystem(String filesystem): Set default filesystem. |
void | setHiveMetastore(String metastore): Set Hive metastore. |
void | setQualifierFieldMap(LinkedHashMap<String,String> qualifierFieldMap) |
void | setRootDirectory(String directory): Set HBase root directory. |
void | setRowFieldMap(LinkedHashMap<String,String> rowFieldMap) |
void | setTableName(String tableName): Set HBase table name. |
void | setTimeFieldName(String timeFieldName): Set version timestamp field name. |
void | setZookeeperParentZNode(String znode): Set Zookeeper parent znode. |
void | setZookeeperPort(String port): Set Zookeeper client port. |
void | setZookeeperQuorum(String quorum): Set Zookeeper quorum. |
Methods inherited from CompositeOperator and its superclasses: compose, disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
public static final String catalogTableName
- cellSchemaFamilyName is the name of the family containing cell schemas.
- keySchemaFamilyName is the name of the family containing row key and family-qualifier key schemas.
- statsFamilyName is the name of the family containing table information.
public static final String cellSchemaFamilyName
Schema is persisted in the cell as a JSON string format of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from JSON string format:
public static final String keySchemaFamilyName
Schema is persisted in the cell as a JSON string format of a DataRush RecordTokenType, serialized using org.apache.hadoop.hbase.util.Bytes.toBytes(String). The following utility methods in DataRush can be used to convert to/from JSON string format:
public static final String statsFamilyName
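The schema cells in cellSchemaFamilyName and keySchemaFamilyName hold a JSON string serialized with org.apache.hadoop.hbase.util.Bytes.toBytes(String), which encodes the string as UTF-8. The sketch below imitates only that byte round trip with the JDK so that it runs without HBase on the class path. It does not use the DataRush utility methods, and the JSON literal is a placeholder, not the real RecordTokenType layout.

```java
import java.nio.charset.StandardCharsets;

/** JDK-only stand-in for the String-to-bytes round trip of a schema cell. */
final class SchemaBytesSketch {
    /** Mirrors what Bytes.toBytes(String) produces: the UTF-8 bytes of the string. */
    static byte[] toCellValue(String schemaJson) {
        return schemaJson.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes a cell value back into the JSON string. */
    static String fromCellValue(byte[] cellValue) {
        return new String(cellValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder JSON; the actual schema format is defined by DataRush.
        String schemaJson = "{\"placeholder\":\"schema\"}";
        byte[] cellValue = toCellValue(schemaJson);
        System.out.println(fromCellValue(cellValue).equals(schemaJson));
    }
}
```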
public LinkedHashMap<String,String> getRowFieldMap()
public void setRowFieldMap(LinkedHashMap<String,String> rowFieldMap)
public LinkedHashMap<String,String> getQualifierFieldMap()
public void setQualifierFieldMap(LinkedHashMap<String,String> qualifierFieldMap)
public String getTableName()
public void setTableName(String tableName)
tableName - table name.
public void mapRowRecord(LinkedHashMap<String,String> fieldMap)
fieldMap - row key record schema name to DataRush field name map.
public void mapRow(String fieldName)
fieldName - DataRush field name for row key.
public void mapQualifier(String fieldName)
fieldName - DataRush field name for qualifier key.
public String getTimeFieldName()
public void setTimeFieldName(String timeFieldName)
timeFieldName - name of field containing the version timestamp.
public HadoopConfiguration getConfiguration()
public void setConfiguration(HadoopConfiguration configuration)
Optional property. Defaults to configuration found on class path.
configuration - configuration.
public HadoopConfiguration effectiveConfiguration()
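effectiveConfiguration() is summarized as returning the configuration "including override properties". One plausible reading, assumed here and not confirmed by this page, is that values supplied through setters such as setZookeeperQuorum are layered over the base configuration. The sketch below shows that layering with plain maps rather than HadoopConfiguration. The property names are standard HBase client settings, but their association with these setters is an assumption, as are the host names and port values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of layering override properties over a base configuration. */
final class EffectiveConfigSketch {
    /** Returns a new map in which every override replaces the base value. */
    static Map<String, String> effective(Map<String, String> base,
                                         Map<String, String> overrides) {
        Map<String, String> merged = new LinkedHashMap<>(base);
        merged.putAll(overrides); // overrides win on key collisions
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for the configuration found on the class path.
        Map<String, String> base = new LinkedHashMap<>();
        base.put("hbase.zookeeper.quorum", "localhost");
        base.put("hbase.zookeeper.property.clientPort", "2181");

        // Stand-in for a value supplied through a setter (assumed mapping).
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com");

        System.out.println(effective(base, overrides));
    }
}
```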
public void addFamily(String familyName)
familyName - family name
public void setFamilies(Set<String> families)
families - set of mapped family names
public String getFilesystem()
public void setFilesystem(String filesystem)
filesystem - default filesystem
public String getZookeeperQuorum()
public void setZookeeperQuorum(String quorum)
quorum - Zookeeper quorum
public String getZookeeperPort()
public void setZookeeperPort(String port)
port - Zookeeper client port
public String getRootDirectory()
public void setRootDirectory(String directory)
directory - HBase root directory
public String getHiveMetastore()
public void setHiveMetastore(String metastore)
metastore - Hive metastore
public String getZookeeperParentZNode()
public void setZookeeperParentZNode(String znode)
znode - Zookeeper parent znode
Copyright © 2020 Actian Corporation. All rights reserved.