public class WriteHBase extends KeyValueOperator
If the user specifies a row key field then the input will be repartitioned using HBase table region row key ranges. Each partition will sort its rows in row key ascending, qualifier key ascending, time key descending order, and then write the rows to the appropriate regions.
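The sort order described above can be sketched as a plain comparison function. This is only an illustration, not the operator's actual implementation: the `CellKey` and `CellKeyOrder` types are hypothetical, and the sketch assumes keys compare as unsigned bytes in lexicographic order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration type; not part of the DataFlow or HBase API.
final class CellKey {
    final byte[] row;
    final byte[] qualifier;
    final long timestampMillis;

    CellKey(String row, String qualifier, long timestampMillis) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        this.timestampMillis = timestampMillis;
    }

    @Override
    public String toString() {
        return new String(row, StandardCharsets.UTF_8) + "/"
            + new String(qualifier, StandardCharsets.UTF_8) + "@" + timestampMillis;
    }
}

// Hypothetical helper that applies the ordering to an in-memory list.
final class CellKeyOrder {
    // Row key ascending, then qualifier key ascending, then time key descending.
    // Assumption: keys are compared byte by byte as unsigned values.
    static int compare(CellKey a, CellKey b) {
        int c = Arrays.compareUnsigned(a.row, b.row);
        if (c != 0) {
            return c;
        }
        c = Arrays.compareUnsigned(a.qualifier, b.qualifier);
        if (c != 0) {
            return c;
        }
        // Arguments are swapped so that newer timestamps sort first.
        return Long.compare(b.timestampMillis, a.timestampMillis);
    }

    static void sort(List<CellKey> cells) {
        cells.sort(CellKeyOrder::compare);
    }
}
```

With this ordering, the newest version of a cell comes first within each row and qualifier.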
If the user does not specify a row key field then the input will be written to regions local to the partition. Neither repartitioning nor sorting is performed. A unique row key of type TokenTypeConstant.BINARY will be generated for each input record.
The generated keys are guaranteed to be evenly distributed and unique for time periods exceeding hundreds of years. The generated key consists of 2 parts:
A unique qualifier key will also be generated for each input record if the user maps a family as a sub-table and does not specify a qualifier key field. The qualifier key is generated in a similar manner.
A time key field can optionally be specified so that the input record supplies the timestamp value. If no time key field is specified, each record defaults to the current time. In both cases the timestamp value is narrowed to millisecond resolution (to match HBase) and advanced slightly (1 millisecond per duplicate) so that cells with duplicate row key, qualifier key, and time key values remain uniquely identifiable. This makes the import tolerant of duplicate cell versions in the input stream as long as they occur infrequently. Importing a large number of duplicate cell versions (more than thousands of duplicate keys per second) may result in significant time skew and I/O fragmentation, because timestamps must keep advancing to maintain uniqueness.
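The one-millisecond-per-duplicate rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the operator's implementation: the `TimestampUniquifier` class is hypothetical, keys are modeled as strings, and every cell seen so far is held in an in-memory set.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of the DataFlow API.
final class TimestampUniquifier {
    // Every (row key, qualifier key, timestamp) combination already assigned.
    private final Set<List<Object>> seen = new HashSet<>();

    // Returns the timestamp to use for the cell: the requested millisecond,
    // advanced by 1 ms at a time until the combination is unused.
    long assign(String rowKey, String qualifierKey, long timestampMillis) {
        long ts = timestampMillis;
        while (!seen.add(List.of(rowKey, qualifierKey, ts))) {
            ts++;
        }
        return ts;
    }
}
```

In this sketch, three cells with the same row key, qualifier key, and timestamp 1000 receive 1000, 1001, and 1002. A later cell that genuinely carries timestamp 1001 is then pushed to 1003, which shows how frequent duplicates lead to time skew.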
If the specified HBase table does not exist then it will be created. The number of regions created will be MAX(4, the level of parallelism).
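As a worked example of the region count rule, the following sketch evaluates MAX(4, the level of parallelism) for a given parallelism. The `RegionCount` class and `regionsFor` method are hypothetical names used only for illustration.

```java
// Hypothetical helper; not part of the DataFlow API.
final class RegionCount {
    // Number of regions created for a new table: MAX(4, level of parallelism).
    static int regionsFor(int parallelism) {
        return Math.max(4, parallelism);
    }
}
```

Under this rule, a parallelism of 2 gives 4 regions, while a parallelism of 16 gives 16 regions.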
DeleteHBase, ReadHBase
catalogTableName, cellSchemaFamilyName, keySchemaFamilyName, statsFamilyName
Constructor and Description |
---|
WriteHBase() |
Modifier and Type | Method and Description |
---|---|
void | compose(CompositionContext ctx) Compose the body of this operator. |
RecordPort | getInput() |
String | getOutputPath() Get the output path. |
void | setOutputPath(String outputPath) Set the output path. |
getCellFieldMap, getFamilyFieldMap, getHCatalogDatabase, getHCatalogFields, getHCatalogTable, mapCell, mapCell, mapCellRecord, mapFamily, mapFamily, mapFamilyRecord, mapFromHCatalog, mapToHCatalog, schemaSupportedByHCatalog, setCellFieldMap, setFamilyFieldMap, setHCatalogDatabase, setHCatalogFields, setHCatalogFields, setHCatalogTable, tableExistsInHCatalog
addFamily, effectiveConfiguration, getConfiguration, getFamilies, getFilesystem, getHiveMetastore, getQualifierFieldMap, getRootDirectory, getRowFieldMap, getTableName, getTimeFieldName, getZookeeperParentZNode, getZookeeperPort, getZookeeperQuorum, mapQualifier, mapRow, mapRowRecord, setConfiguration, setFamilies, setFilesystem, setHiveMetastore, setQualifierFieldMap, setRootDirectory, setRowFieldMap, setTableName, setTimeFieldName, setZookeeperParentZNode, setZookeeperPort, setZookeeperQuorum
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
public String getOutputPath()
public void setOutputPath(String outputPath)
Optional property. Defaults to hdfs:/user/userName/
outputPath - the output path.
public RecordPort getInput()
public void compose(CompositionContext ctx)
CompositeOperator
Sub-operators are added via OperatorComposable.add(O) and connected via OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
compose in class CompositeOperator
ctx - the context
Copyright © 2021 Actian Corporation. All rights reserved.