- All Implemented Interfaces:
LogicalOperator
public class WriteHBase extends KeyValueOperator
Write a result set to HBase. If the user specifies a row key field, the input is repartitioned using the row key ranges of the HBase table's regions. Each partition sorts its rows by row key (ascending), qualifier key (ascending), and time key (descending), and then writes the rows to the appropriate regions.
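The per-partition sort order described above can be expressed as a comparator. This is a minimal sketch, not the operator's implementation. The `CellKey` and `CellOrder` classes are hypothetical. The sketch also assumes that row and qualifier keys compare as unsigned bytes in lexicographic order, which is the order HBase uses for keys.

```java
import java.util.Comparator;

// Hypothetical cell representation used only for this sketch; not a DataRush type.
final class CellKey {
    final byte[] row;
    final byte[] qualifier;
    final long timestamp; // milliseconds since the epoch

    CellKey(byte[] row, byte[] qualifier, long timestamp) {
        this.row = row;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }
}

final class CellOrder {
    private CellOrder() {}

    // Lexicographic comparison of unsigned bytes (assumed HBase key order).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length); // a shorter prefix sorts first
    }

    // Row key ascending, then qualifier key ascending, then time key descending.
    static final Comparator<CellKey> ORDER = (x, y) -> {
        int c = compareBytes(x.row, y.row);
        if (c != 0) {
            return c;
        }
        c = compareBytes(x.qualifier, y.qualifier);
        if (c != 0) {
            return c;
        }
        return Long.compare(y.timestamp, x.timestamp); // newest version first
    };
}
```

With this ordering, the newest version of each cell comes first within its row and qualifier. HBase returns versions in the same order.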
If the user does not specify a row key field, the input is written to regions local to the partition. Neither repartitioning nor sorting is performed. A unique row key of type TokenTypeConstant.BINARY is generated for each input record.
The generated keys are guaranteed to be evenly distributed and unique for time periods exceeding hundreds of years. The generated key consists of 2 parts:
- 8 bytes: partitioned random range counter (auto-incremented for each input row)
- 4 bytes: bulk load counter (auto-incremented at the start of each DataRush load)
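The 12-byte key layout listed above can be sketched with `java.nio.ByteBuffer`. Several details are assumptions because the documentation does not specify them: the big-endian encoding, the field order (row counter first, then bulk load counter), and the `partitionStart` seed. How DataRush picks each partition's random starting range is not described here, and `GeneratedKeySketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Hypothetical generator for the 12-byte row key described above.
// Encoding (big-endian, row counter first) is an assumption, not a documented format.
final class GeneratedKeySketch {
    private long rowCounter;           // 8 bytes: starts at a per-partition value, +1 per row
    private final int bulkLoadCounter; // 4 bytes: fixed for the duration of one load

    GeneratedKeySketch(long partitionStart, int bulkLoadCounter) {
        this.rowCounter = partitionStart;
        this.bulkLoadCounter = bulkLoadCounter;
    }

    // Produce the next key and advance the per-row counter.
    byte[] next() {
        ByteBuffer buf = ByteBuffer.allocate(12); // ByteBuffer is big-endian by default
        buf.putLong(rowCounter++);
        buf.putInt(bulkLoadCounter);
        return buf.array();
    }

    // Decode the 8-byte row counter from a generated key.
    static long counterOf(byte[] key) {
        return ByteBuffer.wrap(key).getLong(0);
    }

    // Decode the 4-byte bulk load counter from a generated key.
    static int bulkLoadOf(byte[] key) {
        return ByteBuffer.wrap(key).getInt(8);
    }
}
```

Within a load, the row counter alone keeps keys distinct. The bulk load counter distinguishes keys from separate loads that happen to reuse a counter value.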
A unique qualifier key is also generated for each input record if the user maps a family as a sub-table and does not specify a qualifier key field. The qualifier key is generated in a similar manner.
A time key field can optionally be specified so that the user can provide a timestamp value as part of the input record. If no time key field is specified, each record defaults to the current time. In BOTH cases the timestamp value is narrowed to millisecond resolution (to match HBase). It is also advanced slightly (1 millisecond per duplicate) so that cells with duplicate row key, qualifier key, and time key values are uniquely identified. This makes the import tolerant of duplicate cell versions in the input stream, as long as they occur infrequently. Importing a large number of such duplicate cell versions (more than thousands of duplicate keys per second) may result in significant time skew and IO fragmentation, because timestamps must be advanced to maintain uniqueness.
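The timestamp handling described above can be sketched as follows. This is a hypothetical illustration, not DataRush code. It assumes that "narrowed" means truncation to milliseconds (as `Instant.toEpochMilli()` does for post-epoch instants). It also assumes that a timestamp already used for the same row and qualifier is advanced 1 ms at a time until it is free. `VersionDeduper` is an invented name.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of duplicate cell-version handling; not DataRush code.
final class VersionDeduper {
    // Timestamps already assigned, per (row key, qualifier key) pair.
    private final Map<List<String>, Set<Long>> used = new HashMap<>();

    // Time key supplied by the input record.
    long assign(String row, String qualifier, Instant time) {
        long millis = time.toEpochMilli(); // narrow to millisecond resolution
        Set<Long> seen = used.computeIfAbsent(Arrays.asList(row, qualifier), k -> new HashSet<>());
        while (!seen.add(millis)) {
            millis++; // duplicate row/qualifier/time: advance by 1 ms and retry
        }
        return millis;
    }

    // No time key field: default to the current time.
    long assignNow(String row, String qualifier) {
        return assign(row, qualifier, Instant.now());
    }
}
```

For clarity, this sketch keeps every assigned timestamp in memory. An implementation working over sorted input could instead track only neighbouring duplicates. The sketch also shows why heavy duplication causes skew: each repeat pushes the stored time further from the original value.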
If the specified HBase table does not exist, it is created. The number of regions created is MAX(4, level of parallelism), that is, the greater of 4 and the level of parallelism.
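The region-count rule above is simple to state in code. The helper name `RegionCount.forNewTable` is hypothetical, and the check that rejects parallelism below 1 is an assumption added for the sketch.

```java
// Hypothetical helper expressing the documented rule:
// regions created for a new table = MAX(4, level of parallelism).
final class RegionCount {
    private RegionCount() {}

    static int forNewTable(int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be at least 1");
        }
        return Math.max(4, parallelism);
    }
}
```

For example, a parallelism of 2 yields 4 regions, and a parallelism of 16 yields 16.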
- See Also:
DeleteHBase, ReadHBase
Field Summary
Fields inherited from class com.pervasive.datarush.hbase.KeyOperator
catalogTableName, cellSchemaFamilyName, keySchemaFamilyName, statsFamilyName
Constructor Summary
WriteHBase()
Method Summary
- void compose(CompositionContext ctx): Compose the body of this operator.
- RecordPort getInput()
- String getOutputPath(): Get the output path.
- void setOutputPath(String outputPath): Set the output path.
Methods inherited from class com.pervasive.datarush.hbase.KeyValueOperator
getCellFieldMap, getFamilyFieldMap, getHCatalogDatabase, getHCatalogFields, getHCatalogTable, mapCell, mapCell, mapCellRecord, mapFamily, mapFamily, mapFamilyRecord, mapFromHCatalog, mapToHCatalog, schemaSupportedByHCatalog, setCellFieldMap, setFamilyFieldMap, setHCatalogDatabase, setHCatalogFields, setHCatalogFields, setHCatalogTable, tableExistsInHCatalog
Methods inherited from class com.pervasive.datarush.hbase.KeyOperator
addFamily, effectiveConfiguration, getConfiguration, getFamilies, getFilesystem, getHiveMetastore, getQualifierFieldMap, getRootDirectory, getRowFieldMap, getTableName, getTimeFieldName, getZookeeperParentZNode, getZookeeperPort, getZookeeperQuorum, mapQualifier, mapRow, mapRowRecord, setConfiguration, setFamilies, setFilesystem, setHiveMetastore, setQualifierFieldMap, setRootDirectory, setRowFieldMap, setTableName, setTimeFieldName, setZookeeperParentZNode, setZookeeperPort, setZookeeperQuorum
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
Method Detail
getOutputPath
public String getOutputPath()
Get the output path.
- Returns:
the output path.
setOutputPath
public void setOutputPath(String outputPath)
Set the output path, that is, the location used to create HFiles. If the output path is on the same file system as HBase, the HFiles are moved into HBase by renaming them. If it is on a different file system, they are copied. All files and directories created during the load process are deleted upon completion. Optional property. Defaults to hdfs:/user/userName/
- Parameters:
outputPath - the output path.
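The rename-or-copy behaviour described for setOutputPath depends on whether the HFile location and HBase share a file system. The sketch below treats two URIs as the same file system when their scheme and authority match. That criterion is an assumption, and `HFileTransfer` is a hypothetical name. Unlike Hadoop, the sketch does not resolve scheme-less or authority-less paths (such as the hdfs:/user/userName/ default) against a default file system.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

// Hypothetical sketch of the rename-or-copy decision; not DataRush code.
final class HFileTransfer {
    enum Mode { RENAME, COPY }

    private HFileTransfer() {}

    // Same scheme and authority (host:port) => same file system => rename.
    static Mode choose(URI hfileDir, URI hbaseRoot) {
        boolean sameScheme = hfileDir.getScheme() != null
                && hfileDir.getScheme().equalsIgnoreCase(hbaseRoot.getScheme());
        boolean sameAuthority = Objects.equals(lower(hfileDir.getAuthority()),
                lower(hbaseRoot.getAuthority()));
        return (sameScheme && sameAuthority) ? Mode.RENAME : Mode.COPY;
    }

    private static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }
}
```

Renaming within one file system is a metadata operation, whereas a copy moves every byte. This is why placing the output path on the same file system as HBase is usually preferable.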
getInput
public RecordPort getInput()
compose
public void compose(CompositionContext ctx)
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
- Specified by:
compose in class CompositeOperator
- Parameters:
ctx - the context
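The three compose steps above (validate, add sub-operators, connect ports) can be illustrated with a toy model. Every type in the sketch (`ToyPort`, `ToyContext`, `ToyComposite`) is a hypothetical stand-in, not DataRush's CompositionContext or port classes. The sub-operator names "sort" and "write" are likewise invented and do not describe WriteHBase's actual internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for ports and the composition context.
final class ToyPort {
    final String name;

    ToyPort(String name) {
        this.name = name;
    }
}

final class ToyContext {
    final List<String> added = new ArrayList<>();
    final List<String> connections = new ArrayList<>();

    // Step 2 target: register a sub-operator with the context.
    String add(String operatorName) {
        added.add(operatorName);
        return operatorName;
    }

    // Step 3 target: record a port-to-port connection.
    void connect(ToyPort from, ToyPort to) {
        connections.add(from.name + " -> " + to.name);
    }
}

final class ToyComposite {
    private final String tableName;

    ToyComposite(String tableName) {
        this.tableName = tableName;
    }

    void compose(ToyContext ctx) {
        // 1. Validate configuration before building anything.
        if (tableName == null || tableName.isEmpty()) {
            throw new IllegalStateException("table name is required");
        }
        // 2. Instantiate sub-operators and add them to the context.
        String sort = ctx.add("sort");
        String write = ctx.add("write");
        // 3. Connect the composite's input to the first sub-operator,
        //    then the sub-operators to each other.
        ctx.connect(new ToyPort("composite.input"), new ToyPort(sort + ".input"));
        ctx.connect(new ToyPort(sort + ".output"), new ToyPort(write + ".input"));
    }
}
```

Validating first means a misconfigured operator fails before any sub-operators are added to the context.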