Class WriteHBase

  • All Implemented Interfaces:
    LogicalOperator

    public class WriteHBase
    extends KeyValueOperator
    Write a result set to HBase.

    If the user specifies a row key field, then the input will be repartitioned using the row key ranges of the HBase table's regions. Each partition will sort its rows by row key ascending, qualifier key ascending, and time key descending, and then write the rows to the appropriate regions.
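The sort order above can be expressed as a Java comparator. This is a minimal sketch, not DataRush code. The `Cell` record and `CellOrder` class are hypothetical, and the sketch compares keys as unsigned bytes because that is how HBase orders row keys and qualifiers.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical cell type used only for this sketch; not a DataRush class.
record Cell(byte[] row, byte[] qualifier, long timestamp) {}

final class CellOrder {
    // Row key ascending, then qualifier ascending (both unsigned lexicographic),
    // then timestamp descending so the newest version of a cell comes first.
    static final Comparator<Cell> WRITE_ORDER = (a, b) -> {
        int c = Arrays.compareUnsigned(a.row(), b.row());
        if (c != 0) return c;
        c = Arrays.compareUnsigned(a.qualifier(), b.qualifier());
        if (c != 0) return c;
        return Long.compare(b.timestamp(), a.timestamp()); // reversed: descending time
    };
}
```

Sorting time in descending order places the newest version of each cell first, which is the order in which HBase itself stores cell versions.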

    If the user does not specify a row key field, then the input will be written to regions local to the partition. No repartitioning or sorting is performed. A unique row key of type TokenTypeConstant.BINARY will be generated for each input record.

    The generated keys are guaranteed to be evenly distributed and unique for time periods exceeding hundreds of years. The generated key is 12 bytes long and consists of two parts:

    • 8 bytes - partitioned random range counter (auto-incremented for each input row)
    • 4 bytes - bulk load counter (auto-incremented at the start of each DataRush load)
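The 12-byte layout listed above could be assembled with `java.nio.ByteBuffer` as sketched below. The `GeneratedRowKey` class is hypothetical, and the big-endian byte order (the `ByteBuffer` default) is an assumption. The sketch shows only how the two counters are packed, not how DataRush chooses the starting value of each partition's range.

```java
import java.nio.ByteBuffer;

// Hypothetical helper illustrating the documented 8 + 4 byte key layout;
// not part of the DataRush API.
final class GeneratedRowKey {
    static final int LENGTH = Long.BYTES + Integer.BYTES; // 8 + 4 = 12 bytes

    // rangeCounter: per-row counter from the partition's range (first 8 bytes).
    // bulkLoadCounter: per-load counter (last 4 bytes).
    // Big-endian order (assumed) makes keys sort by rangeCounter first.
    static byte[] encode(long rangeCounter, int bulkLoadCounter) {
        return ByteBuffer.allocate(LENGTH)
                .putLong(rangeCounter)
                .putInt(bulkLoadCounter)
                .array();
    }

    static long rangeCounter(byte[] key) {
        return ByteBuffer.wrap(key).getLong(0);
    }

    static int bulkLoadCounter(byte[] key) {
        return ByteBuffer.wrap(key).getInt(Long.BYTES);
    }
}
```

Because the bulk load counter occupies the last 4 bytes, the same range counter value produces a different key in each load.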

    A unique qualifier key will also be generated for each input record if the user maps a family as a sub-table and does not specify a qualifier key field. The qualifier key is generated in a similar manner.

    A time key field can optionally be specified to let the user provide a timestamp value as part of the input record. If a time key field is not specified, then each record defaults to the current time. In BOTH cases the timestamp value is narrowed to millisecond resolution (to match HBase) and advanced slightly (1 millisecond per duplicate) so that cells with duplicate row key, qualifier key, and time key values are uniquely identified. This makes the import tolerant of duplicate cell versions in the input stream, as long as they occur infrequently. Importing a large number of duplicate cell versions (more than thousands of duplicate keys per second) may, in order to maintain uniqueness, result in significant time skew and IO fragmentation.
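The narrowing-and-advancing rule above might look like the following sketch. `TimestampDeduplicator` is a hypothetical class, not DataRush code: it remembers every (row key, qualifier key, timestamp) triple it has issued and adds 1 millisecond per collision. Representing keys as `String` rather than `byte[]` is a simplification for readability.

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: gives each cell a unique millisecond timestamp by
// advancing 1 ms past any (row, qualifier, time) triple already issued.
final class TimestampDeduplicator {
    private record CellVersion(String row, String qualifier, long millis) {}

    private final Set<CellVersion> used = new HashSet<>();

    long assign(String row, String qualifier, Instant time) {
        // Narrow to millisecond resolution; sub-millisecond precision is dropped.
        long millis = time.toEpochMilli();
        // Set.add returns false for a duplicate, so advance 1 ms until unique.
        while (!used.add(new CellVersion(row, qualifier, millis))) {
            millis++;
        }
        return millis;
    }

    // No time key field: default to the current time.
    long assign(String row, String qualifier) {
        return assign(row, qualifier, Instant.now());
    }
}
```

The set grows with every cell, so this only illustrates the rule. It also shows why a burst of duplicates pushes timestamps forward and produces the time skew mentioned above.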

    If the specified HBase table does not exist then it will be created. The number of regions created will be MAX(4, the level of parallelism).
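The region count rule is a one-line `Math.max`. The sketch below also computes equally spaced split points over the first 8 bytes of the key space. This is one hypothetical way to pre-split a table whose generated keys are evenly distributed. `RegionPlan` and `splitPoints` are illustrative names, and DataRush may choose region boundaries differently.

```java
import java.math.BigInteger;

// Hypothetical sketch of the documented region count plus evenly spaced splits.
final class RegionPlan {
    // Documented rule: MAX(4, level of parallelism).
    static int regionCount(int parallelism) {
        return Math.max(4, parallelism);
    }

    // Returns (regions - 1) split points dividing the unsigned 64-bit prefix
    // space [0, 2^64) into equal ranges. Each value is the unsigned split
    // stored in a signed long (two's complement).
    static long[] splitPoints(int regions) {
        BigInteger space = BigInteger.ONE.shiftLeft(64);
        long[] splits = new long[regions - 1];
        for (int i = 1; i < regions; i++) {
            splits[i - 1] = space.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(regions))
                    .longValue(); // low 64 bits of a value < 2^64
        }
        return splits;
    }
}
```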

    See Also:
    DeleteHBase, ReadHBase
    • Constructor Detail

      • WriteHBase

        public WriteHBase()
    • Method Detail

      • getOutputPath

        public String getOutputPath()
        Get the output path.
        Returns:
        output path.
      • setOutputPath

        public void setOutputPath(String outputPath)
        Set the output path, the location used to create HFiles. The HFiles are loaded into HBase by renaming them if the output path is on the same file system as HBase, or by copying them if it is on a different file system. All files and directories created during the load process will be deleted upon completion.

        Optional property. Defaults to hdfs:/user/userName/

        Parameters:
        outputPath - the output path.
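The rename-or-copy decision can be illustrated with `java.nio.file` on a local file system. This is an analogy, not the operator's implementation. The operator works against the Hadoop and HBase file systems, while `HFileHandoff` is a hypothetical class that treats two paths on the same `FileStore` as "the same file system".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical local analogy: rename when source and target share a file
// store, otherwise copy the bytes and delete the source.
final class HFileHandoff {
    static boolean sameFileStore(Path a, Path b) throws IOException {
        return Files.getFileStore(a).equals(Files.getFileStore(b));
    }

    static void handOff(Path hfile, Path targetDir) throws IOException {
        Path target = targetDir.resolve(hfile.getFileName());
        if (sameFileStore(hfile, targetDir)) {
            // Same file system: a cheap rename, no data is copied.
            Files.move(hfile, target, StandardCopyOption.ATOMIC_MOVE);
        } else {
            // Different file system: copy the data, then remove the source,
            // mirroring the cleanup that follows a load.
            Files.copy(hfile, target);
            Files.delete(hfile);
        }
    }
}
```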
      • compose

        public void compose(CompositionContext ctx)
        Description copied from class: CompositeOperator
        Compose the body of this operator. Implementations should do the following:
        1. Perform any validation of configuration, input types, etc.
        2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
        3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
        Specified by:
        compose in class CompositeOperator
        Parameters:
        ctx - the context
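The three composition steps listed above can be sketched with a toy context. `MiniContext` and `SketchWriteComposite` are hypothetical stand-ins, not the DataRush `CompositionContext` API. The sub-operator names only echo the repartition, sort, and write phases described at the top of this page; they are not `WriteHBase`'s actual internal graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a composition context; records what was added
// and how it was wired.
final class MiniContext {
    final List<String> operators = new ArrayList<>();
    final List<String> connections = new ArrayList<>();

    String add(String operatorName) {
        operators.add(operatorName);
        return operatorName;
    }

    void connect(String sourcePort, String targetPort) {
        connections.add(sourcePort + " -> " + targetPort);
    }
}

// Hypothetical composite following the three documented steps.
final class SketchWriteComposite {
    private final String outputPath;

    SketchWriteComposite(String outputPath) {
        this.outputPath = outputPath;
    }

    void compose(MiniContext ctx) {
        // 1. Validate configuration before building anything.
        if (outputPath == null || outputPath.isEmpty()) {
            throw new IllegalStateException("outputPath must be set");
        }
        // 2. Instantiate sub-operators and add them to the context.
        String partition = ctx.add("partitionByRegion");
        String sort = ctx.add("sortCells");
        String write = ctx.add("writeHFiles");
        // 3. Connect the composite's input to the first sub-operator,
        //    then chain the sub-operators together.
        ctx.connect("composite.input", partition + ".input");
        ctx.connect(partition + ".output", sort + ".input");
        ctx.connect(sort + ".output", write + ".input");
    }
}
```

Validating first means a misconfigured operator fails before any sub-operators are created or wired.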