All Implemented Interfaces:
LogicalOperator, RecordSinkOperator, SinkOperator<RecordPort>

public class WriteORC extends AbstractWriter
Writes data in the Apache Hive ORC format. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileFormat
  • Method Details

    • computeFormat

      protected DataFormat computeFormat(CompositionContext ctx)
      Description copied from class: AbstractWriter
      Determines the data format for the target. The returned format is used during composition to construct a WriteSink operator. If an implementation supports schema discovery, it must be performed in this method.
      Specified by:
      computeFormat in class AbstractWriter
      Parameters:
      ctx - the composition context for the current invocation of AbstractWriter.compose(CompositionContext)
      Returns:
      the target format to use
    • setMode

      public void setMode(WriteMode mode)
      Sets how the writer should handle an existing target. Note that APPEND mode is currently not supported.
      Overrides:
      setMode in class AbstractWriter
      Parameters:
      mode - the behavior to use for existing files
    • getCompression

      public com.actian.dataflow.hive.shims.ORCCompression getCompression()
    • setCompression

      public void setCompression(com.actian.dataflow.hive.shims.ORCCompression compression)
      Sets the compression mode to use within the ORC file.
      Parameters:
      compression - the compression mode to use
    • getStripeSize

      public long getStripeSize()
    • setStripeSize

      public void setStripeSize(long stripeSize)
      Sets the stripe size for the file. The writer buffers the contents of a stripe in memory until this limit is reached; the stripe is then flushed to the HDFS file and the next stripe is started.
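
The buffering behavior described for setStripeSize can be pictured with a small stand-alone model. This is only a sketch under stated assumptions, not the WriteORC implementation: the class StripeBufferSketch and all of its methods are hypothetical names introduced for illustration, and the model assumes the stripe size is a limit in bytes and that a stripe is flushed as soon as the buffered amount reaches that limit.

```java
// Hypothetical stand-in that models stripe buffering; it is NOT part of the DataFlow API.
class StripeBufferSketch {
    private final long stripeSize;   // memory limit per stripe (assumed to be in bytes)
    private long buffered = 0;       // bytes currently held in memory
    private int flushedStripes = 0;  // stripes already written out

    StripeBufferSketch(long stripeSize) {
        if (stripeSize <= 0) {
            throw new IllegalArgumentException("stripeSize must be positive");
        }
        this.stripeSize = stripeSize;
    }

    // Buffers one row and flushes the stripe once the limit is reached.
    void addRow(long rowBytes) {
        buffered += rowBytes;
        if (buffered >= stripeSize) {
            flush();
        }
    }

    // Ends the current stripe; a real writer would write the buffered data to the file here.
    void flush() {
        if (buffered > 0) {
            flushedStripes++;
            buffered = 0;
        }
    }

    int getFlushedStripes() {
        return flushedStripes;
    }

    long getBuffered() {
        return buffered;
    }

    public static void main(String[] args) {
        StripeBufferSketch sketch = new StripeBufferSketch(1000);
        for (int i = 0; i < 25; i++) {
            sketch.addRow(100);  // 25 rows of 100 bytes each
        }
        // prints "2 stripes flushed, 500 bytes still buffered"
        System.out.println(sketch.getFlushedStripes() + " stripes flushed, "
                + sketch.getBuffered() + " bytes still buffered");
    }
}
```

In this model a larger stripe size means fewer, larger stripes and more memory held per writer, which is the trade-off the setting controls.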
    • getRowIndexStride

      public Integer getRowIndexStride()
    • setRowIndexStride

      public void setRowIndexStride(int rowIndexStride)
      Sets the distance between entries in the row index. The minimum value is 1000 to prevent the index from overwhelming the data. If the stride is set to 0, no indexes will be included in the file.
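
The stride rule for setRowIndexStride (0 disables the index, otherwise at least 1000) can be written down as a small check. The sketch below is illustrative only: RowIndexStrideSketch and its methods are hypothetical and not part of the DataFlow API, rejecting invalid values with an exception is an assumption (the real setter may behave differently), and the entry count is a simplified model that ignores stripe boundaries and per-column indexes.

```java
// Hypothetical helper illustrating the documented stride rule; NOT part of the DataFlow API.
class RowIndexStrideSketch {
    static final int MIN_STRIDE = 1000;  // minimum stated in the documentation

    // Returns true if the stride disables the row index entirely.
    static boolean disablesIndex(int stride) {
        return stride == 0;
    }

    // Assumption: out-of-range values are rejected with an exception.
    // The real setter may instead clamp or ignore such values.
    static int validate(int stride) {
        if (stride == 0 || stride >= MIN_STRIDE) {
            return stride;
        }
        throw new IllegalArgumentException(
                "rowIndexStride must be 0 (no index) or at least " + MIN_STRIDE + ": " + stride);
    }

    // Simplified count of index entries: one entry per started stride of rows.
    static long indexEntries(long rowCount, int stride) {
        if (disablesIndex(stride)) {
            return 0;
        }
        return (rowCount + stride - 1) / stride;  // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(indexEntries(25_000, validate(10_000)));  // prints 3
        System.out.println(indexEntries(25_000, validate(0)));       // prints 0
    }
}
```

The count shows why a lower bound exists: in this model a stride of 1 would produce one index entry per row, so the index would grow as fast as the data.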
    • getBufferSize

      public int getBufferSize()
    • setBufferSize

      public void setBufferSize(int bufferSize)
      Sets the size of the memory buffers used for compressing and storing the stripe in memory.
    • getBlockPadding

      public Boolean getBlockPadding()
    • setBlockPadding

      public void setBlockPadding(boolean blockPadding)
      Sets whether the HDFS blocks are padded to prevent stripes from straddling blocks. Padding improves locality and thus the speed of reading, but costs space.
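
The space cost of block padding can be estimated with simple arithmetic. The following sketch is a hypothetical model, not the DataFlow or ORC implementation: BlockPaddingSketch and its methods are invented for illustration, and it assumes fixed-size stripes that are no larger than an HDFS block and are written back to back.

```java
// Hypothetical model of block padding; NOT the DataFlow or ORC implementation.
class BlockPaddingSketch {

    private static void check(long blockSize, long stripeSize) {
        if (stripeSize <= 0 || stripeSize > blockSize) {
            throw new IllegalArgumentException("need 0 < stripeSize <= blockSize");
        }
    }

    // Bytes of padding added when every stripe must fit inside a single block.
    static long paddingBytes(long blockSize, long stripeSize, int stripeCount) {
        check(blockSize, stripeSize);
        long offset = 0;
        long padding = 0;
        for (int i = 0; i < stripeCount; i++) {
            long remaining = blockSize - (offset % blockSize);
            if (remaining < stripeSize) {
                padding += remaining;  // skip to the next block boundary
                offset += remaining;
            }
            offset += stripeSize;
        }
        return padding;
    }

    // Number of stripes that cross a block boundary when no padding is used.
    static int straddlingStripes(long blockSize, long stripeSize, int stripeCount) {
        check(blockSize, stripeSize);
        int straddling = 0;
        long offset = 0;
        for (int i = 0; i < stripeCount; i++) {
            long firstBlock = offset / blockSize;
            long lastBlock = (offset + stripeSize - 1) / blockSize;
            if (lastBlock != firstBlock) {
                straddling++;
            }
            offset += stripeSize;
        }
        return straddling;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Four 48 MB stripes written into 128 MB blocks.
        System.out.println(paddingBytes(128 * mb, 48 * mb, 4) / mb);  // prints 32
        System.out.println(straddlingStripes(128 * mb, 48 * mb, 4));  // prints 1
    }
}
```

In this example, padding spends 32 MB so that the third stripe starts on a block boundary instead of being split across two blocks.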
    • getVersion

      public com.actian.dataflow.hive.shims.ORCVersion getVersion()
    • setVersion

      public void setVersion(com.actian.dataflow.hive.shims.ORCVersion version)
      Sets the version of the ORC file format that will be written.