com.actian.dataflow.operators.io.orc.WriteORC

All Implemented Interfaces: LogicalOperator, RecordSinkOperator, SinkOperator<RecordPort>

public class WriteORC extends AbstractWriter

Writes data in the Apache Hive ORC format. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileFormat

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final int | MIN_ROW_INDEX_STRIDE | |

Fields inherited from class com.pervasive.datarush.operators.io.AbstractWriter
input, options
Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| WriteORC() | Writes an ORC file. |
| WriteORC(String pattern) | Writes an ORC file to the specified location. |
Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected DataFormat | computeFormat(CompositionContext ctx) | Determines the data format for the target. |
| Boolean | getBlockPadding() | |
| int | getBufferSize() | |
| com.actian.dataflow.hive.shims.ORCCompression | getCompression() | |
| Integer | getRowIndexStride() | |
| long | getStripeSize() | |
| com.actian.dataflow.hive.shims.ORCVersion | getVersion() | |
| void | setBlockPadding(boolean blockPadding) | Sets whether the HDFS blocks are padded to prevent stripes from straddling blocks. |
| void | setBufferSize(int bufferSize) | Sets the size of the memory buffers used for compressing and storing the stripe in memory. |
| void | setCompression(com.actian.dataflow.hive.shims.ORCCompression compression) | Sets the compression mode to use within the ORC file. |
| void | setMode(WriteMode mode) | Sets how the writer should handle an existing target. |
| void | setRowIndexStride(int rowIndexStride) | Sets the distance between entries in the row index. |
| void | setStripeSize(long stripeSize) | Sets the stripe size for the file. |
| void | setVersion(com.actian.dataflow.hive.shims.ORCVersion version) | Sets the version of the file that will be written. |

Methods inherited from class com.pervasive.datarush.operators.io.AbstractWriter
compose, getFormatOptions, getInput, getMode, getSaveMetadata, getTarget, getWriteBuffer, getWriteOnClient, getWriteSingleSink, isIgnoreSortOrder, setFormatOptions, setIgnoreSortOrder, setSaveMetadata, setTarget, setTarget, setTarget, setWriteBuffer, setWriteOnClient, setWriteSingleSink

Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts

Field Details
- MIN_ROW_INDEX_STRIDE
  
  public static final int MIN_ROW_INDEX_STRIDE
  See Also:
  
  Constant Field Values
Constructor Details
- WriteORC
  
  public WriteORC()
  
  Writes an ORC file. The target must be set before execution or an error will be raised.
  See Also:
  
  AbstractWriter.setTarget(com.pervasive.datarush.operators.io.ByteSink)
- WriteORC
  
  public WriteORC(String pattern)
  
  Writes an ORC file to the specified location.
Method Details
- computeFormat
  
  protected DataFormat computeFormat(CompositionContext ctx)
  
  Description copied from class: AbstractWriter
  
  Determines the data format for the target. The returned format is used during composition to construct a WriteSink operator. If an implementation supports schema discovery, it must be performed in this method.
  
  Specified by:
  
  computeFormat in class AbstractWriter
  
  Parameters:
  
  ctx - the composition context for the current invocation of AbstractWriter.compose(CompositionContext)
  
  Returns:
  
  the target format to use
- setMode
  
  public void setMode(WriteMode mode)
  
  Sets how the writer should handle an existing target. Note that APPEND mode is currently not supported.
  
  Overrides:
  
  setMode in class AbstractWriter
  
  Parameters:
  
  mode - the behavior to use for existing files
- getCompression
  
  public com.actian.dataflow.hive.shims.ORCCompression getCompression()
- setCompression
  
  public void setCompression(com.actian.dataflow.hive.shims.ORCCompression compression)
  
  Sets the compression mode to use within the ORC file.
  
  Parameters:
  
  compression - The compression mode
- getStripeSize
  
  public long getStripeSize()
- setStripeSize
  
  public void setStripeSize(long stripeSize)
  
  Sets the stripe size for the file. The writer buffers the contents of a stripe in memory until this limit is reached, then flushes the stripe to the HDFS file and starts the next stripe.
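
  The buffer-then-flush behavior of the stripe size limit can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions: `StripeBufferSketch` and its methods are hypothetical names, the code does not call the DataFlow or ORC APIs, and the real writer's memory accounting (compression, per-column buffers) is likely more involved.

  ```java
  import java.util.ArrayList;
  import java.util.List;

  /** Hypothetical stand-in for a stripe buffer; not part of the DataFlow API. */
  public class StripeBufferSketch {
      private final long stripeSize;                         // flush threshold in bytes
      private long buffered = 0;                             // bytes currently held in memory
      private final List<Long> flushed = new ArrayList<>();  // sizes of stripes written so far

      public StripeBufferSketch(long stripeSize) {
          if (stripeSize <= 0) {
              throw new IllegalArgumentException("stripeSize must be positive");
          }
          this.stripeSize = stripeSize;
      }

      /** Buffers one row and flushes the stripe once the limit is reached. */
      public void addRow(long rowBytes) {
          buffered += rowBytes;
          if (buffered >= stripeSize) {
              flush();
          }
      }

      /** Flushes any remaining buffered rows as a final, possibly smaller stripe. */
      public void close() {
          if (buffered > 0) {
              flush();
          }
      }

      private void flush() {
          flushed.add(buffered);
          buffered = 0;
      }

      public List<Long> getFlushedStripes() {
          return flushed;
      }
  }
  ```

  In this model a larger stripe size means fewer, larger stripes and more memory held per writer, which is the trade-off the setting controls.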
- getRowIndexStride
  
  public Integer getRowIndexStride()
- setRowIndexStride
  
  public void setRowIndexStride(int rowIndexStride)
  
  Sets the distance between entries in the row index. The minimum value is 1000, which prevents the index from overwhelming the data. If the stride is set to 0, no indexes are included in the file.
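
  The stride rule can be written down as a small check. The helper below is a hypothetical sketch, not part of the DataFlow API: `RowIndexStrideRule` and `MIN_STRIDE` are illustrative names, the value 1000 is taken from the description of this method, and how `WriteORC` itself reacts to an out-of-range value is not specified here.

  ```java
  /** Hypothetical helper mirroring the documented stride rule; not part of the DataFlow API. */
  public final class RowIndexStrideRule {
      /** Assumed minimum, taken from the documented value of 1000. */
      public static final int MIN_STRIDE = 1000;

      private RowIndexStrideRule() {}

      /** Returns true if the stride is 0 (indexes disabled) or at least the minimum. */
      public static boolean isValid(int stride) {
          return stride == 0 || stride >= MIN_STRIDE;
      }

      /** Returns true if a file written with this stride would contain row indexes. */
      public static boolean writesIndexes(int stride) {
          return stride >= MIN_STRIDE;
      }
  }
  ```

  Under this rule, 0 and 10000 are both acceptable, while a value such as 500 falls below the minimum.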
- getBufferSize
  
  public int getBufferSize()
- setBufferSize
  
  public void setBufferSize(int bufferSize)
  
  Sets the size of the memory buffers used for compressing and storing the stripe in memory.
- getBlockPadding
  
  public Boolean getBlockPadding()
- setBlockPadding
  
  public void setBlockPadding(boolean blockPadding)
  
  Sets whether the HDFS blocks are padded to prevent stripes from straddling blocks. Padding improves locality and thus the speed of reading, but costs space.
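
  The space cost of padding can be estimated with simple arithmetic. The sketch below is a hypothetical model, not the ORC writer's actual code: `BlockPaddingSketch` and `paddingBefore` are illustrative names, and it assumes the writer pads to the next block boundary whenever a stripe would otherwise cross one.

  ```java
  /** Hypothetical arithmetic sketch of block padding; not the ORC writer's actual code. */
  public final class BlockPaddingSketch {
      private BlockPaddingSketch() {}

      /**
       * Returns the number of padding bytes to insert before a stripe so that
       * it does not cross an HDFS block boundary.
       *
       * @param offset      current write position in the file, in bytes
       * @param stripeBytes size of the stripe about to be written
       * @param blockSize   HDFS block size in bytes
       */
      public static long paddingBefore(long offset, long stripeBytes, long blockSize) {
          if (blockSize <= 0 || offset < 0 || stripeBytes < 0) {
              throw new IllegalArgumentException("sizes must be non-negative and blockSize positive");
          }
          // Bytes left in the block that contains the current offset.
          long remaining = blockSize - (offset % blockSize);
          // No padding if the stripe fits, or if it is larger than a whole
          // block and would straddle a boundary regardless.
          if (stripeBytes <= remaining || stripeBytes > blockSize) {
              return 0;
          }
          return remaining;
      }
  }
  ```

  For example, with a block size of 128 and a write position of 100, a stripe of 64 bytes would be preceded by 28 bytes of padding, while a stripe of 20 bytes would need none.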
- getVersion
  
  public com.actian.dataflow.hive.shims.ORCVersion getVersion()
- setVersion
  
  public void setVersion(com.actian.dataflow.hive.shims.ORCVersion version)
  
  Sets the version of the file that will be written.
