java.lang.Object
com.pervasive.datarush.operators.AbstractLogicalOperator
com.pervasive.datarush.operators.CompositeOperator
com.pervasive.datarush.operators.io.AbstractWriter
com.actian.dataflow.operators.io.orc.WriteORC
- All Implemented Interfaces:
LogicalOperator, RecordSinkOperator, SinkOperator<RecordPort>
Write data in the Apache Hive ORC format.
See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileFormat
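Field Summary
Fields
Modifier and Type | Field
static final int | MIN_ROW_INDEX_STRIDE
Fields inherited from class com.pervasive.datarush.operators.io.AbstractWriter
input, options
Constructor Summary
Constructors
Constructor | Description
WriteORC() | Writes an ORC file.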
Method Summary
Modifier and Type | Method | Description
protected DataFormat | computeFormat(CompositionContext ctx) | Determines the data format for the target.
int | getBufferSize()
com.actian.dataflow.hive.shims.ORCCompression | getCompression()
long | getStripeSize()
com.actian.dataflow.hive.shims.ORCVersion | getVersion()
void | setBlockPadding(boolean blockPadding) | Sets whether the HDFS blocks are padded to prevent stripes from straddling blocks.
void | setBufferSize(int bufferSize) | Sets the size of the memory buffers used for compressing and storing the stripe in memory.
void | setCompression(com.actian.dataflow.hive.shims.ORCCompression compression) | Sets the compression mode to use within the ORC file.
void | setMode(mode) | Sets how the writer should handle an existing target.
void | setRowIndexStride(int rowIndexStride) | Sets the distance between entries in the row index.
void | setStripeSize(long stripeSize) | Sets the stripe size for the file.
void | setVersion(com.actian.dataflow.hive.shims.ORCVersion version) | Sets the version of the file that will be written.
Methods inherited from class com.pervasive.datarush.operators.io.AbstractWriter
compose, getFormatOptions, getInput, getMode, getSaveMetadata, getTarget, getWriteBuffer, getWriteOnClient, getWriteSingleSink, isIgnoreSortOrder, setFormatOptions, setIgnoreSortOrder, setSaveMetadata, setTarget, setTarget, setTarget, setWriteBuffer, setWriteOnClient, setWriteSingleSink
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
Field Details
MIN_ROW_INDEX_STRIDE
public static final int MIN_ROW_INDEX_STRIDE
Constructor Details
WriteORC
public WriteORC()
Writes an ORC file. The target must be set before execution or an error will be raised.
WriteORC
Writes an ORC file to the specified location.
Method Details
computeFormat
protected DataFormat computeFormat(CompositionContext ctx)
Description copied from class: AbstractWriter
Determines the data format for the target. The returned format is used during composition to construct a WriteSink operator. If an implementation supports schema discovery, it must be performed in this method.
- Specified by:
computeFormat in class AbstractWriter
- Parameters:
ctx - the composition context for the current invocation of AbstractWriter.compose(CompositionContext)
- Returns:
the target format to use
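The description above implies a template-method split: the base writer drives composition and asks the subclass for a format, and the no-argument constructor notes that a missing target is an error. The sketch below models that split with hypothetical stand-in classes (`SketchWriter`, `SketchOrcWriter`, `SketchContext`, `SketchFormat`, `SketchSink`); none of them are DataRush types, and the real `AbstractWriter.compose` does far more than this.

```java
import java.util.Objects;

// Hypothetical stand-ins for illustration only; none of these are DataRush classes.
final class SketchContext {
    final String schema; // schema "discovered" for the target in this sketch
    SketchContext(String schema) { this.schema = schema; }
}

final class SketchFormat {
    final String name;
    final String schema;
    SketchFormat(String name, String schema) { this.name = name; this.schema = schema; }
}

final class SketchSink {
    final String target;
    final SketchFormat format;
    SketchSink(String target, SketchFormat format) { this.target = target; this.format = format; }
}

abstract class SketchWriter {
    private String target; // must be set before compose() runs

    void setTarget(String target) { this.target = target; }

    // Template method: the base class owns composition and asks the subclass for the format.
    final SketchSink compose(SketchContext ctx) {
        if (target == null) {
            throw new IllegalStateException("target must be set before execution");
        }
        SketchFormat format = Objects.requireNonNull(computeFormat(ctx), "computeFormat returned null");
        return new SketchSink(target, format);
    }

    // Subclass hook; any schema discovery would happen here.
    protected abstract SketchFormat computeFormat(SketchContext ctx);
}

final class SketchOrcWriter extends SketchWriter {
    @Override
    protected SketchFormat computeFormat(SketchContext ctx) {
        // In this sketch, "discovery" just reads the schema carried by the context.
        return new SketchFormat("orc", ctx.schema);
    }
}
```

Keeping `compose` final in the base class means every subclass gets the same target check and sink construction, while only the format decision varies.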
setMode
Sets how the writer should handle an existing target. Note that APPEND mode is currently not supported.
- Overrides:
setMode in class AbstractWriter
- Parameters:
mode - the behavior to use for existing files
getCompression
public com.actian.dataflow.hive.shims.ORCCompression getCompression()
setCompression
public void setCompression(com.actian.dataflow.hive.shims.ORCCompression compression)
Sets the compression mode to use within the ORC file.
- Parameters:
compression - the compression mode
getStripeSize
public long getStripeSize()
setStripeSize
public void setStripeSize(long stripeSize)
Sets the stripe size for the file. The writer buffers the contents of the current stripe in memory until this limit is reached, then flushes the stripe to the HDFS file and starts the next stripe.
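The buffer-then-flush cycle described for `setStripeSize` can be modeled as below. This is a simplified sketch under a loud assumption: each row's in-memory size is known exactly, whereas real ORC writers estimate memory use. `StripeBufferSketch` is a hypothetical name, not part of WriteORC.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: rows have a known byte size; real writers only estimate memory use.
final class StripeBufferSketch {
    private final long stripeSize;
    private long bufferedBytes = 0;
    private int rowsInStripe = 0;
    private final List<Integer> flushedStripeRowCounts = new ArrayList<>();

    StripeBufferSketch(long stripeSize) {
        if (stripeSize <= 0) throw new IllegalArgumentException("stripeSize must be positive");
        this.stripeSize = stripeSize;
    }

    // Buffer one row; flush the stripe once the memory limit is reached.
    void addRow(long rowBytes) {
        bufferedBytes += rowBytes;
        rowsInStripe++;
        if (bufferedBytes >= stripeSize) {
            flush();
        }
    }

    // Flush whatever is left as a final, possibly smaller, stripe.
    void close() {
        if (rowsInStripe > 0) flush();
    }

    private void flush() {
        // A real writer would encode the stripe and append it to the file here.
        flushedStripeRowCounts.add(rowsInStripe);
        bufferedBytes = 0;
        rowsInStripe = 0;
    }

    List<Integer> stripes() { return flushedStripeRowCounts; }
}
```

With a 100-byte limit and 25 rows of 10 bytes each, this model produces stripes of 10, 10, and 5 rows: larger stripe sizes mean fewer, bigger stripes and more memory held per writer.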
getRowIndexStride
setRowIndexStride
public void setRowIndexStride(int rowIndexStride)
Sets the distance, in rows, between entries in the row index. The minimum value is 1000, which prevents the index from overwhelming the data. If the stride is set to 0, no indexes are included in the file.
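The stride determines how many index entries each stripe carries: one per group of `rowIndexStride` rows, or none when the stride is 0. The sketch below computes that count. It is hypothetical (`RowIndexSketch` is not a WriteORC class) and it assumes that out-of-range strides are rejected with an exception; the text above states the minimum but not whether the operator rejects or adjusts smaller values.

```java
// Hypothetical helper illustrating the row index stride rules stated above.
final class RowIndexSketch {
    // Mirrors the documented minimum stride of 1000.
    static final int MIN_STRIDE = 1000;

    // Assumption: negative values and values from 1 to 999 are rejected in this sketch.
    static void validate(int stride) {
        if (stride < 0 || (stride > 0 && stride < MIN_STRIDE)) {
            throw new IllegalArgumentException("row index stride must be 0 or at least " + MIN_STRIDE);
        }
    }

    // One index entry per group of `stride` rows; the last group may be partial.
    static long indexEntries(long rowsInStripe, int stride) {
        validate(stride);
        if (stride == 0) {
            return 0; // stride 0: no row index is written
        }
        return (rowsInStripe + stride - 1) / stride; // ceiling division
    }
}
```

For example, a stripe of 25,000 rows with a stride of 10,000 gets three entries (two full groups and one partial group); a smaller stride gives readers finer-grained skipping at the cost of a larger index.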
getBufferSize
public int getBufferSize()
setBufferSize
public void setBufferSize(int bufferSize)
Sets the size of the memory buffers used for compressing and storing the stripe in memory.
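The ORC specification describes a compressed stream as a sequence of chunks, each at most the compression buffer size and preceded by a 3-byte little-endian header holding the chunk length times two plus an "is original" bit (set when compression did not shrink the chunk). Assuming `bufferSize` plays that role here, which the text above does not state outright, the hypothetical `OrcChunkSketch` below shows the effect, using `java.util.zip.Deflater` in raw mode as a stand-in for the ZLIB codec. It is not WriteORC's internal code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical illustration of ORC-style chunked compression.
final class OrcChunkSketch {
    // 3-byte little-endian header: (length * 2) + (isOriginal ? 1 : 0).
    static byte[] header(int length, boolean isOriginal) {
        int value = (length << 1) | (isOriginal ? 1 : 0);
        return new byte[] { (byte) value, (byte) (value >>> 8), (byte) (value >>> 16) };
    }

    // Splits data into chunks of at most bufferSize bytes and compresses each one.
    static byte[] compress(byte[] data, int bufferSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += bufferSize) {
            byte[] chunk = Arrays.copyOfRange(data, off, Math.min(data.length, off + bufferSize));
            byte[] deflated = deflate(chunk);
            // Keep the original bytes when compression does not help.
            boolean original = deflated.length >= chunk.length;
            byte[] body = original ? chunk : deflated;
            out.writeBytes(header(body.length, original));
            out.writeBytes(body);
        }
        return out.toByteArray();
    }

    private static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw deflate, no zlib wrapper
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

A larger buffer gives the compressor more context per chunk, usually improving the ratio, but each column stream being written holds its own buffer, so memory use grows with both buffer size and column count.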
getBlockPadding
setBlockPadding
public void setBlockPadding(boolean blockPadding)
Sets whether the HDFS blocks are padded to prevent stripes from straddling blocks. Padding improves data locality, and therefore read speed, at the cost of extra space.
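The space-for-locality trade-off can be made concrete with a simplified layout model: before writing a stripe that would cross an HDFS block boundary, pad the file up to that boundary. `BlockPaddingSketch` is hypothetical, and real ORC writers may apply further heuristics (such as a padding tolerance) that this sketch ignores; it also skips padding for stripes larger than a block, since they straddle blocks regardless.

```java
// Hypothetical model of block padding; not WriteORC's implementation.
final class BlockPaddingSketch {
    // Bytes of padding to insert at `offset` before a stripe of `stripeLength` bytes.
    static long paddingBefore(long offset, long stripeLength, long blockSize) {
        if (stripeLength > blockSize) {
            return 0; // the stripe straddles blocks no matter what; no padding in this sketch
        }
        long remaining = blockSize - (offset % blockSize);
        return stripeLength > remaining ? remaining : 0;
    }

    // Final file offset after writing the stripes, with or without padding.
    static long layout(long[] stripeLengths, long blockSize, boolean pad) {
        long offset = 0;
        for (long length : stripeLengths) {
            if (pad) {
                offset += paddingBefore(offset, length, blockSize);
            }
            offset += length;
        }
        return offset;
    }
}
```

With 256-byte blocks and stripes of 200, 100, and 200 bytes, the unpadded file ends at byte 500 while the padded one ends at byte 712: 212 bytes of padding buy the guarantee that no stripe crosses a block boundary.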
getVersion
public com.actian.dataflow.hive.shims.ORCVersion getVersion()
setVersion
public void setVersion(com.actian.dataflow.hive.shims.ORCVersion version)
Sets the version of the file that will be written.