- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.CompositeOperator
-
- com.pervasive.datarush.operators.io.AbstractWriter
-
- com.actian.dataflow.operators.io.orc.WriteORC
-
- All Implemented Interfaces:
LogicalOperator, RecordSinkOperator, SinkOperator<RecordPort>
public class WriteORC extends AbstractWriter
Writes data in the Apache Hive ORC format. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileFormat
-
-
Field Summary
Fields
Modifier and Type, Field, and Description
static int MIN_ROW_INDEX_STRIDE
-
Fields inherited from class com.pervasive.datarush.operators.io.AbstractWriter
input, options
-
-
Method Summary
Modifier and Type, Method, and Description
protected DataFormat computeFormat(CompositionContext ctx)
    Determines the data format for the target.
Boolean getBlockPadding()
int getBufferSize()
com.actian.dataflow.hive.shims.ORCCompression getCompression()
Integer getRowIndexStride()
long getStripeSize()
com.actian.dataflow.hive.shims.ORCVersion getVersion()
void setBlockPadding(boolean blockPadding)
    Sets whether the HDFS blocks are padded to prevent stripes from straddling blocks.
void setBufferSize(int bufferSize)
    The size of the memory buffers used for compressing and storing the stripe in memory.
void setCompression(com.actian.dataflow.hive.shims.ORCCompression compression)
    Sets the compression mode to use within the ORC file.
void setMode(WriteMode mode)
    Sets how the writer should handle an existing target.
void setRowIndexStride(int rowIndexStride)
    Set the distance between entries in the row index.
void setStripeSize(long stripeSize)
    Set the stripe size for the file.
void setVersion(com.actian.dataflow.hive.shims.ORCVersion version)
    Sets the version of the file that will be written.
-
Methods inherited from class com.pervasive.datarush.operators.io.AbstractWriter
compose, getFormatOptions, getInput, getMode, getSaveMetadata, getTarget, getWriteBuffer, getWriteOnClient, getWriteSingleSink, isIgnoreSortOrder, setFormatOptions, setIgnoreSortOrder, setSaveMetadata, setTarget, setTarget, setTarget, setWriteBuffer, setWriteOnClient, setWriteSingleSink
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Field Detail
-
MIN_ROW_INDEX_STRIDE
public static final int MIN_ROW_INDEX_STRIDE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
WriteORC
public WriteORC()
Writes an ORC file. The target must be set before execution or an error will be raised.
-
WriteORC
public WriteORC(String pattern)
Writes an ORC file to the specified location.
-
-
Method Detail
-
computeFormat
protected DataFormat computeFormat(CompositionContext ctx)
Description copied from class: AbstractWriter
Determines the data format for the target. The returned format is used during composition to construct a WriteSink operator. If an implementation supports schema discovery, it must be performed in this method.
- Specified by:
computeFormat in class AbstractWriter
- Parameters:
ctx - the composition context for the current invocation of AbstractWriter.compose(CompositionContext)
- Returns:
the target format to use
-
setMode
public void setMode(WriteMode mode)
Sets how the writer should handle an existing target. Note that APPEND mode is currently not supported.
- Overrides:
setMode in class AbstractWriter
- Parameters:
mode - the behavior to use for existing files
-
getCompression
public com.actian.dataflow.hive.shims.ORCCompression getCompression()
-
setCompression
public void setCompression(com.actian.dataflow.hive.shims.ORCCompression compression)
Sets the compression mode to use within the ORC file.
- Parameters:
compression - the compression mode
-
getStripeSize
public long getStripeSize()
-
setStripeSize
public void setStripeSize(long stripeSize)
Sets the stripe size for the file. The writer buffers the contents of a stripe in memory until this limit is reached, then flushes the stripe to the HDFS file and starts the next stripe.
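The buffering behavior described above can be pictured with a small simulation. This is a minimal sketch, not the writer's actual implementation: the class StripeBufferSketch, the method countStripes, and the per-row byte sizes are hypothetical, and it assumes that the stripe size is a byte limit and that a stripe is flushed as soon as the buffered bytes reach it.

```java
public class StripeBufferSketch {

    /**
     * Hypothetical helper: counts how many stripes would be flushed when rows
     * of the given sizes (in bytes) are buffered against a stripe size limit.
     * Assumption: a stripe is flushed once the buffered bytes reach the limit,
     * and any remaining buffered rows form a final, smaller stripe at close.
     */
    static int countStripes(long[] rowSizes, long stripeSize) {
        if (stripeSize <= 0) {
            throw new IllegalArgumentException("stripeSize must be positive");
        }
        int stripes = 0;
        long buffered = 0;
        for (long size : rowSizes) {
            buffered += size;            // row is held in memory
            if (buffered >= stripeSize) {
                stripes++;               // limit reached: flush this stripe
                buffered = 0;            // start the next stripe
            }
        }
        if (buffered > 0) {
            stripes++;                   // flush the final partial stripe
        }
        return stripes;
    }

    public static void main(String[] args) {
        long[] rows = {40, 40, 40, 40, 40};
        System.out.println(countStripes(rows, 100)); // prints 2
    }
}
```

In this model a larger stripe size means fewer, larger stripes and more memory held per open file, which is the trade-off the setter controls.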
-
getRowIndexStride
public Integer getRowIndexStride()
-
setRowIndexStride
public void setRowIndexStride(int rowIndexStride)
Set the distance between entries in the row index. The minimum value is 1000 to prevent the index from overwhelming the data. If the stride is set to 0, no indexes will be included in the file.
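The rule above (0 disables the index, otherwise the stride must be at least 1000) can be sketched as a small check. This is an illustration only: the class RowIndexStrideSketch and its methods are hypothetical, the constant mirrors the minimum stated in the text rather than the actual value of MIN_ROW_INDEX_STRIDE, and rejecting smaller values with an IllegalArgumentException is an assumption, since the documentation does not say how the operator reports an invalid stride. The entry estimate is a rough model for one column within one stripe.

```java
public class RowIndexStrideSketch {

    // Assumption: taken from the text "The minimum value is 1000".
    static final int ASSUMED_MIN_STRIDE = 1000;

    /**
     * Hypothetical check: returns false when the stride is 0 (no index),
     * true when the stride is valid, and throws for anything else.
     */
    static boolean indexEnabled(int stride) {
        if (stride == 0) {
            return false; // a stride of 0 means no indexes are written
        }
        if (stride < ASSUMED_MIN_STRIDE) {
            throw new IllegalArgumentException(
                "stride must be 0 or at least " + ASSUMED_MIN_STRIDE + ": " + stride);
        }
        return true;
    }

    /**
     * Rough estimate of index entries for one column in one stripe:
     * one entry per 'stride' rows, rounded up; 0 when indexing is disabled.
     */
    static long indexEntries(long rowCount, int stride) {
        if (!indexEnabled(stride)) {
            return 0;
        }
        return (rowCount + stride - 1) / stride;
    }

    public static void main(String[] args) {
        System.out.println(indexEntries(10_000, 1000)); // prints 10
        System.out.println(indexEntries(10_000, 0));    // prints 0
    }
}
```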
-
getBufferSize
public int getBufferSize()
-
setBufferSize
public void setBufferSize(int bufferSize)
Sets the size of the memory buffers used for compressing and storing the stripe in memory.
-
getBlockPadding
public Boolean getBlockPadding()
-
setBlockPadding
public void setBlockPadding(boolean blockPadding)
Sets whether the HDFS blocks are padded to prevent stripes from straddling blocks. Padding improves locality and thus the speed of reading, but costs space.
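The space cost of padding can be made concrete with a little arithmetic. The following is a simplified model, not the ORC writer's actual logic: the class BlockPaddingSketch, the method paddingBefore, and the byte values are hypothetical. It assumes that when padding is enabled, a stripe that would cross a block boundary is pushed to the start of the next block, and that a stripe larger than a whole block is never padded because padding cannot keep it within one block.

```java
public class BlockPaddingSketch {

    /**
     * Hypothetical helper: returns the number of padding bytes that would be
     * written before a stripe so that it does not straddle a block boundary.
     *
     * @param offset      current write position in the file, in bytes
     * @param stripeBytes size of the stripe about to be written, in bytes
     * @param blockSize   size of one HDFS block, in bytes
     */
    static long paddingBefore(long offset, long stripeBytes, long blockSize) {
        long usedInBlock = offset % blockSize;
        long remaining = blockSize - usedInBlock;
        if (usedInBlock == 0) {
            return 0; // already at a block boundary
        }
        if (stripeBytes <= remaining) {
            return 0; // stripe fits in the current block
        }
        if (stripeBytes > blockSize) {
            return 0; // stripe cannot fit in any single block
        }
        return remaining; // skip the rest of this block
    }

    public static void main(String[] args) {
        // 100 bytes already used in a 128-byte block, 64-byte stripe next:
        // the remaining 28 bytes are padded so the stripe starts a new block.
        System.out.println(paddingBefore(100, 64, 128)); // prints 28
    }
}
```

The padded bytes are the space cost mentioned above; the benefit is that each stripe can be read from a single block.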
-
getVersion
public com.actian.dataflow.hive.shims.ORCVersion getVersion()
-
setVersion
public void setVersion(com.actian.dataflow.hive.shims.ORCVersion version)
Sets the version of the file that will be written.
-
-