Class WriteStagingDataset
- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.CompositeOperator
-
- com.pervasive.datarush.operators.io.AbstractWriter
-
- com.pervasive.datarush.operators.io.staging.WriteStagingDataset
-
- All Implemented Interfaces:
LogicalOperator, RecordSinkOperator, SinkOperator<RecordPort>
public final class WriteStagingDataset extends AbstractWriter
Writes a sequence of records to disk in an internal format for staged data. Staged data sets are more efficient than text files because they are stored in a compact binary format. If a set of data must be read multiple times, significant savings can be achieved by converting it into a staged data set first.
It is generally best to perform parallel writes, creating multiple files. Because the staging format is not splittable, writing multiple files is what allows later reads to be parallelized effectively.
- See Also:
ReadStagingDataset
-
-
Field Summary
-
Fields inherited from class com.pervasive.datarush.operators.io.AbstractWriter
input, options
-
-
Constructor Summary
Constructors
Constructor / Description
WriteStagingDataset()
Writes to an empty target with default settings.
WriteStagingDataset(boolean provideDoneSignal)
Writes to an empty target with default settings, optionally providing a port for signaling completion of the write.
WriteStagingDataset(Path path, WriteMode mode)
Writes to the specified path in the given mode, using default options.
WriteStagingDataset(ByteSink target, WriteMode mode)
Writes to the specified target sink using default options.
WriteStagingDataset(String path, WriteMode mode)
Writes to the specified path in the given mode, using default options.
-
Method Summary
All Methods / Instance Methods / Concrete Methods
Modifier and Type / Method / Description
protected DataFormat computeFormat(CompositionContext ctx)
Determines the data format for the target.
DatasetMetadata discoverMetadata(FileClient client)
Gets the metadata for the currently configured data target.
int getBlockSize()
Gets the block size, in rows, used for encoding data.
DatasetStorageFormat getFormat()
Gets the data set format used to store data.
void setBlockSize(int blockSize)
Sets the block size, in rows, used for encoding data.
void setFormat(DatasetStorageFormat format)
Sets the data set format used to store data.
-
Methods inherited from class com.pervasive.datarush.operators.io.AbstractWriter
compose, getFormatOptions, getInput, getMode, getSaveMetadata, getTarget, getWriteBuffer, getWriteOnClient, getWriteSingleSink, isIgnoreSortOrder, setFormatOptions, setIgnoreSortOrder, setMode, setSaveMetadata, setTarget, setTarget, setTarget, setWriteBuffer, setWriteOnClient, setWriteSingleSink
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Constructor Detail
-
WriteStagingDataset
public WriteStagingDataset()
Writes to an empty target with default settings. The target must be set before execution or an error will be raised.
- See Also:
AbstractWriter.setTarget(ByteSink)
-
WriteStagingDataset
public WriteStagingDataset(boolean provideDoneSignal)
Writes to an empty target with default settings, optionally providing a port for signaling completion of the write. The target must be set before execution or an error will be raised.
- Parameters:
provideDoneSignal - indicates whether a done signal port should be created
- See Also:
AbstractWriter.setTarget(ByteSink)
-
WriteStagingDataset
public WriteStagingDataset(String path, WriteMode mode)
Writes to the specified path in the given mode, using default options.
If the writer is parallelized, the path is interpreted as a directory in which each partition will write a fragment of the entire input stream. Otherwise, it is interpreted as the file to write.
- Parameters:
path - the path to which to write
mode - how to handle existing files
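The path rule above can be modeled in a few lines of plain Java. This is only a sketch and does not call the DataRush API: the class `StagingTargetSketch`, the method `targetFiles`, and the `part-N` fragment names are all hypothetical, chosen to show the file-versus-directory distinction rather than the names the writer actually produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the path rule; not part of the DataRush API.
final class StagingTargetSketch {

    // Returns the files a writer would produce for the given path.
    // The "part-N" fragment names are an assumption made for illustration only.
    static List<String> targetFiles(String path, boolean parallel, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be at least 1");
        }
        List<String> files = new ArrayList<>();
        if (!parallel) {
            // Non-parallel writer: the path names the single file to write.
            files.add(path);
        } else {
            // Parallel writer: the path names a directory holding one fragment per partition.
            for (int i = 0; i < partitions; i++) {
                files.add(path + "/part-" + i);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(targetFiles("staged/orders", false, 1));
        System.out.println(targetFiles("staged/orders", true, 3));
    }
}
```

The practical consequence is that a path written by a parallel writer should be treated as a directory by whatever reads it later, while the same path written non-parallel is a single file.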
-
WriteStagingDataset
public WriteStagingDataset(Path path, WriteMode mode)
Writes to the specified path in the given mode, using default options.
If the writer is parallelized, the path is interpreted as a directory in which each partition will write a fragment of the entire input stream. Otherwise, it is interpreted as the file to write.
- Parameters:
path - the path to which to write
mode - how to handle existing files
-
WriteStagingDataset
public WriteStagingDataset(ByteSink target, WriteMode mode)
Writes to the specified target sink using default options.
The writer can only be parallelized if the sink is fragmentable. In this case, each partition will be written as an independent sink. Otherwise, the writer will run non-parallel.
- Parameters:
target - the sink to which to write
mode - how to handle an existing sink
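As a rough illustration of the fragmentable-sink rule, the decision can be written as a single function. The class `SinkParallelismSketch` and the method `sinksWritten` are hypothetical names, and the sketch models only the rule stated above, not how the library inspects a `ByteSink`.

```java
// Hypothetical model of the fragmentable-sink rule; not part of the DataRush API.
final class SinkParallelismSketch {

    // Returns how many independent sinks would be written for a requested degree of parallelism.
    static int sinksWritten(boolean fragmentable, int requestedPartitions) {
        if (requestedPartitions < 1) {
            throw new IllegalArgumentException("requestedPartitions must be at least 1");
        }
        // A fragmentable sink gets one independent sink per partition;
        // any other sink forces the writer to run non-parallel.
        return fragmentable ? requestedPartitions : 1;
    }

    public static void main(String[] args) {
        System.out.println(sinksWritten(true, 4));
        System.out.println(sinksWritten(false, 4));
    }
}
```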
-
-
Method Detail
-
setFormat
public void setFormat(DatasetStorageFormat format)
Sets the data set format used to store data. By default, this is DatasetStorageFormat.COMPACT_ROW.
- Parameters:
format - the format to use when writing
-
getFormat
public DatasetStorageFormat getFormat()
Gets the data set format used to store data.
- Returns:
the format to use when writing
-
getBlockSize
public int getBlockSize()
Gets the block size, in rows, used for encoding data.
- Returns:
the size of encoded data blocks
-
setBlockSize
public void setBlockSize(int blockSize)
Sets the block size, in rows, used for encoding data. By default, this is 64 rows. This setting is of most importance for DatasetStorageFormat.COLUMNAR.
Using larger values may increase efficiency, but at the cost of using more memory.
- Parameters:
blockSize - the size of encoded data blocks, in rows
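The memory side of the block size trade-off can be estimated with simple arithmetic. The sketch below assumes a block holds roughly `blockSize` rows times an average encoded row width; the class `BlockSizeSketch`, the 200-byte row width, and the row count are hypothetical, and the real encoding overhead of either storage format is not modeled.

```java
// Hypothetical back-of-the-envelope model; not part of the DataRush API.
final class BlockSizeSketch {

    // Rough bytes held in one block: rows per block times an assumed average encoded row width.
    static long bytesPerBlock(int blockSizeRows, int avgRowBytes) {
        if (blockSizeRows < 1 || avgRowBytes < 1) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (long) blockSizeRows * avgRowBytes;
    }

    // Number of blocks needed to hold totalRows (ceiling division).
    static long blockCount(long totalRows, int blockSizeRows) {
        if (totalRows < 0 || blockSizeRows < 1) {
            throw new IllegalArgumentException("invalid row count or block size");
        }
        return (totalRows + blockSizeRows - 1) / blockSizeRows;
    }

    public static void main(String[] args) {
        int avgRowBytes = 200;       // assumed average row width, for illustration
        long totalRows = 1_000_000L; // assumed data set size, for illustration
        for (int blockSize : new int[] {64, 1024, 16384}) {
            System.out.printf("blockSize=%d rows: about %d bytes per block, %d blocks%n",
                    blockSize, bytesPerBlock(blockSize, avgRowBytes), blockCount(totalRows, blockSize));
        }
    }
}
```

Larger blocks mean fewer blocks to encode but more rows buffered at once, which is the efficiency-versus-memory trade-off described above.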
-
computeFormat
protected DataFormat computeFormat(CompositionContext ctx)
Description copied from class: AbstractWriter
Determines the data format for the target. The returned format is used during composition to construct a WriteSink operator. If an implementation supports schema discovery, it must be performed in this method.
- Specified by:
computeFormat in class AbstractWriter
- Parameters:
ctx - the composition context for the current invocation of AbstractWriter.compose(CompositionContext)
- Returns:
the target format to use
-
discoverMetadata
public DatasetMetadata discoverMetadata(FileClient client)
Gets the metadata for the currently configured data target.
- Parameters:
client - the file client
- Returns:
the metadata of the target
-
-