java.lang.Object
- com.pervasive.datarush.operators.AbstractLogicalOperator
  - com.pervasive.datarush.operators.CompositeOperator
    - com.pervasive.datarush.operators.io.AbstractWriter
      - com.pervasive.datarush.operators.io.staging.WriteStagingDataset

All Implemented Interfaces:

LogicalOperator, RecordSinkOperator, SinkOperator<RecordPort>
```
public final class WriteStagingDataset
extends AbstractWriter
```
Writes a sequence of records to disk in an internal format for staged data. Staged data sets are stored in a compact binary format, which makes them more efficient than text files. If a set of data must be read multiple times, converting it into a staged data set first can yield significant savings.
It is generally best to perform parallel writes, creating multiple files. Because the staging format is not splittable, a single file cannot be divided among readers; writing multiple files allows later reads to be parallelized effectively.
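
The reasoning about file count and read parallelism can be sketched with a small stand-alone model. This is not part of the DataRush API: the class `StagingReadParallelism` and its method are hypothetical, and the model assumes that a non-splittable file is consumed by exactly one reader at a time.

```java
// Illustrative model only; StagingReadParallelism is a hypothetical name,
// not a DataRush class.
final class StagingReadParallelism {

    /**
     * Upper bound on concurrent readers, assuming each staged file must be
     * read whole by a single reader because the format is not splittable.
     */
    static int effectiveReaders(int fileCount, int availableWorkers) {
        if (fileCount < 0 || availableWorkers < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return Math.min(fileCount, availableWorkers);
    }

    public static void main(String[] args) {
        // A single file keeps seven of eight workers idle.
        System.out.println(effectiveReaders(1, 8));
        // Eight files written in parallel can keep all eight workers busy.
        System.out.println(effectiveReaders(8, 8));
    }
}
```

Under this assumption, a data set written as one file limits every later read to a single reader, regardless of how many workers are available.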

See Also:

ReadStagingDataset

Field Summary
- Fields inherited from class com.pervasive.datarush.operators.io.AbstractWriter
  input, options

Constructor Summary

Constructors
Constructor	Description
`WriteStagingDataset()`	Writes to an empty target with default settings.
`WriteStagingDataset(boolean provideDoneSignal)`	Writes to an empty target with default settings, optionally providing a port for signaling completion of the write.
`WriteStagingDataset(Path path, WriteMode mode)`	Writes to the specified path in the given mode, using default options.
`WriteStagingDataset(ByteSink target, WriteMode mode)`	Writes to the specified target sink using default options.
`WriteStagingDataset(String path, WriteMode mode)`	Writes to the specified path in the given mode, using default options.

Method Summary

Modifier and Type	Method	Description
`protected DataFormat`	`computeFormat(CompositionContext ctx)`	Determines the data format for the target.
`DatasetMetadata`	`discoverMetadata(FileClient client)`	Gets the metadata for the currently configured data target.
`int`	`getBlockSize()`	Gets the block size, in rows, used for encoding data.
`DatasetStorageFormat`	`getFormat()`	Gets the data set format used to store data.
`void`	`setBlockSize(int blockSize)`	Sets the block size, in rows, used for encoding data.
`void`	`setFormat(DatasetStorageFormat format)`	Sets the data set format used to store data.

Methods inherited from class com.pervasive.datarush.operators.io.AbstractWriter
compose, getFormatOptions, getInput, getMode, getSaveMetadata, getTarget, getWriteBuffer, getWriteOnClient, getWriteSingleSink, isIgnoreSortOrder, setFormatOptions, setIgnoreSortOrder, setMode, setSaveMetadata, setTarget, setTarget, setTarget, setWriteBuffer, setWriteOnClient, setWriteSingleSink

Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts

- Constructor Detail
  - WriteStagingDataset
```
public WriteStagingDataset()
```
    Writes to an empty target with default settings. The target must be set before execution or an error will be raised.
    
    See Also:
    
    AbstractWriter.setTarget(ByteSink)
  - WriteStagingDataset
```
public WriteStagingDataset(boolean provideDoneSignal)
```
    Writes to an empty target with default settings, optionally providing a port for signaling completion of the write. The target must be set before execution or an error will be raised.
    
    Parameters:
    
    provideDoneSignal - indicates whether a done signal port should be created
    
    See Also:
    
    AbstractWriter.setTarget(ByteSink)
  - WriteStagingDataset
```
public WriteStagingDataset(String path,
                           WriteMode mode)
```
    Writes to the specified path in the given mode, using default options.
    If the writer is parallelized, this is interpreted as a directory in which each partition will write a fragment of the entire input stream. Otherwise, it is interpreted as the file to write.
    
    Parameters:
    
    path - the path to which to write
    
    mode - how to handle existing files
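    
    The two interpretations of the path can be sketched with a stand-alone model. This is not the DataRush implementation: the class `StagingTargetLayout`, the `partitions` parameter, and the `part-N` fragment names are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; class name and fragment naming are assumptions,
// not taken from DataRush.
final class StagingTargetLayout {

    /**
     * Lists the files a writer would produce for the given path.
     * Non-parallel: the path itself is the single output file.
     * Parallel: the path is a directory holding one fragment per partition.
     */
    static List<String> targetFiles(String path, boolean parallelized, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be at least 1");
        }
        List<String> files = new ArrayList<>();
        if (!parallelized) {
            files.add(path);
            return files;
        }
        for (int i = 0; i < partitions; i++) {
            // "part-" + i is a made-up naming scheme for this sketch.
            files.add(path + "/part-" + i);
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(targetFiles("out/staged", false, 4));
        System.out.println(targetFiles("out/staged", true, 4));
    }
}
```

    In practice this means the same path argument may name either a file or a directory, depending on whether the writer ends up running in parallel.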
  - WriteStagingDataset
```
public WriteStagingDataset(Path path,
                           WriteMode mode)
```
    Writes to the specified path in the given mode, using default options.
    If the writer is parallelized, this is interpreted as a directory in which each partition will write a fragment of the entire input stream. Otherwise, it is interpreted as the file to write.
    
    Parameters:
    
    path - the path to which to write
    
    mode - how to handle existing files
  - WriteStagingDataset
```
public WriteStagingDataset(ByteSink target,
                           WriteMode mode)
```
    Writes to the specified target sink using default options.
    The writer can only be parallelized if the sink is fragmentable. In this case, each partition will be written as an independent sink. Otherwise, the writer will run non-parallel.
    
    Parameters:
    
    target - the sink to which to write
    
    mode - how to handle an existing sink
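    
    The rule for sink targets can be summarized in a minimal stand-alone sketch. It is a model of the behavior described above, not DataRush code: `SinkParallelism` and `requestedParallelism` are hypothetical names.

```java
// Illustrative model only; SinkParallelism is a hypothetical name,
// not a DataRush class.
final class SinkParallelism {

    /**
     * Number of independent sinks written: one per partition when the sink
     * is fragmentable, otherwise a single sink written non-parallel.
     */
    static int writerPartitions(boolean sinkFragmentable, int requestedParallelism) {
        if (requestedParallelism < 1) {
            throw new IllegalArgumentException("requestedParallelism must be at least 1");
        }
        return sinkFragmentable ? requestedParallelism : 1;
    }

    public static void main(String[] args) {
        System.out.println(writerPartitions(true, 4));
        System.out.println(writerPartitions(false, 4));
    }
}
```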
- Method Detail
  - setFormat
```
public void setFormat(DatasetStorageFormat format)
```
    Sets the data set format used to store data. By default, this is DatasetStorageFormat.COMPACT_ROW.
    
    Parameters:
    
    format - the format to use when writing
  - getFormat
```
public DatasetStorageFormat getFormat()
```
    Gets the data set format used to store data.
    
    Returns:
    
    the format to use when writing
  - getBlockSize
```
public int getBlockSize()
```
    Gets the block size, in rows, used for encoding data.
    
    Returns:
    
    the size of encoded data blocks
  - setBlockSize
```
public void setBlockSize(int blockSize)
```
    Sets the block size, in rows, used for encoding data. By default, this is 64 rows. This setting matters most for DatasetStorageFormat.COLUMNAR.
    Larger values may increase efficiency, at the cost of using more memory.
    
    Parameters:
    
    blockSize - the size of encoded data blocks
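    
    The trade-off between block size and memory can be made concrete with rough arithmetic. The sketch below is a stand-alone estimate, not DataRush code: `BlockSizeEstimate` is a hypothetical class, the average row size is an assumed input, and the encoder's real memory use is not documented here.

```java
// Illustrative arithmetic only; BlockSizeEstimate is a hypothetical name and
// avgRowBytes is an assumed figure, not something reported by DataRush.
final class BlockSizeEstimate {

    /** Number of blocks needed to hold rowCount rows (the last block may be partial). */
    static long blockCount(long rowCount, int blockSize) {
        if (rowCount < 0 || blockSize < 1) {
            throw new IllegalArgumentException("rowCount >= 0 and blockSize >= 1 required");
        }
        return (rowCount + blockSize - 1) / blockSize;
    }

    /** Approximate raw bytes held by one full block before encoding. */
    static long bytesPerBlock(int blockSize, int avgRowBytes) {
        if (blockSize < 1 || avgRowBytes < 0) {
            throw new IllegalArgumentException("blockSize >= 1 and avgRowBytes >= 0 required");
        }
        return (long) blockSize * avgRowBytes;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        int avgRowBytes = 100; // assumed average row size
        for (int blockSize : new int[] {64, 1024}) {
            System.out.println(blockSize + " rows/block: "
                    + blockCount(rows, blockSize) + " blocks, about "
                    + bytesPerBlock(blockSize, avgRowBytes) + " bytes per block");
        }
    }
}
```

    A larger block size means fewer blocks to encode but proportionally more data held per block, which is the memory cost noted above.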
  - computeFormat
```
protected DataFormat computeFormat(CompositionContext ctx)
```
    Description copied from class: AbstractWriter
    
    Determines the data format for the target. The returned format is used during composition to construct a WriteSink operator. If an implementation supports schema discovery, it must be performed in this method.
    
    Specified by:
    
    computeFormat in class AbstractWriter
    
    Parameters:
    
    ctx - the composition context for the current invocation of AbstractWriter.compose(CompositionContext)
    
    Returns:
    
    the target format to use
  - discoverMetadata
```
public DatasetMetadata discoverMetadata(FileClient client)
```
    Gets the metadata for the currently configured data target.
    
    Parameters:
    
    client - the file client
    
    Returns:
    
    the metadata of the target
