Module datarush.library
Class ReadStagingDataset
- java.lang.Object
  - com.pervasive.datarush.operators.AbstractLogicalOperator
    - com.pervasive.datarush.operators.CompositeOperator
      - com.pervasive.datarush.operators.io.AbstractReader
        - com.pervasive.datarush.operators.io.staging.ReadStagingDataset
- All Implemented Interfaces:
LogicalOperator, RecordSourceOperator, SourceOperator<RecordPort>
public class ReadStagingDataset extends AbstractReader
Reads a sequence of records previously staged to disk. Staged data sets are more efficient to read than text files because they are stored in a compact binary format. If a set of data must be read multiple times, converting it into a staged data set first can yield significant savings. The staged data format is not splittable. To obtain parallelism, perform parallel writes to create a set of files; reads of the multiple files will be fully parallel.
- See Also:
WriteStagingDataset
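The note on parallelism can be illustrated without the DataRush runtime. The following is a minimal sketch in plain JDK code, not the library's implementation: it assumes each file must be read start to end by a single task (as with a non-splittable format), so the number of files bounds the useful parallelism. All names here (`PerFileParallelReadSketch`, `readWholeFile`, `readAllInParallel`, `demo`) are hypothetical, and `Path` is `java.nio.file.Path`, not the DataRush `Path` class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerFileParallelReadSketch {

    /** Reads one file from start to end; stands in for reading a non-splittable file. */
    static long readWholeFile(Path file) {
        try {
            return Files.readAllBytes(file).length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Submits one task per file, so at most files.size() tasks can run at once. */
    static long readAllInParallel(List<Path> files, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (Path file : files) {
                Callable<Long> task = () -> readWholeFile(file);
                pending.add(pool.submit(task));
            }
            long totalBytes = 0;
            for (Future<Long> result : pending) {
                totalBytes += result.get();
            }
            return totalBytes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Writes three small files (10, 20 and 30 bytes) and reads them with two workers. */
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("staged-parts");
            List<Path> files = new ArrayList<>();
            int[] sizes = {10, 20, 30};
            for (int i = 0; i < sizes.length; i++) {
                Path part = dir.resolve("part-" + i + ".bin");
                Files.write(part, new byte[sizes[i]]);
                files.add(part);
            }
            return readAllInParallel(files, 2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 60
    }
}
```

With a single large file the pool above would have only one task to run, which is why writing the data as several files matters when parallel reads are wanted.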
-
-
Field Summary
-
Fields inherited from class com.pervasive.datarush.operators.io.AbstractReader
options, output
-
-
Constructor Summary
Constructors
Constructor, Description
ReadStagingDataset()
Reads an empty source with default settings.
ReadStagingDataset(Path path)
Reads the file specified by the path as staged data using default options.
ReadStagingDataset(ByteSource source)
Reads the specified data source using default options.
ReadStagingDataset(String pattern)
Reads all paths matching the specified pattern as staged data using default options.
-
Method Summary
Modifier and Type, Method, Description
protected DataFormat computeFormat(CompositionContext ctx)
Determines the data format for the source.
DatasetMetadata discoverMetadata(FileClient client)
Gets the metadata for the currently configured data source.
-
Methods inherited from class com.pervasive.datarush.operators.io.AbstractReader
compose, getExtraFieldAction, getFieldErrorAction, getFieldLengthThreshold, getIncludeSourceInfo, getMissingFieldAction, getOutput, getParseOptions, getPessimisticSplitting, getReadBuffer, getReadOnClient, getRecordWarningThreshold, getSelectedFields, getSource, getSplitOptions, getUseMetadata, setExtraFieldAction, setFieldErrorAction, setFieldLengthThreshold, setIncludeSourceInfo, setMissingFieldAction, setParseErrorAction, setParseOptions, setPessimisticSplitting, setReadBuffer, setReadOnClient, setRecordWarningThreshold, setSelectedFields, setSelectedFields, setSource, setSource, setSource, setSplitOptions, setUseMetadata
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
Constructor Detail
-
ReadStagingDataset
public ReadStagingDataset()
Reads an empty source with default settings. The source must be set before execution or an error will be raised.
- See Also:
AbstractReader.setSource(ByteSource)
-
ReadStagingDataset
public ReadStagingDataset(String pattern)
Reads all paths matching the specified pattern as staged data using default options. Any matching path which is a directory is replaced with all files in the directory; this expansion is not applied recursively.
- Parameters:
pattern
- a path-matching pattern
- See Also:
FileClient.matchPaths(String)
-
ReadStagingDataset
public ReadStagingDataset(Path path)
Reads the file specified by the path as staged data using default options. If the path refers to a directory, all files in the directory are read; this expansion is not applied recursively.
- Parameters:
path
- the path to read
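The one-level directory expansion described for the pattern and path constructors can be sketched with the JDK alone. This is an illustration of the stated rule, not the DataRush implementation: `DirectoryExpansionSketch`, `expandOneLevel` and `demo` are hypothetical names, `Path` here is `java.nio.file.Path`, and skipping nested directories entirely is an assumption (the documentation only states that expansion is not recursive).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirectoryExpansionSketch {

    /**
     * Expands a path one level: a directory becomes its direct regular files,
     * anything else is returned unchanged. Nested directories are not descended into.
     */
    static List<Path> expandOneLevel(Path path) {
        if (!Files.isDirectory(path)) {
            return Collections.singletonList(path);
        }
        try (Stream<Path> children = Files.list(path)) {
            return children
                    .filter(Files::isRegularFile) // assumption: subdirectories are skipped
                    .sorted()                     // Files.list gives no ordering guarantee
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a temporary tree with two top-level files and one nested file, then expands it. */
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("staging-demo");
            Files.createFile(root.resolve("part-0.bin"));
            Files.createFile(root.resolve("part-1.bin"));
            Path nested = Files.createDirectory(root.resolve("nested"));
            Files.createFile(nested.resolve("part-2.bin"));
            return expandOneLevel(root).stream()
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [part-0.bin, part-1.bin]
    }
}
```

In this sketch `nested/part-2.bin` is not selected, so data written into subdirectories would need its own path or pattern to be read.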
-
ReadStagingDataset
public ReadStagingDataset(ByteSource source)
Reads the specified data source using default options.
- Parameters:
source
- the data source to read
-
-
Method Detail
-
computeFormat
protected DataFormat computeFormat(CompositionContext ctx)
Description copied from class: AbstractReader
Determines the data format for the source. The returned format is used during composition to construct a ReadSource operator. If an implementation supports schema discovery, it must be performed in this method.
- Specified by:
computeFormat in class AbstractReader
- Parameters:
ctx
- the composition context for the current invocation of AbstractReader.compose(CompositionContext)
- Returns:
- the source format to use
-
discoverMetadata
public DatasetMetadata discoverMetadata(FileClient client)
Gets the metadata for the currently configured data source.
- Parameters:
client
- the file client
- Returns:
- the metadata of the source