- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.CompositeOperator
-
- com.pervasive.datarush.operators.io.AbstractReader
-
- com.pervasive.datarush.operators.io.textfile.AbstractTextReader
-
- com.pervasive.datarush.operators.io.textfile.ReadFixedText
-
- All Implemented Interfaces:
LogicalOperator, RecordSourceOperator, SourceOperator<RecordPort>
public class ReadFixedText extends AbstractTextReader
Reads a text file of fixed-width records as record tokens. Records are delimited by a non-empty, user-defined record separator sequence between consecutive records. Output records contain the same fields as the input file. The parser can also filter or reorder the fields of the output, as requested.
The reader requires a FixedWidthTextRecord object to provide field positions as well as parsing and type information for the fields. The schema, in conjunction with any specified field filter, defines the output type of the parser. A schema can be constructed manually via the provided API, although this metadata is often persisted externally. StructuredSchemaReader provides support for reading in Pervasive DataIntegrator structured schema descriptors (.schema files) for use with readers.
Normally, the output of the parsing includes all records in the file, both those with and without parsing errors. Fields which cannot be parsed are null valued in the resulting record. If desired, the reader can be configured to filter failed records from the output.
Since record boundaries occur at known positions, fixed text files can be parsed in parallel.
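The positional parsing described above can be illustrated with a minimal plain-Java sketch. It is not the DataRush implementation and does not use FixedWidthTextRecord; the names FixedWidthSketch, FieldSpec, and parseRecord are hypothetical and exist only for this example. It assumes fields are laid out back to back, and it mirrors the documented behavior that a field which cannot be parsed is null valued in the resulting record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: plain Java, not the DataRush implementation.
// FixedWidthSketch, FieldSpec, and parseRecord are hypothetical names.
final class FixedWidthSketch {

    /** Hypothetical field description: name, width in characters, numeric or text. */
    static final class FieldSpec {
        final String name;
        final int width;
        final boolean numeric;

        FieldSpec(String name, int width, boolean numeric) {
            this.name = name;
            this.width = width;
            this.numeric = numeric;
        }
    }

    /** Slices one record into fields by position; an unparseable field becomes null. */
    static List<Object> parseRecord(String record, List<FieldSpec> schema) {
        List<Object> values = new ArrayList<>();
        int offset = 0;
        for (FieldSpec field : schema) {
            int end = offset + field.width;
            if (end > record.length()) {
                values.add(null); // record is too short to contain this field
            } else {
                String raw = record.substring(offset, end).trim();
                if (field.numeric) {
                    try {
                        values.add(Integer.valueOf(raw));
                    } catch (NumberFormatException e) {
                        values.add(null); // keep the record, null out the bad field
                    }
                } else {
                    values.add(raw);
                }
            }
            offset = end;
        }
        return values;
    }

    public static void main(String[] args) {
        List<FieldSpec> schema = Arrays.asList(
                new FieldSpec("id", 4, true),
                new FieldSpec("name", 6, false));
        System.out.println(parseRecord("0042Alice ", schema)); // prints [42, Alice]
        System.out.println(parseRecord("00x2Bob   ", schema)); // prints [null, Bob]
    }
}
```

The second record is still emitted even though its first field fails to parse, which corresponds to the default behavior of including records with parsing errors in the output.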
-
-
Field Summary
-
Fields inherited from class com.pervasive.datarush.operators.io.textfile.AbstractTextReader
encodingProps
-
Fields inherited from class com.pervasive.datarush.operators.io.AbstractReader
options, output
-
-
Constructor Summary
Constructors
ReadFixedText()
Reads an empty source with default settings.
ReadFixedText(Path path)
Reads the file specified by the path as fixed text using default options.
ReadFixedText(ByteSource source)
Reads the specified data source using default options.
ReadFixedText(String pattern)
Reads all paths matching the specified pattern as fixed text using default options.
-
Method Summary
Modifier and Type Method Description
ReadFixedText clone()
protected DataFormat computeFormat(CompositionContext ctx)
Determines the data format for the source.
String getLineComment()
Get the value that indicates a line of data is a comment.
String getRecordSeparator()
Get the record separator property.
FixedWidthTextRecord getSchema()
Get the input schema property.
void setLineComment(String lineComment)
Set the text that represents that a line of input data is a comment.
void setRecordSeparator(String separator)
Set the string that represents the separator between records in the input file.
void setSchema(FixedWidthTextRecord schema)
Set the schema of the input data to read.
-
Methods inherited from class com.pervasive.datarush.operators.io.textfile.AbstractTextReader
getCharset, getCharsetName, getDecodeBuffer, getEncoding, getErrorAction, getReplacement, setCharset, setCharsetName, setDecodeBuffer, setEncoding, setErrorAction, setReplacement
-
Methods inherited from class com.pervasive.datarush.operators.io.AbstractReader
compose, getExtraFieldAction, getFieldErrorAction, getFieldLengthThreshold, getIncludeSourceInfo, getMissingFieldAction, getOutput, getParseOptions, getPessimisticSplitting, getReadBuffer, getReadOnClient, getRecordWarningThreshold, getSelectedFields, getSource, getSplitOptions, getUseMetadata, setExtraFieldAction, setFieldErrorAction, setFieldLengthThreshold, setIncludeSourceInfo, setMissingFieldAction, setParseErrorAction, setParseOptions, setPessimisticSplitting, setReadBuffer, setReadOnClient, setRecordWarningThreshold, setSelectedFields, setSelectedFields, setSource, setSource, setSource, setSplitOptions, setUseMetadata
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Constructor Detail
-
ReadFixedText
public ReadFixedText()
Reads an empty source with default settings. The source and schema must be set before execution or an error will be raised.
-
ReadFixedText
public ReadFixedText(String pattern)
Reads all paths matching the specified pattern as fixed text using default options. Any matching path which is a directory is replaced with all files in the directory; this expansion is not applied recursively. The schema must be set before execution or an error will be raised.
- Parameters:
pattern - a path-matching pattern
- See Also:
setSchema(FixedWidthTextRecord), FileClient#matchPaths(String)
-
ReadFixedText
public ReadFixedText(Path path)
Reads the file specified by the path as fixed text using default options. If the path refers to a directory, all files in the directory are read; this expansion is not applied recursively. The schema must be set before execution or an error will be raised.
- Parameters:
path - the path to read
- See Also:
setSchema(FixedWidthTextRecord)
-
ReadFixedText
public ReadFixedText(ByteSource source)
Reads the specified data source using default options. The schema must be set before execution or an error will be raised.
- Parameters:
source - the data source to read
- See Also:
setSchema(FixedWidthTextRecord)
-
-
Method Detail
-
clone
public ReadFixedText clone()
-
setRecordSeparator
public void setRecordSeparator(String separator)
Set the string that represents the separator between records in the input file. The record separator must not appear within the records of the input; otherwise parse errors may occur. By default, the record separator is the native line separator of the platform on which the application is running, which is normally either the Unix/Linux style or the Windows style end-of-record delimiter.
- Parameters:
separator - record separator
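To illustrate why the separator must not occur inside a record, here is a minimal plain-Java sketch of separator-based record splitting. It is not how ReadFixedText is implemented; RecordSeparatorSketch and splitRecords are hypothetical names, and the "|" separator is chosen only to keep the sample data readable.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: plain Java, not the DataRush implementation.
// RecordSeparatorSketch and splitRecords are hypothetical names.
final class RecordSeparatorSketch {

    /** Splits text into records at every occurrence of a non-empty separator. */
    static List<String> splitRecords(String text, String separator) {
        if (separator == null || separator.isEmpty()) {
            throw new IllegalArgumentException("separator must be non-empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = text.indexOf(separator, start)) >= 0) {
            records.add(text.substring(start, idx));
            start = idx + separator.length();
        }
        if (start < text.length()) {
            records.add(text.substring(start)); // trailing record without a final separator
        }
        return records;
    }

    public static void main(String[] args) {
        // Well-formed input: two six-character records.
        System.out.println(splitRecords("0001AB|0002CD|", "|")); // prints [0001AB, 0002CD]
        // A separator embedded in the first record splits it into two short pieces.
        System.out.println(splitRecords("00|1AB|0002CD", "|")); // prints [00, 1AB, 0002CD]
    }
}
```

In the second call the embedded separator yields records that no longer match the expected width, which is the kind of input that leads to parse errors.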
-
getRecordSeparator
public String getRecordSeparator()
Get the record separator property.
- Returns:
record separator property
-
setSchema
public void setSchema(FixedWidthTextRecord schema)
Set the schema of the input data to read. The schema is a required property. It defines the position and type of each field in the input.
- Parameters:
schema - required input schema
-
getSchema
public FixedWidthTextRecord getSchema()
Get the input schema property.
- Returns:
input schema
-
getLineComment
public String getLineComment()
Get the value that indicates a line of data is a comment.
- Returns:
text representing a comment indicator
-
setLineComment
public void setLineComment(String lineComment)
Set the text that represents that a line of input data is a comment. If this text is found at the beginning of a line (row) of data, then the whole row is skipped and will not appear in the output. By default this option has a null value, indicating that no comment lines are contained within the data.
- Parameters:
lineComment - text representing a comment
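The comment rule described above can be sketched in plain Java as follows. This is an illustration, not the reader's actual code; LineCommentSketch and dropComments are hypothetical names. A null marker means no line is treated as a comment, matching the documented default. Treating an empty marker the same way is an assumption of this sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: plain Java, not the DataRush implementation.
// LineCommentSketch and dropComments are hypothetical names.
final class LineCommentSketch {

    /** Returns the lines that do not begin with the comment marker. */
    static List<String> dropComments(List<String> lines, String lineComment) {
        // null marker: no comment lines (documented default).
        // Empty marker handled the same way: an assumption of this sketch.
        if (lineComment == null || lineComment.isEmpty()) {
            return new ArrayList<>(lines);
        }
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // Only a marker at the very beginning of the line counts.
            if (!line.startsWith(lineComment)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# header", "0001AB", "0002#C");
        System.out.println(dropComments(lines, "#"));  // prints [0001AB, 0002#C]
        System.out.println(dropComments(lines, null)); // prints [# header, 0001AB, 0002#C]
    }
}
```

The line "0002#C" is kept because the marker is not at the beginning of the line.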
-
computeFormat
protected DataFormat computeFormat(CompositionContext ctx)
Description copied from class: AbstractReader
Determines the data format for the source. The returned format is used during composition to construct a ReadSource operator. If an implementation supports schema discovery, it must be performed in this method.
- Specified by:
computeFormat in class AbstractReader
- Parameters:
ctx - the composition context for the current invocation of AbstractReader.compose(CompositionContext)
- Returns:
the source format to use
-
-