Module datarush.library
Class AbstractTextReader
java.lang.Object
com.pervasive.datarush.operators.AbstractLogicalOperator
com.pervasive.datarush.operators.CompositeOperator
com.pervasive.datarush.operators.io.AbstractReader
com.pervasive.datarush.operators.io.textfile.AbstractTextReader
- All Implemented Interfaces:
LogicalOperator, RecordSourceOperator, SourceOperator<RecordPort>
- Direct Known Subclasses:
ReadARFF, ReadDelimitedText, ReadFixedText, ReadJSON, ReadLog
A generic reader of text data representing a stream of records.
The reader encompasses the basic attributes any such reader should have beyond a standard byte-oriented reader, namely information on how to decode the bytes into characters.
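To illustrate why a text reader needs decoding information on top of the raw bytes, here is a minimal sketch that uses only the JDK's java.nio.charset classes, not the DataRush API. The class and method names (DecodeSketch, decode) are hypothetical. It decodes the same two bytes under two character sets and obtains different text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the DataRush library.
final class DecodeSketch {
    // The UTF-8 encoding of U+00E9 (e with acute accent) is the two bytes C3 A9.
    static final byte[] E_ACUTE_UTF8 = {(byte) 0xC3, (byte) 0xA9};

    // Decodes raw bytes into text using the given character set.
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        // Read as UTF-8, the two bytes form a single character.
        System.out.println(decode(E_ACUTE_UTF8, StandardCharsets.UTF_8).length());
        // Read as ISO-8859-1, each byte becomes its own character.
        System.out.println(decode(E_ACUTE_UTF8, StandardCharsets.ISO_8859_1).length());
    }
}
```

The bytes are identical in both calls; only the character set differs, which is the information this class adds on top of a byte-oriented reader.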
- See Also:
-
Field Summary
Fields
Modifier and Type | Field | Description
protected final CharsetEncoding | encodingProps | Container for character encoding related attributes
Fields inherited from class com.pervasive.datarush.operators.io.AbstractReader
options, output
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | AbstractTextReader() | Reads an empty source with default settings.
protected | AbstractTextReader(Path path) | Reads the file specified by the path using default options.
protected | AbstractTextReader(ByteSource source) | Reads the specified data source using default options.
protected | AbstractTextReader(String pattern) | Reads all paths matching the specified pattern using default options.
-
Method Summary
Modifier and Type | Method | Description
Charset | getCharset() | Gets the character set used by the data source.
String | getCharsetName() | Gets the name of the character set used by the data source.
int | getDecodeBuffer() | Gets the size of the buffer, in bytes, used to decode character data.
CharsetEncoding | getEncoding() | Get the character set encoding properties.
CodingErrorAction | getErrorAction() | Get the configured encoding error action.
String | getReplacement() | Get the text used by the replacement error action.
void | setCharset(Charset charset) | Sets the character set used by the data source.
void | setCharsetName(String charsetName) | Sets the character set used by the data source.
void | setDecodeBuffer(int size) | Sets the size of the buffer, in bytes, used to decode character data.
void | setEncoding(CharsetEncoding settings) | Set the properties that control character set encoding.
void | setErrorAction(CodingErrorAction errorAction) | Set the encoding error action.
void | setReplacement(String replacement) | Sets the error policy to be replacement with the specified string.
Methods inherited from class com.pervasive.datarush.operators.io.AbstractReader
compose, computeFormat, getExtraFieldAction, getFieldErrorAction, getFieldLengthThreshold, getIncludeSourceInfo, getMissingFieldAction, getOutput, getParseOptions, getPessimisticSplitting, getReadBuffer, getReadOnClient, getRecordWarningThreshold, getSelectedFields, getSource, getSplitOptions, getUseMetadata, setExtraFieldAction, setFieldErrorAction, setFieldLengthThreshold, setIncludeSourceInfo, setMissingFieldAction, setParseErrorAction, setParseOptions, setPessimisticSplitting, setReadBuffer, setReadOnClient, setRecordWarningThreshold, setSelectedFields, setSelectedFields, setSource, setSource, setSource, setSplitOptions, setUseMetadata
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
Field Details
-
encodingProps
protected final CharsetEncoding encodingProps
Container for character encoding related attributes.
-
-
Constructor Details
-
AbstractTextReader
protected AbstractTextReader()
Reads an empty source with default settings. The source must be set before execution or an error will be raised.
- See Also:
-
AbstractTextReader
protected AbstractTextReader(String pattern)
Reads all paths matching the specified pattern using default options. Any matching path that is a directory is replaced with all files in the directory; this expansion is not applied recursively.
- Parameters:
pattern - a path-matching pattern
- See Also:
-
FileClient#matchPaths(String)
-
AbstractTextReader
protected AbstractTextReader(Path path)
Reads the file specified by the path using default options. If the path refers to a directory, all files in the directory are read; this expansion is not applied recursively.
- Parameters:
path - the path to read
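The one-level directory expansion described for the path and pattern constructors can be pictured with the following sketch. It uses only java.nio.file and is not DataRush's implementation; the names ExpandPathSketch, expand, and demo are hypothetical, and treating "files" as regular files only is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; not part of the DataRush library.
final class ExpandPathSketch {
    // Returns the path itself for a non-directory, or the directory's direct
    // children that are regular files. Subdirectories are skipped, not entered.
    static List<Path> expand(Path path) {
        if (!Files.isDirectory(path)) {
            return Collections.singletonList(path);
        }
        try (Stream<Path> children = Files.list(path)) {
            return children.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small temporary tree and returns the file names expand() selects.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("expand-sketch");
            Files.createFile(dir.resolve("a.txt"));
            Path sub = Files.createDirectory(dir.resolve("sub"));
            Files.createFile(sub.resolve("b.txt"));
            return expand(dir).stream()
                              .map(p -> p.getFileName().toString())
                              .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a.txt]
    }
}
```

The nested file sub/b.txt is not selected because the expansion stops after one level.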
-
AbstractTextReader
protected AbstractTextReader(ByteSource source)
Reads the specified data source using default options.
- Parameters:
source - the data source to read
-
-
Method Details
-
getDecodeBuffer
public int getDecodeBuffer()
Gets the size of the buffer, in bytes, used to decode character data.
- Returns:
- the decoding buffer size
-
setDecodeBuffer
public void setDecodeBuffer(int size)
Sets the size of the buffer, in bytes, used to decode character data. By default, this is derived automatically from the character set and the read buffer size.
- Parameters:
size - the decoding buffer size to use
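As a rough picture of what decoding through a bounded buffer involves, the sketch below decodes input with a JDK CharsetDecoder and a fixed-size output buffer. It is not how DataRush sizes or uses its decode buffer. The names DecodeBufferSketch, decode, and drain are hypothetical, and the sketch measures its buffer in chars, whereas the property above is stated in bytes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical illustration; not part of the DataRush library.
final class DecodeBufferSketch {
    // Decodes data through an output buffer that holds bufferChars chars.
    static String decode(byte[] data, Charset charset, int bufferChars) {
        if (bufferChars < 2) {
            // A surrogate pair needs two chars, so a smaller buffer could stall.
            throw new IllegalArgumentException("buffer must hold at least 2 chars");
        }
        CharsetDecoder decoder = charset.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(bufferChars);
        StringBuilder text = new StringBuilder();
        try {
            CoderResult result;
            do {
                // Decode until the output buffer fills or the input runs out.
                result = decoder.decode(in, out, true);
                if (result.isError()) {
                    result.throwException();
                }
                drain(out, text);
            } while (result.isOverflow());
            do {
                // Let the decoder emit any state it is still holding.
                result = decoder.flush(out);
                drain(out, text);
            } while (result.isOverflow());
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("input is not valid " + charset, e);
        }
        return text.toString();
    }

    // Moves the decoded chars into the result and empties the buffer for reuse.
    private static void drain(CharBuffer out, StringBuilder text) {
        out.flip();
        text.append(out);
        out.clear();
    }
}
```

In this sketch the buffer size affects only how many passes are needed; the decoded text is the same for any valid size.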
-
getEncoding
public CharsetEncoding getEncoding()
Get the character set encoding properties.
- Returns:
- properties used for character set encoding
-
setEncoding
public void setEncoding(CharsetEncoding settings)
Set the properties that control character set encoding.
- Parameters:
settings - character set encoding properties
-
getCharset
public Charset getCharset()
Gets the character set used by the data source.
- Returns:
- the character set of the source
-
setCharset
public void setCharset(Charset charset)
Sets the character set used by the data source. By default, ISO-8859-1 is used.
- Parameters:
charset - the character set to use
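A consequence of this default worth keeping in mind: in ISO-8859-1 every byte value maps to exactly one character, so decoding with it never fails, even when the data was written in another encoding. The JDK-only sketch below checks that property; the names Latin1Sketch and decodesEveryByte are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not part of the DataRush library.
final class Latin1Sketch {
    // Returns true if all 256 byte values decode to the character with the
    // same numeric value and encode back to the original bytes.
    static boolean decodesEveryByte() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String text = new String(all, StandardCharsets.ISO_8859_1);
        for (int i = 0; i < all.length; i++) {
            if (text.charAt(i) != i) {
                return false;
            }
        }
        return Arrays.equals(all, text.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        System.out.println(decodesEveryByte()); // prints true
    }
}
```

Because nothing is rejected, text in a different encoding is read without error but may come out garbled, so set the character set explicitly when the source encoding is known.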
-
getCharsetName
public String getCharsetName()
Gets the name of the character set used by the data source.
- Returns:
- the name of the character set of the source
-
setCharsetName
public void setCharsetName(String charsetName)
Sets the character set used by the data source.
- Parameters:
charsetName - name of the character set
- Throws:
InvalidPropertyValueException - if the named character set is not supported.
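The name-based setter presumably has to resolve the name to a character set before use. A minimal sketch of that kind of check with the JDK's Charset.forName follows. It is not DataRush code: the names CharsetNameSketch and resolve are hypothetical, and IllegalArgumentException stands in for InvalidPropertyValueException.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical illustration; not part of the DataRush library.
final class CharsetNameSketch {
    // Resolves a character set by name, reporting unknown or malformed names
    // with a single exception type.
    static Charset resolve(String name) {
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            throw new IllegalArgumentException("Unsupported character set: " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("UTF-8").name());
        try {
            resolve("no-such-charset");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Charset names in the JDK are case-insensitive and may be aliases, so several spellings can resolve to the same character set.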
-
getErrorAction
public CodingErrorAction getErrorAction()
Get the configured encoding error action.
- Returns:
- encoding error action
-
setErrorAction
public void setErrorAction(CodingErrorAction errorAction)
Set the encoding error action. The error action determines how errors are handled when decoding the input data with the configured character set. The default action is to replace the faulty data with a replacement character.
- Parameters:
errorAction - encoding error action
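The three standard java.nio.charset.CodingErrorAction values behave as follows on a plain JDK decoder. This sketch does not use the DataRush reader; the names ErrorActionSketch and decode are hypothetical. Note that the JDK decoder's own default replacement is U+FFFD, which differs from the "?" default documented for this class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the DataRush library.
final class ErrorActionSketch {
    // The byte 0xFF can never appear in well-formed UTF-8.
    static final byte[] MALFORMED = {0x61, (byte) 0xFF, 0x62};

    // Decodes UTF-8 bytes, applying the given action to undecodable input.
    static String decode(byte[] data, CodingErrorAction action) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            // Reached only with CodingErrorAction.REPORT.
            throw new IllegalStateException("undecodable input", e);
        }
    }

    public static void main(String[] args) {
        // REPLACE substitutes the decoder's replacement text for the bad byte.
        System.out.println(decode(MALFORMED, CodingErrorAction.REPLACE).length()); // prints 3
        // IGNORE silently drops the bad byte.
        System.out.println(decode(MALFORMED, CodingErrorAction.IGNORE)); // prints ab
        // REPORT turns the bad byte into an exception.
        try {
            decode(MALFORMED, CodingErrorAction.REPORT);
        } catch (IllegalStateException e) {
            System.out.println("reported: " + e.getMessage());
        }
    }
}
```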
-
getReplacement
public String getReplacement()
Get the text used by the replacement error action. This value is used only if the error action is to replace.
- Returns:
- replacement text used for encoding errors
-
setReplacement
public void setReplacement(String replacement)
Sets the error policy to replacement, using the specified string. By default, "?" is used.
- Parameters:
replacement - replacement value to use for encoding errors
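The effect of a replacement string can be reproduced on a plain JDK decoder with CharsetDecoder.replaceWith, as in the sketch below. This is an analogy, not the DataRush implementation; the names ReplacementSketch and decode are hypothetical. The JDK decoder rejects an empty replacement or one longer than the decoder's maxCharsPerByte value, so the sketch uses a single character; whether the DataRush setter has a similar limit is not stated in the source.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the DataRush library.
final class ReplacementSketch {
    // Decodes UTF-8 bytes, substituting the given text for undecodable input.
    static String decode(byte[] data, String replacement) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(replacement);
        try {
            return decoder.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            // Not expected, since both error actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The byte 0xFF is never valid in UTF-8, so it is replaced.
        byte[] data = {0x61, (byte) 0xFF, 0x62};
        System.out.println(decode(data, "?")); // prints a?b
    }
}
```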
-