- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.CompositeOperator
-
- com.pervasive.datarush.operators.io.AbstractReader
-
- com.pervasive.datarush.operators.io.textfile.AbstractTextReader
-
- com.pervasive.datarush.operators.io.textfile.ReadARFF
-
- All Implemented Interfaces:
LogicalOperator, RecordSourceOperator, SourceOperator<RecordPort>
public class ReadARFF extends AbstractTextReader
Reads files in the Attribute-Relation File Format (ARFF). ARFF files can be in either sparse or dense mode; this reader detects the mode and reads the data accordingly. ARFF files also contain schema information, which is parsed and used to determine how to parse the data lines.
ARFF can be parsed in parallel under "optimistic" assumptions: namely, that parse splits do not occur in the middle of a delimited field value and somewhere before an escaped record separator. These assumptions are made by default. They can be disabled, at the cost of reduced scalability and performance.
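To make the two modes concrete, here is a minimal sketch of telling a sparse data row from a dense one and expanding it. It relies only on the ARFF convention that a sparse row is written as {index value, ...} with unlisted attributes defaulting to 0. The class ArffModeSketch and its methods are hypothetical names used for illustration; this is not how ReadARFF is implemented.

```java
import java.util.Arrays;

// Illustrative sketch only; ArffModeSketch is a hypothetical name, not part of the library.
class ArffModeSketch {

    // A sparse ARFF data row is wrapped in braces: {index value, index value, ...}
    static boolean isSparse(String dataLine) {
        String t = dataLine.trim();
        return t.startsWith("{") && t.endsWith("}");
    }

    // Expands a sparse row into a dense array; attributes not listed default to "0".
    // Simplification: values are assumed to contain no commas (no quoted values).
    static String[] toDense(String sparseLine, int attributeCount) {
        String[] dense = new String[attributeCount];
        Arrays.fill(dense, "0");
        String t = sparseLine.trim();
        String body = t.substring(1, t.length() - 1).trim();
        if (body.isEmpty()) {
            return dense;
        }
        for (String entry : body.split(",")) {
            // Each entry is "<zero-based attribute index> <value>".
            String[] pair = entry.trim().split("\\s+", 2);
            dense[Integer.parseInt(pair[0])] = pair[1];
        }
        return dense;
    }

    public static void main(String[] args) {
        System.out.println(isSparse("0, X, 0, Y"));                     // false
        System.out.println(Arrays.toString(toDense("{1 X, 3 Y}", 4))); // [0, X, 0, Y]
    }
}
```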
-
-
Field Summary
-
Fields inherited from class com.pervasive.datarush.operators.io.textfile.AbstractTextReader
encodingProps
-
Fields inherited from class com.pervasive.datarush.operators.io.AbstractReader
options, output
-
-
Constructor Summary
Constructors
ReadARFF()
Reads an empty source with default settings.
ReadARFF(Path path)
Reads the file specified by the path as ARFF data using default options.
ReadARFF(ByteSource source)
Reads the specified data source using default options.
ReadARFF(String pattern)
Reads all paths matching the specified pattern as ARFF data using default options.
-
Method Summary
protected DataFormat computeFormat(CompositionContext ctx)
Determines the data format for the source.
ARFFAnalyzer.Analysis discoverMetadata(FileClient ctx)
Gets the metadata for the currently configured data source.
char getFieldDelimiter()
Get the configured field delimiter property value.
void setFieldDelimiter(char fieldDelimiter)
Set the field delimiter to use when reading the file contents.
-
Methods inherited from class com.pervasive.datarush.operators.io.textfile.AbstractTextReader
getCharset, getCharsetName, getDecodeBuffer, getEncoding, getErrorAction, getReplacement, setCharset, setCharsetName, setDecodeBuffer, setEncoding, setErrorAction, setReplacement
-
Methods inherited from class com.pervasive.datarush.operators.io.AbstractReader
compose, getExtraFieldAction, getFieldErrorAction, getFieldLengthThreshold, getIncludeSourceInfo, getMissingFieldAction, getOutput, getParseOptions, getPessimisticSplitting, getReadBuffer, getReadOnClient, getRecordWarningThreshold, getSelectedFields, getSource, getSplitOptions, getUseMetadata, setExtraFieldAction, setFieldErrorAction, setFieldLengthThreshold, setIncludeSourceInfo, setMissingFieldAction, setParseErrorAction, setParseOptions, setPessimisticSplitting, setReadBuffer, setReadOnClient, setRecordWarningThreshold, setSelectedFields, setSelectedFields, setSource, setSource, setSource, setSplitOptions, setUseMetadata
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Constructor Detail
-
ReadARFF
public ReadARFF()
Reads an empty source with default settings. The source must be set before execution or an error will be raised.
- See Also:
AbstractReader.setSource(ByteSource)
-
ReadARFF
public ReadARFF(String pattern)
Reads all paths matching the specified pattern as ARFF data using default options. Any matching path which is a directory is replaced with all files in the directory; this expansion is not applied recursively.
- Parameters:
pattern
- a path-matching pattern
- See Also:
FileClient.matchPaths(String)
-
ReadARFF
public ReadARFF(Path path)
Reads the file specified by the path as ARFF data using default options. If the path refers to a directory, all files in the directory are read; this expansion is not applied recursively.
- Parameters:
path
- the path to read
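The non-recursive directory expansion described above can be sketched with the standard library. This is an illustration of the behavior only: it uses java.nio.file.Path as a stand-in for the library's own path type, and DirectoryExpansionSketch and expand are hypothetical names, not part of the reader.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch only; not the library's path handling.
class DirectoryExpansionSketch {

    // A plain file expands to itself. A directory expands to the regular files
    // directly inside it; subdirectories are not descended into.
    static List<Path> expand(Path path) throws IOException {
        if (!Files.isDirectory(path)) {
            return Collections.singletonList(path);
        }
        try (Stream<Path> children = Files.list(path)) {
            return children
                    .filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Lists what would be read for the given path (current directory by default).
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        expand(start).forEach(System.out::println);
    }
}
```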
-
ReadARFF
public ReadARFF(ByteSource source)
Reads the specified data source using default options.
- Parameters:
source
- the data source to read
-
-
Method Detail
-
setFieldDelimiter
public void setFieldDelimiter(char fieldDelimiter)
Sets the field delimiter to use when reading the file contents. A single quote is used by default. The only supported values are a single quote and a double quote.
- Parameters:
fieldDelimiter
- the character to use as the field delimiter
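As an illustration of what the field delimiter affects, the sketch below splits one dense data line on commas while treating text enclosed in the configured delimiter as a single value. It assumes the field delimiter is the quoting character around values and that a backslash escapes the next character inside a delimited value. FieldDelimiterSketch and splitDenseLine are hypothetical names, and this is not the reader's actual parser.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the parsing logic used by ReadARFF.
class FieldDelimiterSketch {

    static List<String> splitDenseLine(String line, char fieldDelimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inDelimited = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inDelimited && c == '\\' && i + 1 < line.length()) {
                // Assumption: backslash escapes the next character inside a delimited value.
                current.append(line.charAt(++i));
            } else if (c == fieldDelimiter) {
                // Entering or leaving a delimited value; the delimiter itself is dropped.
                inDelimited = !inDelimited;
            } else if (c == ',' && !inDelimited) {
                // A comma outside a delimited value ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else if (inDelimited || !Character.isWhitespace(c)) {
                // Simplification: whitespace outside a delimited value is ignored.
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitDenseLine("sunny, 'hot, dry', 85", '\''));    // [sunny, hot, dry, 85]
        System.out.println(splitDenseLine("sunny, \"it's hot\", 85", '"'));  // [sunny, it's hot, 85]
    }
}
```

The first printed list has three elements; the second element is the single value "hot, dry", because its comma sits inside the delimited value.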
-
getFieldDelimiter
public char getFieldDelimiter()
Get the configured field delimiter property value.
- Returns:
- configured field delimiter
-
discoverMetadata
public ARFFAnalyzer.Analysis discoverMetadata(FileClient ctx)
Gets the metadata for the currently configured data source.
- Parameters:
ctx
- the authorization context to use for accessing the file
- Returns:
- the metadata of the source
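For a sense of the kind of information an ARFF header carries, here is a minimal sketch that collects the relation name and the attribute declarations that precede the @data marker. It is not ARFFAnalyzer and does not reflect the contents of ARFFAnalyzer.Analysis; ArffHeaderSketch is a hypothetical class, and it assumes well-formed declarations with unquoted attribute names.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; not the library's metadata discovery.
class ArffHeaderSketch {
    String relation;
    // Attribute name -> declared type text, in declaration order.
    final Map<String, String> attributes = new LinkedHashMap<>();

    static ArffHeaderSketch parse(String arffText) {
        ArffHeaderSketch header = new ArffHeaderSketch();
        for (String raw : arffText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            // ARFF declarations are case-insensitive.
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@relation")) {
                header.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Assumes "<name> <type>" with an unquoted name.
                String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
                header.attributes.put(parts[0], parts[1].trim());
            } else if (lower.startsWith("@data")) {
                break; // the header ends where the data section begins
            }
        }
        return header;
    }

    public static void main(String[] args) {
        String text = "% sample\n@relation weather\n"
                + "@attribute outlook {sunny, overcast, rainy}\n"
                + "@attribute temperature numeric\n@data\nsunny, 85\n";
        ArffHeaderSketch header = parse(text);
        System.out.println(header.relation + " " + header.attributes.keySet()); // weather [outlook, temperature]
    }
}
```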
-
computeFormat
protected DataFormat computeFormat(CompositionContext ctx)
Description copied from class: AbstractReader
Determines the data format for the source. The returned format is used during composition to construct a ReadSource operator. If an implementation supports schema discovery, it must be performed in this method.
- Specified by:
computeFormat in class AbstractReader
- Parameters:
ctx
- the composition context for the current invocation of AbstractReader.compose(CompositionContext)
- Returns:
- the source format to use
-
-