- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.CompositeOperator
-
- com.pervasive.datarush.operators.io.AbstractReader
-
- com.pervasive.datarush.operators.io.avro.ReadAvro
-
- All Implemented Interfaces:
LogicalOperator, RecordSourceOperator, SourceOperator<RecordPort>
public class ReadAvro extends AbstractReader
Reads data previously written in Apache Avro format. Avro data can be read in parallel.
Because Avro serializes the schema with the data, it is not necessary to specify a schema when reading it. DataRush automatically determines the equivalent data types from the Avro schema, and the result becomes the output type of the reader. However, Avro and DataRush support different data types, so not all data in Avro format can be read; an attempt to read data that cannot be represented in DataRush raises an error.
Primitive Avro types are mapped to DataRush as indicated in the table below.
Avro Type    DataRush Type
BOOLEAN      BOOLEAN
BYTES        BINARY
DOUBLE       DOUBLE
FIXED        BINARY
FLOAT        FLOAT
LONG         LONG
INT          INT
STRING       STRING
- RECORD data in Avro will, in general, be mapped to a DataRush record type as long as each field can be mapped to a scalar type. Nested records are not currently allowed except for the Avro RECORD representations of DataRush DATE, TIME, and TIMESTAMP types as described in the WriteAvro operator.
- ENUM data in Avro will be mapped to the DataRush string type, setting the domain to the enumerated list of symbols.
- UNION data in Avro can be mapped only if it is a union of NULL and exactly one other type which can be mapped to a scalar type.
- ARRAY and MAP data in Avro is not currently supported.
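The mapping rules above can be sketched in plain Java. This is only an illustration of the documented rules, not DataRush code: the class `AvroMappingSketch` and its methods `mapScalar` and `mapUnion` are hypothetical names, and type names are modeled as plain strings rather than real Avro or DataRush type objects.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Illustrative model of the documented mapping rules; not part of DataRush. */
final class AvroMappingSketch {
    // Primitive mappings copied from the table above (Avro type -> DataRush type).
    static final Map<String, String> PRIMITIVES = Map.of(
        "BOOLEAN", "BOOLEAN",
        "BYTES", "BINARY",
        "DOUBLE", "DOUBLE",
        "FIXED", "BINARY",
        "FLOAT", "FLOAT",
        "LONG", "LONG",
        "INT", "INT",
        "STRING", "STRING");

    /** Returns the DataRush scalar type name for a non-union Avro type, if any. */
    static Optional<String> mapScalar(String avroType) {
        if ("ENUM".equals(avroType)) {
            // ENUM becomes a string; the real reader also sets the field domain.
            return Optional.of("STRING");
        }
        // ARRAY, MAP and anything else not in the table yield an empty result.
        return Optional.ofNullable(PRIMITIVES.get(avroType));
    }

    /** A UNION maps only when it is NULL plus exactly one scalar-mappable type. */
    static Optional<String> mapUnion(List<String> branches) {
        if (branches.size() != 2 || !branches.contains("NULL")) {
            return Optional.empty();
        }
        return branches.stream()
            .filter(b -> !"NULL".equals(b))
            .findFirst()
            .flatMap(AvroMappingSketch::mapScalar);
    }

    public static void main(String[] args) {
        System.out.println(mapScalar("FIXED"));
        System.out.println(mapUnion(List.of("NULL", "LONG")));
        System.out.println(mapUnion(List.of("INT", "STRING")));
    }
}
```

An empty result here stands in for the error that the reader raises when data cannot be represented in DataRush.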
- See Also:
WriteAvro
-
-
Field Summary
-
Fields inherited from class com.pervasive.datarush.operators.io.AbstractReader
options, output
-
-
Constructor Summary
Constructor                   Description
ReadAvro()                    Reads an empty source with default settings.
ReadAvro(Path path)           Reads the file specified by the path.
ReadAvro(ByteSource source)   Reads the specified data source using default options.
ReadAvro(String pattern)      Reads all paths matching the specified pattern using default options.
-
Method Summary
Modifier and Type      Method                                  Description
protected DataFormat   computeFormat(CompositionContext ctx)   Determines the data format for the source.
AvroMetadata           discoverMetadata(FileClient client)     Gets the metadata for the currently configured data source.
-
Methods inherited from class com.pervasive.datarush.operators.io.AbstractReader
compose, getExtraFieldAction, getFieldErrorAction, getFieldLengthThreshold, getIncludeSourceInfo, getMissingFieldAction, getOutput, getParseOptions, getPessimisticSplitting, getReadBuffer, getReadOnClient, getRecordWarningThreshold, getSelectedFields, getSource, getSplitOptions, getUseMetadata, setExtraFieldAction, setFieldErrorAction, setFieldLengthThreshold, setIncludeSourceInfo, setMissingFieldAction, setParseErrorAction, setParseOptions, setPessimisticSplitting, setReadBuffer, setReadOnClient, setRecordWarningThreshold, setSelectedFields, setSelectedFields, setSource, setSource, setSource, setSplitOptions, setUseMetadata
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Constructor Detail
-
ReadAvro
public ReadAvro()
Reads an empty source with default settings. The source must be set before execution or an error will be raised.
See Also:
AbstractReader.setSource(ByteSource)
-
ReadAvro
public ReadAvro(String pattern)
Reads all paths matching the specified pattern using default options. Any matching path which is a directory is replaced with all files in the directory; this expansion is not recursive.
Parameters:
pattern - a path-matching pattern
See Also:
FileClient.matchPaths(String)
-
ReadAvro
public ReadAvro(Path path)
Reads the file specified by the path. If the path refers to a directory, all files in the directory are read; this read is not recursive into sub-directories.
Parameters:
path - the path to read
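The non-recursive directory expansion described for the pattern and path constructors can be approximated with the standard library. This is a minimal sketch under stated assumptions, not the reader's actual implementation: it uses `java.nio.file.Path` on the local file system rather than the DataRush `Path` and `FileClient` types, and `DirectoryExpansionSketch.expand` is a hypothetical helper.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative only: mimics the documented one-level directory expansion. */
final class DirectoryExpansionSketch {
    /**
     * Returns the path itself when it is not a directory; otherwise returns the
     * regular files directly inside it. Sub-directories are not descended into.
     */
    static List<Path> expand(Path path) throws IOException {
        if (!Files.isDirectory(path)) {
            return List.of(path);
        }
        // Files.list is lazy and holds a directory handle, so close the stream.
        try (Stream<Path> children = Files.list(path)) {
            return children
                .filter(Files::isRegularFile)
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

Under this sketch, a file placed in a sub-directory of the given path is skipped, which matches the statement that the read does not recurse.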
-
ReadAvro
public ReadAvro(ByteSource source)
Reads the specified data source using default options.
Parameters:
source - the data source to read
-
-
Method Detail
-
computeFormat
protected DataFormat computeFormat(CompositionContext ctx)
Description copied from class: AbstractReader
Determines the data format for the source. The returned format is used during composition to construct a ReadSource operator. If an implementation supports schema discovery, it must be performed in this method.
Specified by:
computeFormat in class AbstractReader
Parameters:
ctx - the composition context for the current invocation of AbstractReader.compose(CompositionContext)
Returns:
the source format to use
-
discoverMetadata
public AvroMetadata discoverMetadata(FileClient client)
Gets the metadata for the currently configured data source.
Parameters:
client - the file client
Returns:
the metadata of the source
-
-