All Implemented Interfaces:
LogicalOperator, RecordSourceOperator, SourceOperator<RecordPort>

public class ReadAvro extends AbstractReader
Reads data previously written in Apache Avro format. Avro data can be read in parallel.

As Avro serializes the schema with the data, it is not necessary to specify a schema when reading it. DataRush will automatically determine the equivalent data types from the Avro schema. The result will be the output type of the reader. However, as Avro and DataRush support different data types, not all data in Avro format can be read; attempting to read data that cannot be represented in DataRush raises an error.

Primitive Avro types are mapped to DataRush as indicated in the table below.

Avro Type    DataRush Type
BOOLEAN      BOOLEAN
BYTES        BINARY
DOUBLE       DOUBLE
FIXED        BINARY
FLOAT        FLOAT
LONG         LONG
INT          INT
STRING       STRING
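
The table above can be restated as a simple lookup. The following is a minimal sketch using only the Java standard library; the class `AvroPrimitiveMapping` and its method `toDataRush` are hypothetical names introduced for illustration and are not part of the DataRush API. Type names are plain strings taken from the table.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration class; not part of DataRush or Avro.
public class AvroPrimitiveMapping {
    // Avro type name -> DataRush type name, copied from the table.
    private static final Map<String, String> MAPPING = Map.of(
        "BOOLEAN", "BOOLEAN",
        "BYTES", "BINARY",
        "DOUBLE", "DOUBLE",
        "FIXED", "BINARY",
        "FLOAT", "FLOAT",
        "LONG", "LONG",
        "INT", "INT",
        "STRING", "STRING");

    // Returns the DataRush type name, or empty if the table has no entry.
    public static Optional<String> toDataRush(String avroType) {
        return Optional.ofNullable(MAPPING.get(avroType));
    }

    public static void main(String[] args) {
        System.out.println(toDataRush("FIXED").orElse("unmapped"));
        System.out.println(toDataRush("ARRAY").orElse("unmapped"));
    }
}
```

Note that both BYTES and FIXED collapse to BINARY, so the mapping is not reversible from the DataRush side alone.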
Complex Avro data types are mapped as follows:
  • RECORD data in Avro will, in general, be mapped to a DataRush record type as long as each field can be mapped to a scalar type. Nested records are not currently allowed except for the Avro RECORD representations of DataRush DATE, TIME, and TIMESTAMP types as described in the WriteAvro operator.
  • ENUM data in Avro will be mapped to the DataRush string type, setting the domain to the enumerated list of symbols.
  • UNION data in Avro can be mapped only if it is a union of NULL and exactly one other type that can be mapped to a scalar type.
  • ARRAY and MAP data in Avro is not currently supported.
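
The UNION rule above can be sketched as a small check. This is an illustration only, using the Java standard library: the class `NullableUnionCheck`, the method `mappableBranch`, and the use of plain strings for type names are all hypothetical. The set of scalar-mappable types is an assumption (the primitive table plus ENUM); RECORD representations of DATE, TIME, and TIMESTAMP are left out for simplicity.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration class; not part of DataRush or Avro.
public class NullableUnionCheck {
    // Assumption: types treated here as mappable to a DataRush scalar.
    private static final Set<String> SCALAR = Set.of(
        "BOOLEAN", "BYTES", "DOUBLE", "FIXED", "FLOAT",
        "LONG", "INT", "STRING", "ENUM");

    // Returns the non-NULL branch if the union is NULL plus exactly one
    // scalar-mappable type; otherwise returns empty.
    public static Optional<String> mappableBranch(List<String> branches) {
        if (branches.size() != 2 || !branches.contains("NULL")) {
            return Optional.empty();
        }
        for (String branch : branches) {
            if (!branch.equals("NULL")) {
                return SCALAR.contains(branch)
                    ? Optional.of(branch)
                    : Optional.empty();
            }
        }
        return Optional.empty(); // both branches were NULL
    }
}
```

Under these assumptions, a union of NULL and STRING is accepted in either order, while a union of INT and STRING, or of NULL and ARRAY, is rejected.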
  • Constructor Details

    • ReadAvro

      public ReadAvro()
      Reads an empty source with default settings. The source must be set before execution or an error will be raised.
    • ReadAvro

      public ReadAvro(String pattern)
      Reads all paths matching the specified pattern using default options. Any matching path which is a directory is replaced with all files in the directory; this expansion is not recursive.
      Parameters:
      pattern - a path-matching pattern
    • ReadAvro

      public ReadAvro(Path path)
      Reads the file specified by the path. If the path refers to a directory, all files in the directory are read; the read does not recurse into subdirectories.
      Parameters:
      path - the path to read
    • ReadAvro

      public ReadAvro(ByteSource source)
      Reads the specified data source using default options.
      Parameters:
      source - the data source to read
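
The non-recursive directory expansion described for the pattern and path constructors can be illustrated with `java.nio.file`. This is a sketch of the behavior as documented, not the reader's actual implementation: the class `ShallowExpansion` and the method `expand` are hypothetical, it operates on the local file system only, and skipping non-regular entries is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of DataRush.
public class ShallowExpansion {
    // A plain file expands to itself. A directory expands to the regular
    // files directly inside it; subdirectories are not descended into.
    public static List<Path> expand(Path path) throws IOException {
        if (!Files.isDirectory(path)) {
            return List.of(path);
        }
        // Files.list enumerates only the immediate entries of the directory.
        try (Stream<Path> entries = Files.list(path)) {
            return entries
                .filter(Files::isRegularFile)
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

With a directory holding `a.avro`, `b.avro`, and a subdirectory `sub` containing `c.avro`, `expand` returns only the first two files; `c.avro` would have to be named explicitly or matched by its own path.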
  • Method Details

    • computeFormat

      protected DataFormat computeFormat(CompositionContext ctx)
      Description copied from class: AbstractReader
      Determines the data format for the source. The returned format is used during composition to construct a ReadSource operator. If an implementation supports schema discovery, it must be performed in this method.
      Specified by:
      computeFormat in class AbstractReader
      Parameters:
      ctx - the composition context for the current invocation of AbstractReader.compose(CompositionContext)
      Returns:
      the source format to use
    • discoverMetadata

      public AvroMetadata discoverMetadata(FileClient client)
      Gets the metadata for the currently configured data source.
      Parameters:
      client - the file client
      Returns:
      the metadata of the source