Class ReadAvro

  • All Implemented Interfaces:
    LogicalOperator, RecordSourceOperator, SourceOperator<RecordPort>

    public class ReadAvro
    extends AbstractReader
    Reads data previously written in Apache Avro format. Avro data can be read in parallel.

    Because Avro serializes the schema with the data, it is not necessary to specify a schema when reading it. DataRush automatically determines the equivalent data types from the Avro schema, and these become the output type of the reader. However, because Avro and DataRush support different data types, not all Avro data can be read; attempting to read data that cannot be represented in DataRush raises an error.

    Primitive Avro types are mapped to DataRush as indicated in the table below.

    Avro Type   DataRush Type
    BOOLEAN     BOOLEAN
    BYTES       BINARY
    DOUBLE      DOUBLE
    FIXED       BINARY
    FLOAT       FLOAT
    LONG        LONG
    INT         INT
    STRING      STRING

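    The primitive mappings in this table amount to a fixed lookup. As a minimal sketch, the Java class below encodes them as a map keyed by Avro type name. The class name `AvroPrimitiveMapping` and the use of plain strings for type names are illustrative assumptions. This class is not part of the DataRush or Avro APIs.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of the primitive type table above.
// Type names are plain strings, not DataRush or Avro classes.
final class AvroPrimitiveMapping {
    private static final Map<String, String> AVRO_TO_DATARUSH = Map.of(
        "BOOLEAN", "BOOLEAN",
        "BYTES", "BINARY",
        "DOUBLE", "DOUBLE",
        "FIXED", "BINARY",
        "FLOAT", "FLOAT",
        "LONG", "LONG",
        "INT", "INT",
        "STRING", "STRING");

    // Returns the DataRush type for a primitive Avro type, or empty if the
    // type is not one of the primitives listed in the table.
    static Optional<String> toDataRush(String avroType) {
        return Optional.ofNullable(AVRO_TO_DATARUSH.get(avroType));
    }

    public static void main(String[] args) {
        System.out.println(toDataRush("BYTES").orElse("unsupported"));
        System.out.println(toDataRush("ARRAY").orElse("unsupported"));
    }
}
```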
    Complex Avro data types are mapped as follows:
    • RECORD data in Avro will, in general, be mapped to a DataRush record type as long as each field can be mapped to a scalar type. Nested records are not currently allowed except for the Avro RECORD representations of DataRush DATE, TIME, and TIMESTAMP types as described in the WriteAvro operator.
    • ENUM data in Avro is mapped to the DataRush STRING type, with the domain set to the enumerated list of symbols.
    • UNION data in Avro can be mapped only if it is a union of NULL and exactly one other type that can be mapped to a scalar type.
    • ARRAY and MAP data in Avro are not currently supported.
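    The UNION and RECORD rules above can be expressed as simple checks. The sketch below is a hypothetical model, not DataRush code: types are plain strings, ENUM counts as scalar because it maps to STRING, and the special RECORD encodings of DATE, TIME, and TIMESTAMP are deliberately left out. The class and method names are invented for illustration.

```java
import java.util.List;
import java.util.Set;

// Hypothetical checker for the complex-type rules described above.
// Avro types are modeled as strings; this is not the Avro Java API.
final class AvroComplexRules {
    // Types assumed to map to a DataRush scalar (ENUM becomes STRING).
    static final Set<String> SCALARS = Set.of(
        "BOOLEAN", "BYTES", "DOUBLE", "FIXED", "FLOAT",
        "LONG", "INT", "STRING", "ENUM");

    // A UNION is readable only if it has exactly two branches, one of which
    // is NULL and the other a type that maps to a scalar.
    static boolean isReadableUnion(List<String> branches) {
        if (branches.size() != 2 || !branches.contains("NULL")) {
            return false;
        }
        for (String branch : branches) {
            if (!branch.equals("NULL")) {
                return SCALARS.contains(branch);
            }
        }
        return false; // both branches were NULL
    }

    // A RECORD is readable if every field is a scalar or a readable nullable
    // UNION. Each field is given as its list of branches; a non-union field
    // is a single-element list. Nested records are not modeled here.
    static boolean isReadableRecord(List<List<String>> fields) {
        for (List<String> field : fields) {
            boolean ok = field.size() == 1
                ? SCALARS.contains(field.get(0))
                : isReadableUnion(field);
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isReadableUnion(List.of("NULL", "LONG")));
        System.out.println(isReadableUnion(List.of("NULL", "INT", "STRING")));
        System.out.println(isReadableRecord(List.of(List.of("STRING"), List.of("ARRAY"))));
    }
}
```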
    See Also:
    WriteAvro
    • Constructor Detail

      • ReadAvro

        public ReadAvro()
        Creates a reader with no source and default settings. The source must be set before execution; otherwise an error is raised.
        See Also:
        AbstractReader.setSource(ByteSource)
      • ReadAvro

        public ReadAvro(String pattern)
        Reads all paths matching the specified pattern using default options. Any matching path which is a directory is replaced with all files in the directory; this expansion is not recursive.
        Parameters:
        pattern - a path-matching pattern
        See Also:
        FileClient.matchPaths(String)
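        The directory-expansion behavior described above can be approximated with the standard java.nio.file API. This is a hypothetical sketch only: it assumes a glob pattern applied to the entries of a single directory, whereas the actual pattern syntax is defined by FileClient.matchPaths(String). It also assumes that subdirectories found inside a matched directory are skipped rather than read.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch of non-recursive pattern expansion using java.nio,
// not DataRush's FileClient.
final class PatternExpansion {
    // Matches entries of `dir` against a glob, then replaces each matching
    // directory with the regular files directly inside it (no recursion).
    static List<Path> expand(Path dir, String glob) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path p : matches) {
                if (Files.isDirectory(p)) {
                    try (Stream<Path> children = Files.list(p)) {
                        // Only regular files; nested directories are not descended.
                        children.filter(Files::isRegularFile).forEach(result::add);
                    }
                } else {
                    result.add(p);
                }
            }
        }
        result.sort(null); // natural Path ordering, for a stable result
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("avro-demo");
        Files.createFile(root.resolve("a.avro"));
        Path parts = Files.createDirectory(root.resolve("parts.avro"));
        Files.createFile(parts.resolve("p0.avro"));
        Files.createDirectory(parts.resolve("nested")); // skipped: no recursion
        Files.createFile(root.resolve("notes.txt"));    // does not match the glob
        expand(root, "*.avro").forEach(p -> System.out.println(root.relativize(p)));
    }
}
```

        Listing only regular files keeps the expansion one level deep, mirroring the non-recursive behavior described for this constructor.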
      • ReadAvro

        public ReadAvro(Path path)
        Reads the file specified by the path. If the path refers to a directory, all files in the directory are read; subdirectories are not read recursively.
        Parameters:
        path - the path to read
      • ReadAvro

        public ReadAvro(ByteSource source)
        Reads the specified data source using default options.
        Parameters:
        source - the data source to read
    • Method Detail

      • discoverMetadata

        public AvroMetadata discoverMetadata(FileClient client)
        Gets the metadata for the currently configured data source.
        Parameters:
        client - the file client
        Returns:
        the metadata of the source