Class DelimitedTextFormat

  • All Implemented Interfaces:
    DataFormat

    public class DelimitedTextFormat
    extends Object
    implements DataFormat
    Describes the format of a delimited text file. Normally, it is not necessary to construct these directly. Instead, use ReadDelimitedText and WriteDelimitedText to access data stored as delimited text.
    • Constructor Detail

      • DelimitedTextFormat

        public DelimitedTextFormat(RecordTextSchema<?> schema,
                                   FieldDelimiterSettings delimiters,
                                   CharsetEncoding encoding)
        Create a data format for accessing delimited text data. The text is assumed to have no header and to use '#' as the line comment marker.
        Parameters:
        schema - the schema to use for records. This provides field names as well as formatting information for field values.
        delimiters - a description of the delimiters used in the text
        encoding - character set definition for data encoding in the text
      • DelimitedTextFormat

        public DelimitedTextFormat(RecordTextSchema<?> schema,
                                   FieldDelimiterSettings delimiters,
                                   CharsetEncoding encoding,
                                   FileMetadata metadata)
        Create a data format for accessing delimited text data. The text is assumed to have no header and to use '#' as the line comment marker.
        Parameters:
        schema - the schema to use for records. This provides field names as well as formatting information for field values.
        delimiters - a description of the delimiters used in the text
        encoding - character set definition for data encoding in the text
        metadata - the metadata associated with the data
      • DelimitedTextFormat

        public DelimitedTextFormat(RecordTextSchema<?> schema,
                                   FieldDelimiterSettings delimiters,
                                   CharsetEncoding encoding,
                                   FileMetadata metadata,
                                   boolean hasHeader,
                                   String lineComment,
                                   int skipCount)
        Create a data format for accessing delimited text data.
        Parameters:
        schema - the schema to use for records. This provides field names as well as formatting information for field values.
        delimiters - a description of the delimiters used in the text
        encoding - character set definition for data encoding
        metadata - the metadata associated with the data
        hasHeader - indicates whether the first record is a header.
        lineComment - characters used to indicate line comments in the text
        skipCount -
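The effect of hasHeader and lineComment can be pictured with a small, self-contained sketch. It is not the library's parser: the class HeaderCommentDemo and its method dataLines are hypothetical names, and the rule that comment lines are discarded before the header is identified is an assumption made for illustration. skipCount is left out because its meaning is not described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not part of the DelimitedTextFormat API. */
final class HeaderCommentDemo {

    /**
     * Returns the lines that would be treated as data records.
     * Assumption: comment lines are dropped first, and the header
     * (if any) is the first line that is not a comment.
     */
    static List<String> dataLines(List<String> lines, boolean hasHeader, String lineComment) {
        List<String> records = new ArrayList<>();
        boolean headerPending = hasHeader;
        for (String line : lines) {
            boolean isComment = lineComment != null
                    && !lineComment.isEmpty()
                    && line.startsWith(lineComment);
            if (isComment) {
                continue; // comment lines never become records
            }
            if (headerPending) {
                headerPending = false; // consume the header line once
                continue;
            }
            records.add(line);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# generated", "id,name", "1,Ada", "# note", "2,Grace");
        // With a header and '#' comments, only the two data rows remain.
        System.out.println(dataLines(lines, true, "#"));
        // Without a header, the "id,name" line is kept as a record.
        System.out.println(dataLines(lines, false, "#"));
    }
}
```

Under the defaults described for the shorter constructors (no header, '#' as the comment marker), the second call is the closer analogue.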
    • Method Detail

      • isSplittable

        public boolean isSplittable()
        Indicates if the format supports parsing of subsections of a file.

        A format should only return true if it can, at least in some situations, support this sort of parsing. If a format requires reading the entire file, it must return false.

        If a format is not splittable, a single file in that format cannot be parsed in parallel; however, separate files can still be parsed independently in parallel, as when reading the contents of a directory or using a file globbing pattern.

        Generally, delimited text data is splittable. However, if any field value contains the record separator, the data may not be.

        Specified by:
        isSplittable in interface DataFormat
        Returns:
        true if the format supports parsing only a portion of the file, false otherwise
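The caveat about record separators inside field values can be made concrete with a short sketch. It assumes newline as the record separator and double quotes as the field delimiter; the class SplitBoundaryDemo and both methods are hypothetical and do not reflect how the library locates record boundaries. A reader that starts at an arbitrary offset in a file cannot tell whether it is inside a quoted field, so it is in the position of the naive splitter below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not part of the DelimitedTextFormat API. */
final class SplitBoundaryDemo {

    /** Naive approach: every newline is taken to be a record boundary. */
    static List<String> splitNaively(String text) {
        return new ArrayList<>(Arrays.asList(text.split("\n")));
    }

    /** Quote-aware approach: a newline inside a quoted field is data. */
    static List<String> splitQuoteAware(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // track whether we are inside a quoted field
            }
            if (c == '\n' && !inQuotes) {
                records.add(current.toString()); // real record boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // final record without trailing newline
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "1,\"first line\nsecond line\"\n2,\"plain\"\n";
        System.out.println(splitNaively(text).size());    // prints 3
        System.out.println(splitQuoteAware(text).size()); // prints 2
    }
}
```

The naive count is wrong because the first record spans two physical lines. Tracking quote state from the start of the file gives the right answer, which is exactly what a reader assigned to the middle of a file cannot do.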
      • getType

        public RecordTokenType getType()
        Description copied from interface: DataFormat
        Gets the record type associated with the format. Records produced by the associated parser or consumed by the associated formatter will be of this type.

        For many formats, this may be derived from a schema object describing the format layout.

        Specified by:
        getType in interface DataFormat
        Returns:
        the format's record type
      • getMetadata

        public FileMetadata getMetadata()
        Description copied from interface: DataFormat
        Gets the metadata associated with the format. Records produced by the associated parser or consumed by the associated formatter will use this metadata.
        Specified by:
        getMetadata in interface DataFormat
        Returns:
        the format's metadata
      • readMetadata

        public FileMetadata readMetadata(FileClient fileClient,
                                         ByteSource source)
        Description copied from interface: DataFormat
        Reads the metadata associated with the format.
        Specified by:
        readMetadata in interface DataFormat
        Parameters:
        fileClient - client used to read file
        source - location of the files
      • writeMetadata

        public void writeMetadata(FileMetadata metadata,
                                  FileClient fileClient,
                                  ByteSink target)
        Description copied from interface: DataFormat
        Writes the provided metadata associated with the format.
        Specified by:
        writeMetadata in interface DataFormat
        Parameters:
        metadata - the metadata to write
        fileClient - client used to write file
        target - destination to which the metadata is written
      • createParser

        public DataFormat.DataParser createParser(ParsingOptions options)
        Description copied from interface: DataFormat
        Create a new parser for the format using the specified parsing options.
        Specified by:
        createParser in interface DataFormat
        Parameters:
        options - parsing options to use
        Returns:
        a new parser for reading external data
      • createWriter

        public DataFormat.DataFormatter createWriter(FormattingOptions options)
        Description copied from interface: DataFormat
        Create a new writer for the format using the specified formatting options.
        Specified by:
        createWriter in interface DataFormat
        Parameters:
        options - formatting options to use
        Returns:
        a new formatter for writing external data