Class ReadFixedText

  • All Implemented Interfaces:
    LogicalOperator, RecordSourceOperator, SourceOperator<RecordPort>

    public class ReadFixedText
    extends AbstractTextReader
    Reads a text file of fixed-width records as record tokens. Records are delimited by a non-empty, user-defined record separator sequence between consecutive records. Output records contain the same fields as the input file. On request, the parser can also filter the output fields, reorder them, or both.

    The reader requires a FixedWidthTextRecord object that provides the position, parsing, and type information for each field. This schema, in conjunction with any specified field filter, defines the output type of the parser. Schemas can be constructed manually through the provided API, although this metadata is often persisted externally. StructuredSchemaReader supports reading Pervasive DataIntegrator structured schema descriptors (.schema files) for use with readers.

    Normally, the output includes all records in the file, both those with and those without parsing errors. Fields that cannot be parsed are null-valued in the resulting record. If desired, the reader can be configured to filter failed records from the output.
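
    The null-on-failure behavior can be pictured with a plain-Java sketch. It does not use the FixedWidthTextRecord API and is not the reader's implementation: the class FixedWidthSketch, its two-field layout (a 5-character integer id followed by a 10-character name), and the dropFailed flag are all assumptions made for illustration. Treating a too-short line as a failed field is likewise a choice of this sketch only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedWidthSketch {
    // Hypothetical layout: columns 0-4 hold an integer id, columns 5-14 hold a name.
    static Object[] parseRecord(String line) {
        Object[] fields = new Object[2];
        fields[0] = parseIntOrNull(slice(line, 0, 5));
        String name = slice(line, 5, 15);
        fields[1] = (name == null) ? null : name.trim();
        return fields;
    }

    // Returns the substring [start, end), or null if the line is too short to contain it.
    static String slice(String line, int start, int end) {
        return line.length() >= end ? line.substring(start, end) : null;
    }

    // A field that cannot be parsed becomes null instead of aborting the record.
    static Integer parseIntOrNull(String text) {
        if (text == null) {
            return null;
        }
        try {
            return Integer.valueOf(text.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Keeps every record by default; drops records with a null field when dropFailed is true.
    static List<Object[]> parseAll(List<String> lines, boolean dropFailed) {
        List<Object[]> out = new ArrayList<>();
        for (String line : lines) {
            Object[] rec = parseRecord(line);
            boolean failed = Arrays.asList(rec).contains(null);
            if (!(dropFailed && failed)) {
                out.add(rec);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String good = String.format("%05d%-10s", 42, "Alice");
        String bad = "x1y2z" + String.format("%-10s", "Bob");
        List<String> lines = List.of(good, bad);
        System.out.println("kept without filtering: " + parseAll(lines, false).size());
        System.out.println("kept with filtering: " + parseAll(lines, true).size());
    }
}
```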

    Since record boundaries occur at known positions, fixed text files can be parsed in parallel.
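
    To see why known record positions allow parallel parsing, consider the arithmetic in the sketch below. It assumes every record occupies exactly recordWidth bytes followed by a separator of separatorLength bytes, so record i starts at byte i * (recordWidth + separatorLength). The class SplitSketch and its parameters are hypothetical; how the reader actually partitions its input is not described here.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    // Returns [start, end) byte ranges, each beginning on a record boundary.
    static List<long[]> splits(long fileSize, int recordWidth, int separatorLength, int workers) {
        long stride = (long) recordWidth + separatorLength; // bytes from one record start to the next
        // Adding separatorLength counts a final record that lacks a trailing separator.
        long records = (fileSize + separatorLength) / stride;
        long perWorker = (records + workers - 1) / workers; // ceiling division
        List<long[]> ranges = new ArrayList<>();
        for (long first = 0; first < records; first += perWorker) {
            long last = Math.min(records, first + perWorker);
            ranges.add(new long[] { first * stride, Math.min(last * stride, fileSize) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Ten 8-byte records with 1-byte separators, split across three workers.
        for (long[] r : splits(90, 8, 1, 3)) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

    Each range can then be handed to a separate worker, because no worker needs to scan earlier data to find where its first record begins.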

    • Constructor Detail

      • ReadFixedText

        public ReadFixedText(String pattern)
        Reads all paths matching the specified pattern as fixed text using default options. Any matching path which is a directory is replaced with all files in the directory; this expansion is not applied recursively.

        The schema must be set before execution or an error will be raised.

        Parameters:
        pattern - a path-matching pattern
        See Also:
        setSchema(FixedWidthTextRecord), FileClient#matchPaths(String)
      • ReadFixedText

        public ReadFixedText(Path path)
        Reads the file specified by the path as fixed text using default options. If the path refers to a directory, all files in the directory are read; this expansion is not applied recursively.

        The schema must be set before execution or an error will be raised.

        Parameters:
        path - the path to read
        See Also:
        setSchema(FixedWidthTextRecord)
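
        The non-recursive directory expansion can be pictured with the following sketch. It uses java.nio.file.Path from the JDK, which is an assumption for illustration and may not be the Path type this constructor accepts; the class ExpandSketch and its expand method are hypothetical and are not part of the reader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ExpandSketch {
    // A file maps to itself; a directory maps to its immediate regular files only.
    static List<Path> expand(Path path) {
        if (!Files.isDirectory(path)) {
            return List.of(path);
        }
        // Files.list is not recursive, so files in subdirectories are not returned.
        try (Stream<Path> children = Files.list(path)) {
            return children.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (Path p : expand(Path.of("."))) {
            System.out.println(p);
        }
    }
}
```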
      • ReadFixedText

        public ReadFixedText(ByteSource source)
        Reads the specified data source using default options. The schema must be set before execution or an error will be raised.
        Parameters:
        source - the data source to read
        See Also:
        setSchema(FixedWidthTextRecord)
    • Method Detail

      • setRecordSeparator

        public void setRecordSeparator(String separator)
        Set the string that separates records in the input file. The record separator must not appear within the records themselves; otherwise parse errors may occur.

        By default, the record separator is the native end-of-record separator for the platform on which the application is running. This is normally either the Unix/Linux-style or the Windows-style end-of-record delimiter.

        Parameters:
        separator - record separator
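
        The restriction on embedded separators follows from how separator-based splitting works. The sketch below is a plain-Java illustration, not the reader's implementation; the class SeparatorSketch and the "|" separator are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SeparatorSketch {
    // Splits text on a literal (non-regex) separator; a trailing separator adds no empty record.
    static List<String> splitRecords(String text, String separator) {
        if (separator == null || separator.isEmpty()) {
            throw new IllegalArgumentException("separator must be non-empty");
        }
        return Arrays.asList(text.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        // Three well-formed records.
        System.out.println(splitRecords("AAA|BBB|CCC|", "|"));
        // The first record was meant to be "A|A": the embedded separator cuts it in two.
        System.out.println(splitRecords("A|A|BBB", "|"));
    }
}
```

        In the second call the intended two records come out as three pieces, none of which matches the expected width, which is the kind of parse error the restriction guards against.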
      • getRecordSeparator

        public String getRecordSeparator()
        Get the record separator property.
        Returns:
        record separator property
      • setSchema

        public void setSchema(FixedWidthTextRecord schema)
        Set the schema of the input data to read. The schema is a required property; it defines the position and type of each field in the input.
        Parameters:
        schema - required input schema
      • getSchema

        public FixedWidthTextRecord getSchema()
        Get the input schema property.
        Returns:
        input schema
      • getLineComment

        public String getLineComment()
        Get the value that indicates a line of data is a comment.
        Returns:
        text representing a comment indicator
      • setLineComment

        public void setLineComment(String lineComment)
        Set the text that marks a line of input data as a comment. If this text is found at the beginning of a line (row) of data, the whole row is skipped and does not appear in the output.

        By default, this option is null, indicating that the data contains no comment lines.

        Parameters:
        lineComment - text representing a comment
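
        The comment rule amounts to a prefix test on each line. The sketch below illustrates it in plain Java under that reading; the class CommentSketch and the "#" marker are hypothetical, and this is not the reader's implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

class CommentSketch {
    // Drops lines that begin with the comment marker; a null marker disables filtering.
    static List<String> dropComments(List<String> lines, String lineComment) {
        if (lineComment == null) {
            return lines;
        }
        return lines.stream()
                .filter(line -> !line.startsWith(lineComment))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of("# header", "00042Alice", "0004#Bob");
        // Only the first line is dropped: the marker must be at the start of the line.
        System.out.println(dropComments(lines, "#"));
    }
}
```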