Class ParsingOptions


  • public class ParsingOptions
    extends Object
    A collection of parameters for configuring parsing. This includes, but is not necessarily limited to:
    • Controlling which fields are to be parsed; omitting unwanted fields can make parsing much more efficient.
    • Controlling how parsing errors are handled.
    • Tuning the sizes of internal buffers.
    • Field Detail

      • DEFAULT_READ_BUFFER

        public static final int DEFAULT_READ_BUFFER
        The default size, in bytes, for read operations.
        See Also:
        Constant Field Values
    • Constructor Detail

      • ParsingOptions

        public ParsingOptions()
        Creates a collection with default settings:
        • All fields from the source will be read.
        • Malformed records will be read; unparsable fields will be null-valued.
        • Buffers will use default sizings.
    • Method Detail

      • set

        public void set(ParsingOptions options)
        Copies the settings from the specified options object. Afterwards, this object has the same value for every setting as the source.
        Parameters:
        options - the settings to copy
      • getSelectedFields

        public List<String> getSelectedFields()
        Gets the list of record fields to parse.
        Returns:
        the fields which will be parsed.
      • setSelectedFields

        public void setSelectedFields(List<String> fields)
        Sets the list of record fields to parse. If only a subset of the fields is needed, parsing only those fields can be more efficient. Only fields in this list appear in the output records. An empty list indicates that all fields should be parsed; this is the default setting.
        Parameters:
        fields - the record fields to parse
      • setSelectedFields

        public void setSelectedFields(String... fields)
        Sets the list of record fields to parse. If only a subset of the fields is needed, parsing only those fields can be more efficient. Only fields in this list appear in the output records. An empty list indicates that all fields should be parsed; this is the default setting.
        Parameters:
        fields - the record fields to parse
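        The selection rule described above (an empty list means every field, otherwise only the listed fields reach the output) can be illustrated with a small stand-alone sketch. FieldSelectionSketch and project are hypothetical names, not part of this API, and the handling of selected names that are absent from the record is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that mimics the documented selection rule; not part of the API.
final class FieldSelectionSketch {
    // Keeps only the selected fields of a record. An empty selection keeps every field.
    static Map<String, String> project(Map<String, String> record, List<String> selected) {
        if (selected.isEmpty()) {
            return new LinkedHashMap<>(record);
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : selected) {
            // Assumption: a selected name that the record lacks is simply skipped here.
            if (record.containsKey(name)) {
                out.put(name, record.get(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("id", "7");
        record.put("name", "Ada");
        record.put("notes", "n/a");

        // Prints {id=7, name=Ada}
        System.out.println(project(record, Arrays.asList("id", "name")));
        // Prints {id=7, name=Ada, notes=n/a}
        System.out.println(project(record, Collections.<String>emptyList()));
    }
}
```

        With the documented API, the first case would correspond to a call such as setSelectedFields("id", "name").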
      • getMissingFieldAction

        public ParseErrorAction getMissingFieldAction()
        Gets how fields that are declared in the schema but not found when parsing the record are handled.
        Returns:
        the action to take on missing fields
      • setMissingFieldAction

        public void setMissingFieldAction(ParseErrorAction action)
        Sets how to handle fields that are declared in the schema but not found when parsing the record. If the configured action does not discard the record, the missing fields will be null-valued in the output. By default, this setting is ParseErrorAction.WARN.

        This setting is advisory in that parsers can behave differently than configured. However, a parser should only behave in a stricter fashion than configured.

        Parameters:
        action - the action to take on missing fields
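        The advisory rule above (a parser may deviate from the configured action, but only towards stricter behavior) can be sketched as picking the stricter of two actions. ActionSketch is a hypothetical stand-in for ParseErrorAction: only WARN is named in this documentation, so the other constants and their strictness ordering are assumptions.

```java
// Hypothetical stand-in for ParseErrorAction. Only WARN appears in the documentation;
// the other constants and the ordering (least strict first) are assumptions.
enum ActionSketch {
    IGNORE, WARN, DISCARD, ERROR
}

final class AdvisoryActionSketch {
    // Returns the action actually applied: the configured action, unless the parser
    // requires something stricter. Enum.compareTo orders by declaration position.
    static ActionSketch effective(ActionSketch configured, ActionSketch parserMinimum) {
        return configured.compareTo(parserMinimum) >= 0 ? configured : parserMinimum;
    }

    public static void main(String[] args) {
        // Parser insists on failing: the stricter action wins. Prints ERROR
        System.out.println(effective(ActionSketch.WARN, ActionSketch.ERROR));
        // Parser would tolerate more than configured: configuration is kept. Prints WARN
        System.out.println(effective(ActionSketch.WARN, ActionSketch.IGNORE));
    }
}
```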
      • getExtraFieldAction

        public ParseErrorAction getExtraFieldAction()
        Gets how fields that are found when parsing the record but not declared in the schema are handled.
        Returns:
        the action to take on extra fields
      • setExtraFieldAction

        public void setExtraFieldAction(ParseErrorAction action)
        Sets how to handle fields that are found when parsing the record but not declared in the schema. If the configured action does not discard the record, the missing fields will be null-valued in the output. By default, this setting is ParseErrorAction.WARN.

        This setting is advisory in that parsers can behave differently than configured. However, a parser should only behave in a stricter fashion than configured.

        Parameters:
        action - the action to take on extra fields
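        The distinction between missing fields (declared in the schema, absent from the record) and extra fields (present in the record, absent from the schema) amounts to two set differences. The following is a minimal sketch of that classification only; FieldMismatchSketch and its methods are hypothetical names, and the sketch does not model how a parser then applies the configured actions.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the missing/extra distinction; not part of the API.
final class FieldMismatchSketch {
    // Fields declared in the schema but not found in the parsed record.
    static Set<String> missing(List<String> schemaFields, List<String> recordFields) {
        Set<String> result = new LinkedHashSet<>(schemaFields);
        result.removeAll(recordFields);
        return result;
    }

    // Fields found in the parsed record but not declared in the schema.
    static Set<String> extra(List<String> schemaFields, List<String> recordFields) {
        Set<String> result = new LinkedHashSet<>(recordFields);
        result.removeAll(schemaFields);
        return result;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name", "email");
        List<String> record = Arrays.asList("id", "name", "phone");

        System.out.println(missing(schema, record)); // prints [email]
        System.out.println(extra(schema, record));   // prints [phone]
    }
}
```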
      • getFieldErrorAction

        public ParseErrorAction getFieldErrorAction()
        Gets how fields which cannot be parsed are handled.
        Returns:
        the action to take on field errors
      • setFieldErrorAction

        public void setFieldErrorAction(ParseErrorAction action)
        Sets how to handle fields which cannot be parsed. If the configured action does not discard the record, the unparsable fields will be null-valued in the output. By default, this setting is ParseErrorAction.WARN.

        This setting is advisory in that parsers can behave differently than configured. However, a parser should only behave in a stricter fashion than configured.

        Parameters:
        action - the action to take on field errors
      • getRecordWarningThreshold

        public int getRecordWarningThreshold()
        Gets the maximum number of records allowed to have parse warnings.
        Returns:
        the limit on the number of records with warnings
      • setRecordWarningThreshold

        public void setRecordWarningThreshold(int limit)
        Configures the maximum number of records which can have parse warnings before failing. A record counts towards this limit only if it has an error whose configured action raises a warning. A record is counted only once towards this limit, regardless of how many warnings it raises.

        By default, this limit is 100. Setting the limit to 0 means there is no restriction on the number of warnings.

        This limit is applied per-split. Therefore, a file as a whole may be allowed more warnings than the limit, depending on how it is split.

        Parameters:
        limit - the number of records with warnings allowed
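        The counting rules above can be sketched as a small per-split tracker. WarningLimitSketch is a hypothetical class, not part of this API. The sketch assumes that parsing fails once the number of records with warnings exceeds the limit; the documentation does not say whether failure occurs on reaching or on exceeding it.

```java
// Hypothetical per-split tracker illustrating the documented counting rules.
final class WarningLimitSketch {
    private final int limit;          // 0 means no restriction, as documented
    private int recordsWithWarnings;  // each record is counted at most once

    WarningLimitSketch(int limit) {
        this.limit = limit;
    }

    // Call once per parsed record with the number of warnings it raised.
    void recordParsed(int warningCount) {
        if (warningCount <= 0) {
            return; // clean records do not count towards the limit
        }
        recordsWithWarnings++; // one increment per record, however many warnings
        // Assumption: failure happens when the count exceeds the limit.
        if (limit > 0 && recordsWithWarnings > limit) {
            throw new IllegalStateException(
                "Too many records with parse warnings: " + recordsWithWarnings);
        }
    }

    int recordsWithWarnings() {
        return recordsWithWarnings;
    }

    public static void main(String[] args) {
        WarningLimitSketch tracker = new WarningLimitSketch(2);
        tracker.recordParsed(3); // counts once despite three warnings
        tracker.recordParsed(0); // does not count
        tracker.recordParsed(1); // second record with warnings, still allowed
        System.out.println(tracker.recordsWithWarnings()); // prints 2
    }
}
```

        Because the limit is applied per split, one such tracker would exist for each split, which is why a file read as several splits can accumulate more warnings in total than the configured limit.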
      • getFieldLengthThreshold

        public int getFieldLengthThreshold()
        Gets the maximum length allowed for a field value before it is considered an error.
        Returns:
        the maximum field value length allowed
      • setFieldLengthThreshold

        public void setFieldLengthThreshold(int limit)
        Configures the maximum length allowed for a field value before it is considered an error. Long fields can be a sign of a misconfigured reader. When this limit is reached, the parser will fail the current field and attempt to restart on the next field.

        By default, this limit is 1M.

        This setting is considered advisory; formats with fixed length fields may ignore this setting.

        Parameters:
        limit - the maximum field value length allowed
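        The recovery behavior described above (fail the current field when the limit is reached, then resume at the next field) can be sketched for a simple delimited line. FieldLengthSketch is a hypothetical class, not part of this API. Measuring length in characters and representing a failed field as null are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner illustrating the fail-and-resume rule for overlong fields.
final class FieldLengthSketch {
    // Splits a line on the delimiter. A field that would grow past maxLength is
    // failed (null here, by assumption) and scanning resumes at the next field.
    static List<String> split(String line, char delimiter, int maxLength) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean failed = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == delimiter) {
                fields.add(failed ? null : current.toString());
                current.setLength(0);
                failed = false; // restart cleanly on the next field
            } else if (!failed) {
                if (current.length() == maxLength) {
                    failed = true;        // limit reached: stop buffering this field
                    current.setLength(0);
                } else {
                    current.append(c);
                }
            }
        }
        fields.add(failed ? null : current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The middle field has six characters and exceeds the limit of four.
        System.out.println(split("ab,cdefgh,ij", ',', 4)); // prints [ab, null, ij]
    }
}
```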
      • getReadBuffer

        public int getReadBuffer()
        Gets the size of the I/O buffer, in bytes, to use for reads.
        Returns:
        the size of the read buffer
      • setReadBuffer

        public void setReadBuffer(int size)
        Sets the size of the I/O buffer, in bytes, to use for reads. The default size is 64K.
        Parameters:
        size - the size of the read buffer
      • getDecodeBuffer

        public int getDecodeBuffer()
        Gets the size of the buffer, in bytes, used to decode character data.
        Returns:
        the decoding buffer size
      • setDecodeBuffer

        public void setDecodeBuffer(int size)
        Sets the size of the buffer, in bytes, used to decode character data. By default, this will be automatically derived using the character set and read buffer size.
        Parameters:
        size - the decoding buffer size to use