Class ParsingOptions

java.lang.Object
com.pervasive.datarush.operators.io.ParsingOptions

public class ParsingOptions extends Object
A collection of parameters for configuring parsing. This includes, but is not necessarily limited to:
  • Controlling which fields are to be parsed; omitting unwanted fields can make parsing much more efficient.
  • Controlling how parsing errors are handled.
  • Tuning the sizes of internal buffers.
  • Field Details

    • DEFAULT_READ_BUFFER

      public static final int DEFAULT_READ_BUFFER
      The default size, in bytes, for read operations
  • Constructor Details

    • ParsingOptions

      public ParsingOptions()
      Creates a collection with default settings:
      • All fields from the source will be read.
      • Malformed records will be read; unparsable fields will be null-valued.
      • Buffers will use default sizings.
  • Method Details

    • set

      public void set(ParsingOptions options)
      Copies the settings from the specified source. Afterwards, this object has the same value as the source for every setting.
      Parameters:
      options - the settings to copy
    • getSelectedFields

      public List<String> getSelectedFields()
      Gets the list of record fields to parse.
      Returns:
      the fields which will be parsed.
    • setSelectedFields

      public void setSelectedFields(List<String> fields)
      Sets the list of record fields to parse. If only a subset of the fields is needed, it can be more efficient to parse only those fields. Only fields in this list will be in the output records. An empty list indicates all fields should be parsed; this is the default setting.
      Parameters:
      fields - the record fields to parse
    • setSelectedFields

      public void setSelectedFields(String... fields)
      Sets the list of record fields to parse. If only a subset of the fields is needed, it can be more efficient to parse only those fields. Only fields in this list will be in the output records. An empty list indicates all fields should be parsed; this is the default setting.
      Parameters:
      fields - the record fields to parse
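      The selection rule described above (an empty list selects every field; otherwise only the listed fields appear in the output) can be illustrated with a small stand-alone sketch. It does not use the DataRush classes: FieldSelectionSketch and project are hypothetical names, records are modeled as plain maps, and emitting fields in selection order is an assumption made only for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldSelectionSketch {

    /**
     * Hypothetical helper, not part of ParsingOptions: keeps only the
     * selected fields of a record that has already been split into values.
     */
    public static Map<String, String> project(Map<String, String> record, List<String> selected) {
        // An empty selection means "all fields", mirroring the documented default.
        if (selected.isEmpty()) {
            return new LinkedHashMap<>(record);
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : selected) {
            // Assumption: a selected name absent from the record is simply skipped here.
            if (record.containsKey(name)) {
                out.put(name, record.get(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("id", "7");
        record.put("name", "Ada");
        record.put("notes", "long free text");

        // Only the two requested fields survive.
        System.out.println(project(record, List.of("id", "name")));
        // Empty selection: every field is kept.
        System.out.println(project(record, List.of()));
    }
}
```

      A real parser gains its efficiency by never decoding the unselected fields at all; the sketch only shows which fields end up in the output.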
    • getMissingFieldAction

      public ParseErrorAction getMissingFieldAction()
      Gets how fields that are declared in the schema but not found when parsing the record are handled.
      Returns:
      the action to take on missing fields
    • setMissingFieldAction

      public void setMissingFieldAction(ParseErrorAction action)
      Sets how to handle fields declared in the schema, but not found when parsing the record. If the configured action does not discard the record, the missing fields will be null-valued in the output. By default, this setting is ParseErrorAction.WARN.

      This setting is advisory in that parsers can behave differently than configured. However, a parser should only behave in a stricter fashion than configured.

      Parameters:
      action - the action to take on missing fields
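      The advisory rule above (a parser may deviate from the configured action, but only toward stricter handling) can be pictured as taking the stricter of two actions. The sketch below is stand-alone and hypothetical: the Action enum is an illustrative stand-in, not ParseErrorAction; only WARN is named on this page, and the other constants and their lenient-to-strict ordering are assumptions.

```java
public class AdvisoryActionSketch {

    /**
     * Hypothetical stand-in for an error action, declared from most lenient
     * to strictest. The constants other than WARN are assumed for illustration.
     */
    public enum Action { IGNORE, WARN, DISCARD, ABORT }

    /**
     * Returns the action a parser would effectively apply if it enforces its
     * own minimum strictness: never more lenient than what was configured.
     */
    public static Action effective(Action configured, Action parserMinimum) {
        // Enum.compareTo orders constants by declaration position.
        return configured.compareTo(parserMinimum) >= 0 ? configured : parserMinimum;
    }

    public static void main(String[] args) {
        // The parser insists on discarding, so the configured WARN is tightened.
        System.out.println(effective(Action.WARN, Action.DISCARD));
        // The configured action is already stricter than the parser's minimum.
        System.out.println(effective(Action.ABORT, Action.IGNORE));
    }
}
```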
    • getExtraFieldAction

      public ParseErrorAction getExtraFieldAction()
      Gets how fields that are found when parsing the record but not declared in the schema are handled.
      Returns:
      the action to take on extra fields
    • setExtraFieldAction

      public void setExtraFieldAction(ParseErrorAction action)
      Sets how to handle fields found when parsing the record, but not declared in the schema. If the configured action does not discard the record, the missing fields will be null-valued in the output. By default, this setting is ParseErrorAction.WARN.

      This setting is advisory in that parsers can behave differently than configured. However, a parser should only behave in a stricter fashion than configured.

      Parameters:
      action - the action to take on extra fields
    • getFieldErrorAction

      public ParseErrorAction getFieldErrorAction()
      Gets how fields which cannot be parsed are handled.
      Returns:
      the action to take on field errors
    • setFieldErrorAction

      public void setFieldErrorAction(ParseErrorAction action)
      Sets how to handle fields which cannot be parsed. If the configured action does not discard the record, the unparsable fields will be null-valued in the output. By default, this setting is ParseErrorAction.WARN.

      This setting is advisory in that parsers can behave differently than configured. However, a parser should only behave in a stricter fashion than configured.

      Parameters:
      action - the action to take on field errors
    • setParseErrorAction

      public void setParseErrorAction(ParseErrorAction action)
      Sets how to handle all parsing errors. This is a convenience method that sets the action for every individual class of error at once.
      Parameters:
      action - the action to take on parse error
    • getRecordWarningThreshold

      public int getRecordWarningThreshold()
      Gets the maximum number of records allowed to have parse warnings.
      Returns:
      the limit on the number of records in error
    • setRecordWarningThreshold

      public void setRecordWarningThreshold(int limit)
      Configures the maximum number of records which can have parse warnings before parsing fails. A record counts towards this limit only if it has an error whose configured action raises a warning. A record is counted only once, regardless of how many warnings it produces.

      By default, this limit is 100. Setting the limit to 0 means there is no restriction on the number of warnings.

      This limit is applied per-split. Therefore, it is possible that a file in total may be allowed more warnings than the limit, depending on how it is split.

      Parameters:
      limit - the number of records with warnings allowed
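      The counting rules above can be sketched as a small stand-alone tracker. WarningThresholdSketch is a hypothetical class, not part of the library, and it assumes that parsing fails once the number of records with warnings exceeds the limit; the exact boundary behavior of the real parser is not stated here.

```java
public class WarningThresholdSketch {
    private final int limit;
    private int recordsWithWarnings;

    public WarningThresholdSketch(int limit) {
        this.limit = limit;
    }

    /**
     * Registers one parsed record and reports whether parsing may continue.
     * A record adds at most one to the count, however many warnings it raised.
     */
    public boolean recordParsed(int warningsInRecord) {
        if (warningsInRecord > 0) {
            recordsWithWarnings++;
        }
        // A limit of 0 disables the check entirely.
        return limit == 0 || recordsWithWarnings <= limit;
    }

    public static void main(String[] args) {
        // One tracker per split: each split may use up the full limit on its own,
        // so a file read as two splits can exceed the limit in total.
        WarningThresholdSketch splitA = new WarningThresholdSketch(2);
        WarningThresholdSketch splitB = new WarningThresholdSketch(2);

        boolean ok = splitA.recordParsed(3)   // one record, three warnings: counts once
                && splitA.recordParsed(0)     // clean record: not counted
                && splitA.recordParsed(1)     // second record with warnings: still allowed
                && splitB.recordParsed(1)
                && splitB.recordParsed(1);
        System.out.println("four warning records across two splits accepted: " + ok);

        // A third record with warnings in the same split goes over the limit.
        System.out.println("split A may continue: " + splitA.recordParsed(5));
    }
}
```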
    • getFieldLengthThreshold

      public int getFieldLengthThreshold()
      Gets the maximum length allowed for a field value before it is considered an error.
      Returns:
      the maximum field value length allowed
    • setFieldLengthThreshold

      public void setFieldLengthThreshold(int limit)
      Configures the maximum length allowed for a field value before it is considered an error. Long fields can be a sign of a misconfigured reader. When this limit is reached, the parser will fail the current field and attempt to restart on the next field.

      By default, this limit is 1M.

      This setting is considered advisory; formats with fixed length fields may ignore this setting.

      Parameters:
      limit - the maximum field value length allowed
    • getReadBuffer

      public int getReadBuffer()
      Gets the size of the I/O buffer, in bytes, to use for reads.
      Returns:
      the size of the read buffer
    • setReadBuffer

      public void setReadBuffer(int size)
      Sets the size of the I/O buffer, in bytes, to use for reads. The default size is 64K.
      Parameters:
      size - the size of the read buffer
    • getDecodeBuffer

      public int getDecodeBuffer()
      Gets the size of the buffer, in bytes, used to decode character data.
      Returns:
      the decoding buffer size
    • setDecodeBuffer

      public void setDecodeBuffer(int size)
      Sets the size of the buffer, in bytes, used to decode character data. By default, this will be automatically derived using the character set and read buffer size.
      Parameters:
      size - the decoding buffer size to use