java.lang.Object

com.pervasive.datarush.operators.io.ParsingOptions

public class ParsingOptions extends Object

A collection of parameters for configuring parsing. This includes, but is not necessarily limited to:

- Controlling which fields are to be parsed; omitting unwanted fields can make parsing much more efficient.
- Controlling how parsing errors are handled.
- Tuning the sizes of internal buffers.

Field Summary

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final int | DEFAULT_READ_BUFFER | The default size, in bytes, for read operations. |

Constructor Summary

| Constructor | Description |
| --- | --- |
| ParsingOptions() | Creates a collection with default settings: All fields from the source will be read. Malformed records will be read; unparsable fields will be null-valued. Buffers will use default sizings. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| int | getDecodeBuffer() | Gets the size of the buffer, in bytes, used to decode character data. |
| ParseErrorAction | getExtraFieldAction() | Gets how fields found when parsing the record, but not declared in the schema are handled. |
| ParseErrorAction | getFieldErrorAction() | Gets how fields which cannot be parsed are handled. |
| int | getFieldLengthThreshold() | Gets the maximum length allowed for a field value before it is considered an error. |
| ParseErrorAction | getMissingFieldAction() | Gets how fields declared in the schema, but not found when parsing the record are handled. |
| int | getReadBuffer() | Gets the size of the I/O buffer, in bytes, to use for reads. |
| int | getRecordWarningThreshold() | Gets the maximum number of records allowed to have parse warnings. |
| List<String> | getSelectedFields() | Gets the list of record fields to parse. |
| void | set(ParsingOptions options) | Copies the settings from the specified source. |
| void | setDecodeBuffer(int size) | Sets the size of the buffer, in bytes, used to decode character data. |
| void | setExtraFieldAction(ParseErrorAction action) | Sets how to handle fields found when parsing the record, but not declared in the schema. |
| void | setFieldErrorAction(ParseErrorAction action) | Sets how to handle fields which cannot be parsed. |
| void | setFieldLengthThreshold(int limit) | Configures the maximum length allowed for a field value before it is considered an error. |
| void | setMissingFieldAction(ParseErrorAction action) | Sets how to handle fields declared in the schema, but not found when parsing the record. |
| void | setParseErrorAction(ParseErrorAction action) | Sets how to handle all parsing errors. |
| void | setReadBuffer(int size) | Sets the size of the I/O buffer, in bytes, to use for reads. |
| void | setRecordWarningThreshold(int limit) | Configures the maximum number of records which can have parse warnings before failing. |
| void | setSelectedFields(String... fields) | Sets the list of record fields to parse. |
| void | setSelectedFields(List<String> fields) | Sets the list of record fields to parse. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- DEFAULT_READ_BUFFER
  
  public static final int DEFAULT_READ_BUFFER
  
  The default size, in bytes, for read operations
  See Also:
  
  Constant Field Values
Constructor Details
- ParsingOptions
  
  public ParsingOptions()
  
  Creates a collection with default settings:
  
  - All fields from the source will be read.
  - Malformed records will be read; unparsable fields will be null-valued.
  - Buffers will use default sizings.
Method Details
- set
  
  public void set(ParsingOptions options)
  
  Copies the settings from the specified source. Afterwards, this object has the same setting values as the source.
  
  Parameters:
  
  options - the settings to copy
- getSelectedFields
  
  public List<String> getSelectedFields()
  
  Gets the list of record fields to parse.
  
  Returns:
  
  the fields which will be parsed.
- setSelectedFields
  
  public void setSelectedFields(List<String> fields)
  
  Sets the list of record fields to parse. If only a subset of fields is desired, it can be more efficient to parse only those fields. Only fields in this list will be in the output records. An empty list indicates all fields should be parsed; this is the default setting.
  
  Parameters:
  
  fields - the record fields to parse
- setSelectedFields
  
  public void setSelectedFields(String... fields)
  
  Sets the list of record fields to parse. If only a subset of fields is desired, it can be more efficient to parse only those fields. Only fields in this list will be in the output records. An empty list indicates all fields should be parsed; this is the default setting.
  
  Parameters:
  
  fields - the record fields to parse
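The selection rule shared by both overloads can be illustrated without the library. The following is a minimal sketch, not the library's implementation: `FieldSelectionSketch` and `project` are hypothetical names, records are modeled as maps from field name to text, and emitting fields in the order of the selection list is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldSelectionSketch {
    // Hypothetical helper, not part of the library: keeps only the selected
    // fields of a parsed record. An empty selection keeps every field.
    static Map<String, String> project(Map<String, String> record, List<String> selected) {
        if (selected.isEmpty()) {
            return new LinkedHashMap<>(record);
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : selected) {
            // Assumption: a selected name that the record lacks is skipped here.
            if (record.containsKey(name)) {
                out.put(name, record.get(name));
            }
        }
        return out;
    }
}
```

With a record holding `id`, `name`, and `city`, calling `project(record, List.of("name"))` yields a record with only `name`, while an empty list returns all three fields.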
- getMissingFieldAction
  
  public ParseErrorAction getMissingFieldAction()
  
  Gets how fields declared in the schema, but not found when parsing the record are handled.
  
  Returns:
  
  the action to take on missing fields
- setMissingFieldAction
  
  public void setMissingFieldAction(ParseErrorAction action)
  
  Sets how to handle fields declared in the schema, but not found when parsing the record. If the configured action does not discard the record, the missing fields will be null-valued in the output. By default, this setting is ParseErrorAction.WARN.
  This setting is advisory in that parsers can behave differently than configured. However, a parser should only behave in a stricter fashion than configured.
  
  Parameters:
  
  action - the action to take on missing fields
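To make the null-valued behavior concrete, here is a minimal sketch of what an output record could look like when a schema field is absent from the input and the record is kept. `MissingFieldSketch` and `fillMissing` are hypothetical names, and modeling records as maps is an assumption; the library's parsers are not implemented this way as far as this page states.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MissingFieldSketch {
    // Hypothetical helper, not part of the library: builds an output record
    // containing every schema field, with null for fields the parser did not find.
    static Map<String, String> fillMissing(List<String> schemaFields, Map<String, String> parsed) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : schemaFields) {
            out.put(name, parsed.get(name)); // get() returns null when the field is absent
        }
        return out;
    }
}
```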
- getExtraFieldAction
  
  public ParseErrorAction getExtraFieldAction()
  
  Gets how fields found when parsing the record, but not declared in the schema are handled.
  
  Returns:
  
  the action to take on extra fields
- setExtraFieldAction
  
  public void setExtraFieldAction(ParseErrorAction action)
  
  Sets how to handle fields found when parsing the record, but not declared in the schema. If the configured action does not discard the record, the extra fields are left out of the output. By default, this setting is ParseErrorAction.WARN.
  This setting is advisory in that parsers can behave differently than configured. However, a parser should only behave in a stricter fashion than configured.
  
  Parameters:
  
  action - the action to take on extra fields
- getFieldErrorAction
  
  public ParseErrorAction getFieldErrorAction()
  
  Gets how fields which cannot be parsed are handled.
  
  Returns:
  
  the action to take on field errors
- setFieldErrorAction
  
  public void setFieldErrorAction(ParseErrorAction action)
  
  Sets how to handle fields which cannot be parsed. If the configured action does not discard the record, the unparsable fields will be null-valued in the output. By default, this setting is ParseErrorAction.WARN.
  This setting is advisory in that parsers can behave differently than configured. However, a parser should only behave in a stricter fashion than configured.
  
  Parameters:
  
  action - the action to take on field errors
- setParseErrorAction
  
  public void setParseErrorAction(ParseErrorAction action)
  
  Sets how to handle all parsing errors. This is a convenience method for setting all individual classes of errors at once.
  
  Parameters:
  
  action - the action to take on parse error
  
  See Also:
  
  setMissingFieldAction(ParseErrorAction)
  
  setExtraFieldAction(ParseErrorAction)
  
  setFieldErrorAction(ParseErrorAction)
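The relationship between the convenience setter and the three individual setters can be sketched with a stand-in class. `ParsingOptionsSketch` and its nested `Action` enum are hypothetical stand-ins so that the example compiles without the library; only `WARN` is named on this page, and the `DISCARD` constant is an assumption used for illustration.

```java
class ParsingOptionsSketch {
    // Stand-in for ParseErrorAction. Only WARN appears in this documentation;
    // DISCARD is an assumed constant.
    enum Action { WARN, DISCARD }

    // Documented default for each class of error is WARN.
    private Action missingFieldAction = Action.WARN;
    private Action extraFieldAction = Action.WARN;
    private Action fieldErrorAction = Action.WARN;

    void setMissingFieldAction(Action action) { missingFieldAction = action; }
    void setExtraFieldAction(Action action) { extraFieldAction = action; }
    void setFieldErrorAction(Action action) { fieldErrorAction = action; }

    // Mirrors the documented convenience behavior: one call sets all three classes.
    void setParseErrorAction(Action action) {
        setMissingFieldAction(action);
        setExtraFieldAction(action);
        setFieldErrorAction(action);
    }

    Action getMissingFieldAction() { return missingFieldAction; }
    Action getExtraFieldAction() { return extraFieldAction; }
    Action getFieldErrorAction() { return fieldErrorAction; }
}
```

A common pattern under these assumptions is to set a baseline with `setParseErrorAction` and then override a single class of error with its individual setter.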
- getRecordWarningThreshold
  
  public int getRecordWarningThreshold()
  
  Gets the maximum number of records allowed to have parse warnings.
  
  Returns:
  
  the maximum number of records allowed to have parse warnings
- setRecordWarningThreshold
  
  public void setRecordWarningThreshold(int limit)
  
  Configures the maximum number of records which can have parse warnings before failing. A record counts towards this limit only if it has an error whose configured action raises a warning. A record is counted only once towards this limit, regardless of how many warnings it has.
  By default, this limit is 100. Setting the limit to 0 means there is no restriction on the number of warnings.
  This limit is applied per-split. Therefore, it is possible that a file in total may be allowed more warnings than the limit, depending on how it is split.
  
  Parameters:
  
  limit - the number of records with warnings allowed
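The counting rules above can be sketched as a small counter. This is an illustration under stated assumptions, not the library's code: `WarningBudgetSketch` is a hypothetical name, failure is modeled as an `IllegalStateException`, and the sketch assumes parsing fails once the count exceeds the limit rather than when it reaches it.

```java
class WarningBudgetSketch {
    private final int limit;          // 0 means no restriction, as documented
    private int recordsWithWarnings;  // number of records that raised at least one warning

    WarningBudgetSketch(int limit) {
        this.limit = limit;
    }

    // Call once per parsed record with the number of warnings it raised.
    void recordParsed(int warnings) {
        if (warnings <= 0) {
            return; // clean records never count towards the limit
        }
        recordsWithWarnings++; // counted once, however many warnings the record has
        if (limit > 0 && recordsWithWarnings > limit) {
            throw new IllegalStateException(
                "too many records with warnings: " + recordsWithWarnings);
        }
    }

    int count() {
        return recordsWithWarnings;
    }
}
```

Because the limit is applied per split, the sketch corresponds to one counter instance per split; two splits with a limit of 100 could together tolerate up to 200 records with warnings.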
- getFieldLengthThreshold
  
  public int getFieldLengthThreshold()
  
  Gets the maximum length allowed for a field value before it is considered an error.
  
  Returns:
  
  the maximum field value length allowed
- setFieldLengthThreshold
  
  public void setFieldLengthThreshold(int limit)
  
  Configures the maximum length allowed for a field value before it is considered an error. Long fields can be a sign of a misconfigured reader. When this limit is reached, the parser will fail the current field and attempt to restart on the next field.
  By default, this limit is 1M.
  This setting is considered advisory; formats with fixed length fields may ignore this setting.
  
  Parameters:
  
  limit - the maximum field value length allowed
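As a rough illustration of failing one field and continuing with the next, the sketch below splits a delimited line and null-values any field longer than the limit. `FieldLengthSketch` and `parseLine` are hypothetical names; treating the failed field as null assumes a field error action that keeps the record, and a real streaming parser would stop buffering the oversized field instead of splitting the whole line first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FieldLengthSketch {
    // Hypothetical helper, not part of the library: splits a line on the
    // delimiter and replaces any field longer than the limit with null.
    static List<String> parseLine(String line, char delimiter, int limit) {
        List<String> fields = new ArrayList<>();
        // The -1 argument keeps trailing empty fields.
        String[] raw = line.split(Pattern.quote(String.valueOf(delimiter)), -1);
        for (String value : raw) {
            fields.add(value.length() > limit ? null : value);
        }
        return fields;
    }
}
```

For example, with a limit of 3 the line `ab,cdefgh,i` yields the fields `ab`, null, and `i`: the oversized middle field fails while its neighbors parse normally.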
- getReadBuffer
  
  public int getReadBuffer()
  
  Gets the size of the I/O buffer, in bytes, to use for reads.
  
  Returns:
  
  the size of the read buffer
- setReadBuffer
  
  public void setReadBuffer(int size)
  
  Sets the size of the I/O buffer, in bytes, to use for reads. The default size is 64K.
  
  Parameters:
  
  size - the size of the read buffer
- getDecodeBuffer
  
  public int getDecodeBuffer()
  
  Gets the size of the buffer, in bytes, used to decode character data.
  
  Returns:
  
  the decoding buffer size
- setDecodeBuffer
  
  public void setDecodeBuffer(int size)
  
  Sets the size of the buffer, in bytes, used to decode character data. By default, this will be automatically derived using the character set and read buffer size.
  
  Parameters:
  
  size - the decoding buffer size to use
