- java.lang.Object
-
- com.pervasive.datarush.operators.io.ParsingOptions
-
public class ParsingOptions extends Object
A collection of parameters for configuring parsing. This includes, but is not necessarily limited to:
- Controlling which fields are to be parsed; omitting unwanted fields can make parsing much more efficient.
- Controlling how parsing errors are handled.
- Tuning the sizes of internal buffers.
-
-
Field Summary
static int DEFAULT_READ_BUFFER
The default size, in bytes, for read operations.
-
Constructor Summary
ParsingOptions()
Creates a collection with default settings:
- All fields from the source will be read.
- Malformed records will be read; unparsable fields will be null-valued.
- Buffers will use default sizings.
-
Method Summary
int getDecodeBuffer()
Gets the size of the buffer, in bytes, used to decode character data.
ParseErrorAction getExtraFieldAction()
Gets how fields found when parsing the record, but not declared in the schema are handled.
ParseErrorAction getFieldErrorAction()
Gets how fields which cannot be parsed are handled.
int getFieldLengthThreshold()
Gets the maximum length allowed for a field value before it is considered an error.
ParseErrorAction getMissingFieldAction()
Gets how fields declared in the schema, but not found when parsing the record are handled.
int getReadBuffer()
Gets the size of the I/O buffer, in bytes, to use for reads.
int getRecordWarningThreshold()
Gets the maximum number of records allowed to have parse warnings.
List<String> getSelectedFields()
Gets the list of record fields to parse.
void set(ParsingOptions options)
Copies the settings from the specified source.
void setDecodeBuffer(int size)
Sets the size of the buffer, in bytes, used to decode character data.
void setExtraFieldAction(ParseErrorAction action)
Sets how to handle fields found when parsing the record, but not declared in the schema.
void setFieldErrorAction(ParseErrorAction action)
Sets how to handle fields which cannot be parsed.
void setFieldLengthThreshold(int limit)
Configures the maximum length allowed for a field value before it is considered an error.
void setMissingFieldAction(ParseErrorAction action)
Sets how to handle fields declared in the schema, but not found when parsing the record.
void setParseErrorAction(ParseErrorAction action)
Sets how to handle all parsing errors.
void setReadBuffer(int size)
Sets the size of the I/O buffer, in bytes, to use for reads.
void setRecordWarningThreshold(int limit)
Configures the maximum number of records which can have parse warnings before failing.
void setSelectedFields(String... fields)
Sets the list of record fields to parse.
void setSelectedFields(List<String> fields)
Sets the list of record fields to parse.
-
-
-
Field Detail
-
DEFAULT_READ_BUFFER
public static final int DEFAULT_READ_BUFFER
The default size, in bytes, for read operations.
- See Also:
- Constant Field Values
-
-
Method Detail
-
set
public void set(ParsingOptions options)
Copies the settings from the specified source. Afterwards, this object will have the same values for all settings as the source.
- Parameters:
options
- the settings to copy
-
getSelectedFields
public List<String> getSelectedFields()
Gets the list of record fields to parse.
- Returns:
- the fields which will be parsed.
-
setSelectedFields
public void setSelectedFields(List<String> fields)
Sets the list of record fields to parse. If only a subset of fields is needed, it can be more efficient to parse only those fields. Only fields in this list will appear in the output records. An empty list indicates all fields should be parsed; this is the default setting.
- Parameters:
fields
- the record fields to parse
-
setSelectedFields
public void setSelectedFields(String... fields)
Sets the list of record fields to parse. If only a subset of fields is needed, it can be more efficient to parse only those fields. Only fields in this list will appear in the output records. An empty list indicates all fields should be parsed; this is the default setting.
- Parameters:
fields
- the record fields to parse
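The selection rule described above (only listed fields appear in the output, and an empty list means all fields) can be illustrated with a small plain-Java sketch. This is not DataRush code: `FieldSelectionSketch` and `project` are hypothetical names, the record is modeled as a simple map, and the output order following the selection list is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldSelectionSketch {
    // Hypothetical helper: keeps only the selected fields.
    // An empty selection keeps every field, mirroring the documented default.
    static Map<String, String> project(Map<String, String> record, List<String> selected) {
        if (selected.isEmpty()) {
            return new LinkedHashMap<>(record);
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : selected) {
            if (record.containsKey(name)) {
                out.put(name, record.get(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("id", "7");
        record.put("name", "Ada");
        record.put("city", "London");

        // Only the two selected fields survive.
        System.out.println(project(record, Arrays.asList("id", "city")));
        // An empty selection returns all three fields.
        System.out.println(project(record, Collections.<String>emptyList()));
    }
}
```

With the real class, the equivalent configuration would presumably be a call such as `setSelectedFields("id", "city")` on a `ParsingOptions` instance.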
-
getMissingFieldAction
public ParseErrorAction getMissingFieldAction()
Gets how fields declared in the schema, but not found when parsing the record are handled.
- Returns:
- the action to take on missing fields
-
setMissingFieldAction
public void setMissingFieldAction(ParseErrorAction action)
Sets how to handle fields declared in the schema, but not found when parsing the record. If the configured action does not discard the record, the missing fields will be null-valued in the output. By default, this setting is ParseErrorAction.WARN.
This setting is advisory: a parser can behave differently than configured, but it should only behave more strictly than configured.
- Parameters:
action
- the action to take on missing fields
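The null-filling behavior for missing fields can be sketched as follows. This is a plain-Java illustration rather than the library's implementation: `MissingFieldSketch` and `fillMissing` are hypothetical names, and the schema is modeled as a list of field names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MissingFieldSketch {
    // Hypothetical helper: builds an output record with one entry per schema field,
    // using null for any field the parsed record did not contain.
    static Map<String, String> fillMissing(List<String> schema, Map<String, String> parsed) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String field : schema) {
            out.put(field, parsed.get(field)); // get() returns null when the field is absent
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name", "city");
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put("id", "7");
        parsed.put("name", "Ada");
        // "city" is declared in the schema but absent from the record.
        System.out.println(fillMissing(schema, parsed));
    }
}
```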
-
getExtraFieldAction
public ParseErrorAction getExtraFieldAction()
Gets how fields found when parsing the record, but not declared in the schema are handled.
- Returns:
- the action to take on extra fields
-
setExtraFieldAction
public void setExtraFieldAction(ParseErrorAction action)
Sets how to handle fields found when parsing the record, but not declared in the schema. If the configured action does not discard the record, the missing fields will be null-valued in the output. By default, this setting is ParseErrorAction.WARN.
This setting is advisory: a parser can behave differently than configured, but it should only behave more strictly than configured.
- Parameters:
action
- the action to take on extra fields
-
getFieldErrorAction
public ParseErrorAction getFieldErrorAction()
Gets how fields which cannot be parsed are handled.
- Returns:
- the action to take on field errors
-
setFieldErrorAction
public void setFieldErrorAction(ParseErrorAction action)
Sets how to handle fields which cannot be parsed. If the configured action does not discard the record, the unparsable fields will be null-valued in the output. By default, this setting is ParseErrorAction.WARN.
This setting is advisory: a parser can behave differently than configured, but it should only behave more strictly than configured.
- Parameters:
action
- the action to take on field errors
-
setParseErrorAction
public void setParseErrorAction(ParseErrorAction action)
Sets how to handle all parsing errors. This is a convenience method for setting all individual classes of errors at once.
- Parameters:
action
- the action to take on parse errors
- See Also:
setMissingFieldAction(ParseErrorAction), setExtraFieldAction(ParseErrorAction), setFieldErrorAction(ParseErrorAction)
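The relationship between this convenience setter and the three per-category setters can be sketched with stand-in types. Nothing here is the library's code: `ParseErrorActionSketch`, `Action`, and `Options` are hypothetical stand-ins, and `DISCARD` is an assumed enum value (only `WARN` is named in this page).

```java
public class ParseErrorActionSketch {
    // Stand-in for ParseErrorAction; only WARN is named in the text, DISCARD is assumed.
    enum Action { WARN, DISCARD }

    // Stand-in for the three per-category settings, each defaulting to WARN.
    static class Options {
        private Action missingField = Action.WARN;
        private Action extraField = Action.WARN;
        private Action fieldError = Action.WARN;

        void setMissingFieldAction(Action a) { missingField = a; }
        void setExtraFieldAction(Action a) { extraField = a; }
        void setFieldErrorAction(Action a) { fieldError = a; }

        // Convenience setter: applies one action to every category of error.
        void setParseErrorAction(Action a) {
            setMissingFieldAction(a);
            setExtraFieldAction(a);
            setFieldErrorAction(a);
        }

        Action getMissingFieldAction() { return missingField; }
        Action getExtraFieldAction() { return extraField; }
        Action getFieldErrorAction() { return fieldError; }
    }

    public static void main(String[] args) {
        Options options = new Options();
        options.setParseErrorAction(Action.DISCARD);
        // A later per-category call overrides the blanket setting for that category only.
        options.setExtraFieldAction(Action.WARN);
        System.out.println(options.getMissingFieldAction());
        System.out.println(options.getExtraFieldAction());
        System.out.println(options.getFieldErrorAction());
    }
}
```

Calling the blanket setter first and then relaxing one category is a natural pattern under this model; whether the real class preserves call order in the same way is an assumption.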
-
getRecordWarningThreshold
public int getRecordWarningThreshold()
Gets the maximum number of records allowed to have parse warnings.
- Returns:
- the limit on the number of records in error
-
setRecordWarningThreshold
public void setRecordWarningThreshold(int limit)
Configures the maximum number of records which can have parse warnings before failing. Only records which have an error whose configured action raises a warning count towards this limit. A record is counted only once towards this limit, regardless of the number of warnings.
By default, this limit is 100. Setting the limit to 0 means there is no restriction on the number of warnings.
This limit is applied per split. Therefore, a file as a whole may be allowed more warnings than the limit, depending on how it is split.
- Parameters:
limit
- the number of records with warnings allowed
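The counting rules above can be illustrated with a small stand-in counter. This is a sketch under assumptions, not the library's logic: `WarningThresholdSketch` and `onRecord` are hypothetical names, and failing once the count exceeds (rather than reaches) the limit is an assumed reading of "maximum number of records allowed".

```java
public class WarningThresholdSketch {
    private final int limit;          // 0 means unlimited
    private int recordsWithWarnings;  // each record counts at most once

    WarningThresholdSketch(int limit) {
        this.limit = limit;
    }

    // Registers one parsed record; returns true while the split is within its limit.
    boolean onRecord(int warningCount) {
        if (warningCount > 0) {
            recordsWithWarnings++; // counted once, however many warnings it raised
        }
        return limit == 0 || recordsWithWarnings <= limit;
    }

    public static void main(String[] args) {
        // Each split gets its own counter, so a file in two splits can
        // accumulate more warned records in total than the configured limit.
        WarningThresholdSketch splitA = new WarningThresholdSketch(2);
        WarningThresholdSketch splitB = new WarningThresholdSketch(2);
        boolean ok = splitA.onRecord(3) && splitA.onRecord(1)
                && splitB.onRecord(1) && splitB.onRecord(2);
        System.out.println("four warned records, both splits ok: " + ok);
        System.out.println("third warned record in split A ok: " + splitA.onRecord(1));
    }
}
```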
-
getFieldLengthThreshold
public int getFieldLengthThreshold()
Gets the maximum length allowed for a field value before it is considered an error.
- Returns:
- the maximum field value length allowed
-
setFieldLengthThreshold
public void setFieldLengthThreshold(int limit)
Configures the maximum length allowed for a field value before it is considered an error. Long fields can be a sign of a misconfigured reader. When this limit is reached, the parser will fail the current field and attempt to restart on the next field.
By default, this limit is 1M.
This setting is considered advisory; formats with fixed length fields may ignore this setting.
- Parameters:
limit
- the maximum field value length allowed
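A rough sketch of the length check, assuming comma-delimited input and a non-discarding error action that nulls the failed field. `FieldLengthSketch` and `parse` are hypothetical names, and a real parser would detect the overrun while scanning rather than after splitting.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldLengthSketch {
    // Hypothetical helper: splits a comma-delimited line and replaces any field
    // longer than the limit with null, then carries on with the next field.
    static List<String> parse(String line, int limit) {
        List<String> fields = new ArrayList<>();
        for (String raw : line.split(",", -1)) { // -1 keeps trailing empty fields
            fields.add(raw.length() > limit ? null : raw);
        }
        return fields;
    }

    public static void main(String[] args) {
        // The middle field exceeds the limit of 4 and is failed;
        // the fields on either side are still parsed.
        System.out.println(parse("ab,cdefghij,kl", 4));
    }
}
```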
-
getReadBuffer
public int getReadBuffer()
Gets the size of the I/O buffer, in bytes, to use for reads.
- Returns:
- the size of the read buffer
-
setReadBuffer
public void setReadBuffer(int size)
Sets the size of the I/O buffer, in bytes, to use for reads. The default size is 64K.
- Parameters:
size
- the size of the read buffer
-
getDecodeBuffer
public int getDecodeBuffer()
Gets the size of the buffer, in bytes, used to decode character data.
- Returns:
- the decoding buffer size
-
setDecodeBuffer
public void setDecodeBuffer(int size)
Sets the size of the buffer, in bytes, used to decode character data. By default, this will be automatically derived using the character set and read buffer size.
- Parameters:
size
- the decoding buffer size to use
-
-