public class ReadJSON extends AbstractTextReader
In JSON, all field keys are expected to start and end with a delimiter. A double quote (") is typically used as the field delimiter. However, the user may enable the allowSingleQuotes property to avoid parsing errors when single quotes are used instead. This operator uses the Jackson JSON parsing library to parse fields.
The reader may optionally specify a RecordTextSchema to provide parsing and type information for fields. The schema, in conjunction with any specified field filter, defines the output type of the reader. The schema can be constructed manually via the provided API. StructuredSchemaReader provides support for reading in Pervasive DataIntegrator structured schema descriptors (.schema files) for use with readers. Because JSON text has explicit field markers, it is also possible to perform automated discovery of the schema based on the contents of the file. The reader provides a pluggable discovery mechanism to support this function. By default, the schema will be automatically discovered, with all fields assumed to be strings initially. Discovered fields are named after the keys present in the records.
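To make the default discovery behavior concrete, the following is a minimal sketch that collects field names from the keys of JSON records. It is not the ReadJSON discovery implementation: the class name KeyDiscoverySketch, the method discoverFieldNames, and the maxRecords limit (a stand-in for the analysis depth) are hypothetical, and it assumes flat records with double-quoted keys. It uses only the JDK, not Jackson.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: not the ReadJSON discovery implementation. */
final class KeyDiscoverySketch {

    // Matches a double-quoted key followed by a colon, e.g. "name":
    // Assumes flat records whose string values do not contain such a pattern.
    private static final Pattern KEY = Pattern.compile("\"([^\"\\\\]*)\"\\s*:");

    /** Collects distinct keys, in first-seen order, from at most maxRecords lines. */
    static List<String> discoverFieldNames(List<String> jsonLines, int maxRecords) {
        Set<String> names = new LinkedHashSet<>();
        int limit = Math.min(maxRecords, jsonLines.size());
        for (int i = 0; i < limit; i++) {
            Matcher m = KEY.matcher(jsonLines.get(i));
            while (m.find()) {
                names.add(m.group(1)); // every discovered field is treated as a string
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "{\"id\": \"1\", \"name\": \"Ada\"}",
            "{\"id\": \"2\", \"city\": \"Paris\"}");
        System.out.println(discoverFieldNames(lines, 10)); // [id, name, city]
    }
}
```

Limiting the scan to the first records mirrors the idea of an analysis depth: keys that first appear later in the file would not be discovered.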
Normally, the output of the reader includes all parsed records in the file, both those with and without parsing errors. Fields which cannot be parsed are null-valued in the resulting record. If desired, the reader can be configured to filter failed records from the output.
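The two behaviors described above (null values for unparseable fields, and optional filtering of failed records) can be modeled with a small JDK-only sketch. It assumes single-field integer records for brevity; the names FieldErrorSketch, parseIntOrNull, parseAll, and dropFailed are hypothetical and are not part of the ReadJSON API.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: models the described behavior, not ReadJSON internals. */
final class FieldErrorSketch {

    /** Returns the parsed value, or null when the text is not a valid integer. */
    static Integer parseIntOrNull(String text) {
        try {
            return Integer.valueOf(text.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /**
     * Parses every raw value. Failed values become null; when dropFailed is
     * true, they are omitted from the result instead.
     */
    static List<Integer> parseAll(List<String> rawValues, boolean dropFailed) {
        List<Integer> out = new ArrayList<>();
        for (String raw : rawValues) {
            Integer value = parseIntOrNull(raw);
            if (value != null || !dropFailed) {
                out.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("10", "abc", "30");
        System.out.println(parseAll(raw, false)); // [10, null, 30]
        System.out.println(parseAll(raw, true));  // [10, 30]
    }
}
```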
JSON text does not contain a header row, since the keys in a JSON record define the fields in the resulting output. JSON text files can be parsed in parallel under "optimistic" assumptions: namely, that the data is well formatted in JSON lines format.
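The optimistic assumption matters because, in JSON lines format, every newline is a record boundary, so a file can be cut at arbitrary offsets and each cut moved forward to the next newline. The sketch below illustrates that idea with the JDK only (Java 11 or later). It is not how ReadJSON splits its input; JsonLinesSplitSketch, splitAtLineBoundaries, and countRecords are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: not how ReadJSON splits its input internally. */
final class JsonLinesSplitSketch {

    /**
     * Cuts text into at most `parts` chunks, moving each cut forward to just
     * after the next newline so that no record is divided between chunks.
     */
    static List<String> splitAtLineBoundaries(String text, int parts) {
        List<String> chunks = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= parts && start < text.length(); i++) {
            int end = text.length();
            if (i < parts) {
                int target = (int) ((long) text.length() * i / parts);
                int newline = text.indexOf('\n', Math.max(target - 1, start));
                if (newline >= 0) {
                    end = newline + 1;
                }
            }
            chunks.add(text.substring(start, end));
            start = end;
        }
        return chunks;
    }

    /** Counts non-blank lines in every chunk, processing chunks in parallel. */
    static long countRecords(String text, int parts) {
        return splitAtLineBoundaries(text, parts).parallelStream()
            .mapToLong(chunk -> chunk.lines().filter(line -> !line.isBlank()).count())
            .sum();
    }

    public static void main(String[] args) {
        String text = "{\"a\":1}\n{\"a\":2}\n{\"a\":3}\n";
        System.out.println(splitAtLineBoundaries(text, 2).size()); // 2
        System.out.println(countRecords(text, 2));                 // 3
    }
}
```

If a record could span several lines, a newline would no longer guarantee a record boundary, and this kind of splitting would not be safe.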
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_ANALYSIS_DEPTH: The default number of lines analyzed when performing schema discovery |
encodingProps
options, output
Constructor and Description |
---|
ReadJSON(): Reads an empty source with default settings. |
ReadJSON(ByteSource source): Reads the specified data source using default options. |
ReadJSON(Path path): Reads the file specified by the path using default options. |
ReadJSON(String pattern): Reads all paths matching the specified pattern using default options. |
Modifier and Type | Method and Description |
---|---|
void | autoConfigure(FileClient ctx): Performs any configured discovery on the operator using the current source and applies the result to configuration. |
ReadJSON | clone() |
protected DataFormat | computeFormat(CompositionContext ctx): Determines the data format for the source. |
RecordTextSchema<?> | discoverSchema(FileClient ctx): Run schema discovery using current configuration. |
boolean | getAllowBackslashEscapingAny(): Get whether the parser will allow quoting of all characters using backslash quoting mechanism. |
boolean | getAllowComments(): Get whether the parser should allow Java or C++ style comments within the source. |
boolean | getAllowNonNumericNumbers(): Get whether the parser recognizes set of "Not a Number" (NaN) tokens as legal floating number values |
boolean | getAllowNumericLeadingZeros(): Get whether the parser will allow numbers to start with additional zeroes. |
boolean | getAllowSingleQuotes(): Get whether parser will allow use of single quotes (apostrophe, character '\'') for quoting strings. |
boolean | getAllowUnquotedControlChars(): Get whether the parser will allow JSON strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed). |
boolean | getAllowUnquotedFieldNames(): Get whether the parser will allow use of unquoted field names. |
int | getAnalysisDepth(): Gets the number of characters to read for schema discovery and structural analysis of the file. |
String | getDiscoveryNullIndicator(): Gets the text value used to represent null values by default in discovered schemas. |
TextTypes.StringConversion | getDiscoveryStringHandling(): Gets the default behavior for processing string-valued types in discovered schemas. |
boolean | getMultilineFormat(): Get whether or not the parser will allow JSON records which span multiple lines |
RecordTextSchema<?> | getSchema(): Gets the record schema of the JSON text source. |
TextRecordDiscoverer | getSchemaDiscovery(): Gets the schema discoverer to use on the JSON text source. |
void | setAllowBackslashEscapingAny(boolean allowBackslashEscapingAny): Set if the parser will allow quoting of all characters using backslash quoting mechanism. |
void | setAllowComments(boolean allowComments): Set whether the parser should allow comments or not. |
void | setAllowNonNumericNumbers(boolean allowNonNumericNumbers): Set if the parser recognizes set of "Not a Number" (NaN) tokens as legal floating number values |
void | setAllowNumericLeadingZeros(boolean allowNumericLeadingZeros): Sets whether the parser will allow numbers to start with additional zeroes. |
void | setAllowSingleQuotes(boolean allowSingleQuotes): Set whether the parser will allow use of single quotes for quoting strings. |
void | setAllowUnquotedControlChars(boolean allowUnquotedControlChars): Set if the parser will allow JSON strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed). |
void | setAllowUnquotedFieldNames(boolean allowUnquotedFieldNames): Set whether the parser will allow use of unquoted field names. |
void | setAnalysisDepth(int count): Sets the number of characters to read for performing schema discovery and structural analysis. |
void | setDiscoveryNullIndicator(String value): Sets the text value used to represent null values by default in discovered schemas. |
void | setDiscoveryStringHandling(TextTypes.StringConversion behavior): Sets the default behavior for processing string-valued types in discovered schemas. |
void | setMultilineFormat(boolean multilineFormat): Sets whether or not the parser will allow JSON records to span multiple lines |
void | setSchema(RecordTextSchema<?> schema): Sets the record schema expected in the JSON text source. |
void | setSchemaDiscovery(List<TypePattern> patterns): Enables schema discovery using the default discoverer extended with additional typing patterns. |
void | setSchemaDiscovery(TextRecordDiscoverer discoverer): Sets the schema discoverer to use against the JSON text source. |
getCharset, getCharsetName, getDecodeBuffer, getEncoding, getErrorAction, getReplacement, setCharset, setCharsetName, setDecodeBuffer, setEncoding, setErrorAction, setReplacement
compose, getExtraFieldAction, getFieldErrorAction, getFieldLengthThreshold, getIncludeSourceInfo, getMissingFieldAction, getOutput, getParseOptions, getPessimisticSplitting, getReadBuffer, getReadOnClient, getRecordWarningThreshold, getSelectedFields, getSource, getSplitOptions, getUseMetadata, setExtraFieldAction, setFieldErrorAction, setFieldLengthThreshold, setIncludeSourceInfo, setMissingFieldAction, setParseErrorAction, setParseOptions, setPessimisticSplitting, setReadBuffer, setReadOnClient, setRecordWarningThreshold, setSelectedFields, setSelectedFields, setSource, setSource, setSource, setSplitOptions, setUseMetadata
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
disableParallelism, getInputPorts, getOutputPorts
public static final int DEFAULT_ANALYSIS_DEPTH
public ReadJSON()
A default schema discovery will be run based on analysis of the source, unless otherwise configured via setSchema(RecordTextSchema).
See also: AbstractReader.setSource(ByteSource)
public ReadJSON(String pattern)
A default schema discovery will be run based on analysis of the source, unless otherwise configured via setSchema(RecordTextSchema).
pattern - a path-matching pattern
See also: FileClient.matchPaths(String)
public ReadJSON(Path path)
A default schema discovery will be run based on analysis of the source, unless otherwise configured via setSchema(RecordTextSchema).
path - the path to read
public ReadJSON(ByteSource source)
A default schema discovery will be run based on analysis of the source, unless otherwise configured via setSchema(RecordTextSchema).
source - the data source to read
public boolean getAllowComments()
public void setAllowComments(boolean allowComments)
allowComments - sets whether parser will allow comments or not
public boolean getAllowUnquotedFieldNames()
public void setAllowUnquotedFieldNames(boolean allowUnquotedFieldNames)
allowUnquotedFieldNames - sets whether parser will allow use of unquoted field names
public boolean getAllowSingleQuotes()
public void setAllowSingleQuotes(boolean allowSingleQuotes)
allowSingleQuotes - sets whether parser will allow single quotes for quoting strings.
public boolean getAllowNumericLeadingZeros()
public void setAllowNumericLeadingZeros(boolean allowNumericLeadingZeros)
allowNumericLeadingZeros - sets whether parser will allow leading zeros
public boolean getAllowUnquotedControlChars()
public void setAllowUnquotedControlChars(boolean allowUnquotedControlChars)
allowUnquotedControlChars - sets whether parser will allow unquoted control characters
public boolean getAllowBackslashEscapingAny()
public void setAllowBackslashEscapingAny(boolean allowBackslashEscapingAny)
allowBackslashEscapingAny - sets whether backslash escaping is allowed.
public boolean getAllowNonNumericNumbers()
public void setAllowNonNumericNumbers(boolean allowNonNumericNumbers)
allowNonNumericNumbers - sets whether non numeric numbers are allowed
public RecordTextSchema<?> getSchema()
If null, then schema discovery will be attempted.
public void setSchema(RecordTextSchema<?> schema)
Setting a schema disables schema discovery.
schema - the expected record schema of the source
See also: AbstractReader.setSelectedFields(java.util.List)
public boolean getMultilineFormat()
public void setMultilineFormat(boolean multilineFormat)
multilineFormat - sets whether multiline JSON records are allowed
public int getAnalysisDepth()
public void setAnalysisDepth(int count)
count - the number of characters to use to determine the schema and/or file structure
public TextRecordDiscoverer getSchemaDiscovery()
null.
public void setSchemaDiscovery(TextRecordDiscoverer discoverer)
By default, the schema will be discovered automatically. All fields are assumed to be strings and the field names are taken from the key values.
Setting schema discovery overrides any previously configured schema.
discoverer - the schema discoverer to use.
See also: setSchema(RecordTextSchema), AbstractReader.setSelectedFields(java.util.List)
public void setSchemaDiscovery(List<TypePattern> patterns)
setSchemaDiscovery(TextRecordDiscoverer) with an appropriately configured discoverer instead.
patterns - the additional patterns to apply at lower precedence than default patterns
See also: PatternBasedDiscovery
public String getDiscoveryNullIndicator()
public void setDiscoveryNullIndicator(String value)
value - the string indicating a null value
public TextTypes.StringConversion getDiscoveryStringHandling()
public void setDiscoveryStringHandling(TextTypes.StringConversion behavior)
behavior - indicates how string-valued types should be converted from text
protected DataFormat computeFormat(CompositionContext ctx)
AbstractReader
ReadSource operator. If an implementation supports schema discovery, it must be performed in this method.
computeFormat in class AbstractReader
ctx - the composition context for the current invocation of AbstractReader.compose(CompositionContext)
public void autoConfigure(FileClient ctx) throws IOException
ctx - the authorization context to use for accessing the source
IOException - if errors occur during discovery and analysis of the source
public RecordTextSchema<?> discoverSchema(FileClient ctx)
ctx - the authorization context to use for accessing the file
Copyright © 2019 Actian Corporation. All rights reserved.