ReadJSON (Dataflow Library Distribution Project 6.8.0-1 API)

java.lang.Object
- com.pervasive.datarush.operators.AbstractLogicalOperator
  - com.pervasive.datarush.operators.CompositeOperator
    - com.pervasive.datarush.operators.io.AbstractReader
      - com.pervasive.datarush.operators.io.textfile.AbstractTextReader
        - com.pervasive.datarush.operators.io.textfile.ReadJSON

All Implemented Interfaces:

LogicalOperator, RecordSourceOperator, SourceOperator<RecordPort>
```
public class ReadJSON
extends AbstractTextReader
```
The ReadJSON operator reads a JSON file of key-value pairs or an array of objects as record tokens. It supports the JSON Lines format as described at http://jsonlines.org/. JSON Lines formatted text has a single JSON record per line, with records separated by a newline character.
In JSON it is expected that all field keys start and end with a delimiter. A "\"" (double quote) is typically used as the field delimiter. However, the user may enable the property allowSingleQuotes to avoid parsing errors when single quotes are used instead. This operator uses the Jackson JSON parsing library to parse fields.
The reader may optionally specify a RecordTextSchema to provide parsing and type information for fields. The schema, in conjunction with any specified field filter, defines the output type of the reader. This can be manually constructed via the API provided. StructuredSchemaReader provides support for reading in Pervasive DataIntegrator structured schema descriptors (.schema files) for use with readers. Because JSON text has explicit field markers, it is also possible to perform automated discovery of the schema based on the contents of the file. The reader provides a pluggable discovery mechanism to support this function. By default, the schema will be automatically discovered, with all fields assumed to be strings initially. Discovered fields are named using the key fields present.
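The default discovery behavior described above can be pictured with a minimal, library-independent sketch. It is not the DataRush implementation: the class `SchemaDiscoverySketch` and its `discover` method are hypothetical, records are assumed to be already parsed into maps, and taking the union of keys across the sampled records is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the DataRush API.
final class SchemaDiscoverySketch {

    // Builds a field-name -> type-name mapping from a sample of parsed records.
    // Every field is typed as "string", mirroring the documented default.
    static Map<String, String> discover(List<Map<String, String>> sample) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (Map<String, String> record : sample) {
            for (String key : record.keySet()) {
                // Field names come from the JSON keys; the first occurrence
                // of a key fixes its position in the schema (an assumption).
                schema.putIfAbsent(key, "string");
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("id", "1");
        first.put("name", "Ada");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("id", "2");
        second.put("city", "Austin");
        System.out.println(discover(List.of(first, second)));
    }
}
```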
Normally, the output of the reader includes all parsed records in the file, both those with and without parsing errors. Fields which cannot be parsed are null valued in the resulting record. If desired, the reader can be configured to filter failed records from the output.
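The two error-handling modes can be sketched without the library. The example below is an illustration under stated assumptions, not the reader's actual code: `ParseErrorHandlingSketch`, `parse`, and the `dropFailed` flag are hypothetical names, and every field is assumed to be an integer for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the DataRush API.
final class ParseErrorHandlingSketch {

    // Parses every field of every record as an integer.
    // A field that cannot be parsed becomes null in the output record.
    // When dropFailed is true, records with any failed field are omitted.
    static List<Integer[]> parse(List<String[]> rawRecords, boolean dropFailed) {
        List<Integer[]> out = new ArrayList<>();
        for (String[] raw : rawRecords) {
            Integer[] parsed = new Integer[raw.length];
            boolean failed = false;
            for (int i = 0; i < raw.length; i++) {
                try {
                    parsed[i] = Integer.valueOf(raw[i]);
                } catch (NumberFormatException e) {
                    parsed[i] = null; // unparseable field is null valued
                    failed = true;
                }
            }
            if (!(failed && dropFailed)) {
                out.add(parsed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> raw = List.of(
                new String[] {"1", "2"},
                new String[] {"3", "oops"});
        System.out.println(parse(raw, false).size()); // failed record kept, with a null field
        System.out.println(parse(raw, true).size());  // failed record filtered out
    }
}
```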
JSON text does not contain a header row, since the keys in a JSON record define the fields in the resulting output. JSON text files can be parsed in parallel under "optimistic" assumptions: namely, that the data is well formatted in JSON Lines format.
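To see why the JSON Lines assumption makes parallel parsing possible, consider a minimal sketch of newline-aligned splitting. This is a common technique for line-oriented files and is shown here only as an illustration; it is not taken from the DataRush source, and the names `JsonLinesSplitSketch` and `recordsInSplit` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the DataRush API.
final class JsonLinesSplitSketch {

    // Returns the records whose first character lies in [start, end).
    // A worker whose split begins mid-record skips ahead to the next newline,
    // so each record is read by exactly one worker.
    static List<String> recordsInSplit(String text, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start > 0) {
            int newline = text.indexOf('\n', start - 1);
            if (newline < 0) {
                return records; // no record begins inside this split
            }
            pos = newline + 1;
        }
        while (pos < end && pos < text.length()) {
            int eol = text.indexOf('\n', pos);
            if (eol < 0) {
                eol = text.length();
            }
            String line = text.substring(pos, eol).trim();
            if (!line.isEmpty()) {
                records.add(line);
            }
            pos = eol + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n";
        int middle = text.length() / 2;
        System.out.println(recordsInSplit(text, 0, middle));
        System.out.println(recordsInSplit(text, middle, text.length()));
    }
}
```

In this sketch a newline inside a record would be mistaken for a record boundary, which is the kind of input the "optimistic" assumption rules out.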

Field Summary

Fields
Modifier and Type	Field and Description
`static int`	`DEFAULT_ANALYSIS_DEPTH` The default number of lines analyzed when performing schema discovery
- Fields inherited from class com.pervasive.datarush.operators.io.textfile.AbstractTextReader
  encodingProps
- Fields inherited from class com.pervasive.datarush.operators.io.AbstractReader
  options, output

Constructor Summary

Constructors
Constructor and Description
`ReadJSON()` Reads an empty source with default settings.
`ReadJSON(ByteSource source)` Reads the specified data source using default options.
`ReadJSON(Path path)` Reads the file specified by the path using default options.
`ReadJSON(String pattern)` Reads all paths matching the specified pattern using default options.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`void`	`autoConfigure(FileClient ctx)` Performs any configured discovery on the operator using the current source and applies the result to configuration.
`ReadJSON`	`clone()`
`protected DataFormat`	`computeFormat(CompositionContext ctx)` Determines the data format for the source.
`RecordTextSchema<?>`	`discoverSchema(FileClient ctx)` Run schema discovery using current configuration.
`boolean`	`getAllowBackslashEscapingAny()` Get whether the parser will allow quoting of all characters using the backslash quoting mechanism.
`boolean`	`getAllowComments()` Get whether the parser should allow Java or C++ style comments within the source.
`boolean`	`getAllowNonNumericNumbers()` Get whether the parser recognizes a set of "Not a Number" (NaN) tokens as legal floating number values.
`boolean`	`getAllowNumericLeadingZeros()` Get whether the parser will allow numbers to start with additional zeroes.
`boolean`	`getAllowSingleQuotes()` Get whether the parser will allow use of single quotes (apostrophe, character '\'') for quoting strings.
`boolean`	`getAllowUnquotedControlChars()` Get whether the parser will allow JSON strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed).
`boolean`	`getAllowUnquotedFieldNames()` Get whether the parser will allow use of unquoted field names.
`int`	`getAnalysisDepth()` Gets the number of characters to read for schema discovery and structural analysis of the file.
`String`	`getDiscoveryNullIndicator()` Gets the text value used to represent null values by default in discovered schemas.
`TextTypes.StringConversion`	`getDiscoveryStringHandling()` Gets the default behavior for processing string-valued types in discovered schemas.
`boolean`	`getMultilineFormat()` Get whether or not the parser will allow JSON records which span multiple lines
`RecordTextSchema<?>`	`getSchema()` Gets the record schema of the JSON text source.
`TextRecordDiscoverer`	`getSchemaDiscovery()` Gets the schema discoverer to use on the JSON text source.
`void`	`setAllowBackslashEscapingAny(boolean allowBackslashEscapingAny)` Set whether the parser will allow quoting of all characters using the backslash quoting mechanism.
`void`	`setAllowComments(boolean allowComments)` Set whether the parser should allow comments or not.
`void`	`setAllowNonNumericNumbers(boolean allowNonNumericNumbers)` Set whether the parser recognizes a set of "Not a Number" (NaN) tokens as legal floating number values.
`void`	`setAllowNumericLeadingZeros(boolean allowNumericLeadingZeros)` Sets whether the parser will allow numbers to start with additional zeroes.
`void`	`setAllowSingleQuotes(boolean allowSingleQuotes)` Set whether the parser will allow use of single quotes for quoting strings.
`void`	`setAllowUnquotedControlChars(boolean allowUnquotedControlChars)` Set if the parser will allow JSON strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed).
`void`	`setAllowUnquotedFieldNames(boolean allowUnquotedFieldNames)` Set whether the parser will allow use of unquoted field names.
`void`	`setAnalysisDepth(int count)` Sets the number of characters to read for performing schema discovery and structural analysis.
`void`	`setDiscoveryNullIndicator(String value)` Sets the text value used to represent null values by default in discovered schemas.
`void`	`setDiscoveryStringHandling(TextTypes.StringConversion behavior)` Sets the default behavior for processing string-valued types in discovered schemas.
`void`	`setMultilineFormat(boolean multilineFormat)` Sets whether or not the parser will allow JSON records to span multiple lines
`void`	`setSchema(RecordTextSchema<?> schema)` Sets the record schema expected in the JSON text source.
`void`	`setSchemaDiscovery(List<TypePattern> patterns)` Enables schema discovery using the default discoverer extended with additional typing patterns.
`void`	`setSchemaDiscovery(TextRecordDiscoverer discoverer)` Sets the schema discoverer to use against the JSON text source.

Methods inherited from class com.pervasive.datarush.operators.io.textfile.AbstractTextReader
getCharset, getCharsetName, getDecodeBuffer, getEncoding, getErrorAction, getReplacement, setCharset, setCharsetName, setDecodeBuffer, setEncoding, setErrorAction, setReplacement

Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts

- Field Detail
  - DEFAULT_ANALYSIS_DEPTH
```
public static final int DEFAULT_ANALYSIS_DEPTH
```
    The default number of lines analyzed when performing schema discovery
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - ReadJSON
```
public ReadJSON()
```
    Reads an empty source with default settings. The source must be set before execution or an error will be raised.
    A default schema discovery will be run based on analysis of the source, unless otherwise configured via setSchema(RecordTextSchema)
    
    See Also:
    
    AbstractReader.setSource(ByteSource)
  - ReadJSON
```
public ReadJSON(String pattern)
```
    Reads all paths matching the specified pattern using default options. Any matching path which is a directory is replaced with all files in the directory.
    A default schema discovery will be run based on analysis of the source, unless otherwise configured via setSchema(RecordTextSchema)
    
    Parameters:
    
    pattern - a path-matching pattern
    
    See Also:
    
    FileClient.matchPaths(String)
  - ReadJSON
```
public ReadJSON(Path path)
```
    Reads the file specified by the path using default options. If the path refers to a directory, all files in the directory are read.
    A default schema discovery will be run based on analysis of the source, unless otherwise configured via setSchema(RecordTextSchema)
    
    Parameters:
    
    path - the path to read
  - ReadJSON
```
public ReadJSON(ByteSource source)
```
    Reads the specified data source using default options.
    A default schema discovery will be run based on analysis of the source, unless otherwise configured via setSchema(RecordTextSchema)
    
    Parameters:
    
    source - the data source to read
- Method Detail
  - clone
```
public ReadJSON clone()
```
    Overrides:
    
    clone in class Object
  - getAllowComments
```
public boolean getAllowComments()
```
    Get whether the parser should allow Java or C++ style comments within the source.
    
    Returns:
    
    the allowComments
  - setAllowComments
```
public void setAllowComments(boolean allowComments)
```
    Set whether the parser should allow comments. If the JSON file to be parsed contains comments, this property should be set to true so that they are handled during parsing. When enabled, the parser allows Java or C++ style comments (both '/*' and '//' types) within parsed content.
    
    Parameters:
    
    allowComments - sets whether parser will allow comments or not
  - getAllowUnquotedFieldNames
```
public boolean getAllowUnquotedFieldNames()
```
    Get whether the parser will allow use of unquoted field names.
    
    Returns:
    
    the allowUnquotedFieldNames
  - setAllowUnquotedFieldNames
```
public void setAllowUnquotedFieldNames(boolean allowUnquotedFieldNames)
```
    Set whether the parser will allow use of unquoted field names. If unquoted field names are used in source file, this field should be set to true.
    
    Parameters:
    
    allowUnquotedFieldNames - sets whether parser will allow use of unquoted field names
  - getAllowSingleQuotes
```
public boolean getAllowSingleQuotes()
```
    Get whether the parser will allow use of single quotes (apostrophe, character '\'') for quoting strings.
    
    Returns:
    
    the allowSingleQuotes
  - setAllowSingleQuotes
```
public void setAllowSingleQuotes(boolean allowSingleQuotes)
```
    Set whether the parser will allow use of single quotes for quoting strings. If single quotes are used in source, this field should be set to true.
    
    Parameters:
    
    allowSingleQuotes - sets whether parser will allow single quotes for quoting strings.
  - getAllowNumericLeadingZeros
```
public boolean getAllowNumericLeadingZeros()
```
    Get whether the parser will allow numbers to start with additional zeroes.
    
    Returns:
    
    the allowNumericLeadingZeros
  - setAllowNumericLeadingZeros
```
public void setAllowNumericLeadingZeros(boolean allowNumericLeadingZeros)
```
    Sets whether the parser will allow numbers to start with additional zeroes. If leading zeroes are allowed for numbers in the source, this field should be set to true.
    
    Parameters:
    
    allowNumericLeadingZeros - sets whether parser will allow leading zeros
  - getAllowUnquotedControlChars
```
public boolean getAllowUnquotedControlChars()
```
    Get whether the parser will allow JSON strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed).
    
    Returns:
    
    the allowUnquotedControlChars
  - setAllowUnquotedControlChars
```
public void setAllowUnquotedControlChars(boolean allowUnquotedControlChars)
```
    Set if the parser will allow JSON strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed).
    
    Parameters:
    
    allowUnquotedControlChars - sets whether parser will allow unquoted control characters
  - getAllowBackslashEscapingAny
```
public boolean getAllowBackslashEscapingAny()
```
    Get whether the parser will allow quoting of all characters using the backslash quoting mechanism. If not enabled, only characters that are explicitly listed by the JSON specification can be escaped.
    
    Returns:
    
    the allowBackslashEscapingAny
  - setAllowBackslashEscapingAny
```
public void setAllowBackslashEscapingAny(boolean allowBackslashEscapingAny)
```
    Set whether the parser will allow quoting of all characters using the backslash quoting mechanism.
    
    Parameters:
    
    allowBackslashEscapingAny - sets whether backslash escaping is allowed.
  - getAllowNonNumericNumbers
```
public boolean getAllowNonNumericNumbers()
```
    Get whether the parser recognizes a set of "Not a Number" (NaN) tokens as legal floating number values.
    
    Returns:
    
    the allowNonNumericNumbers
  - setAllowNonNumericNumbers
```
public void setAllowNonNumericNumbers(boolean allowNonNumericNumbers)
```
    Set whether the parser recognizes a set of "Not a Number" (NaN) tokens as legal floating number values.
    
    Parameters:
    
    allowNonNumericNumbers - sets whether non numeric numbers are allowed
  - getSchema
```
public RecordTextSchema<?> getSchema()
```
    Gets the record schema of the JSON text source. If this returns null then schema discovery will be attempted.
    
    Returns:
    
    the record schema of the source
  - setSchema
```
public void setSchema(RecordTextSchema<?> schema)
```
    Sets the record schema expected in the JSON text source. Output records will have this schema, adjusted accordingly for any configured field selection.
    Setting a schema disables schema discovery.
    
    Parameters:
    
    schema - the expected record schema of the source
    
    See Also:
    
    AbstractReader.setSelectedFields(java.util.List)
  - getMultilineFormat
```
public boolean getMultilineFormat()
```
    Get whether or not the parser will allow JSON records which span multiple lines
    
    Returns:
    
    the multilineFormat
  - setMultilineFormat
```
public void setMultilineFormat(boolean multilineFormat)
```
    Sets whether or not the parser will allow JSON records to span multiple lines
    
    Parameters:
    
    multilineFormat - sets whether multiline JSON records are allowed
  - getAnalysisDepth
```
public int getAnalysisDepth()
```
    Gets the number of characters to read for schema discovery and structural analysis of the file.
    
    Returns:
    
    the number of characters which will be analyzed
  - setAnalysisDepth
```
public void setAnalysisDepth(int count)
```
    Sets the number of characters to read for performing schema discovery and structural analysis. This setting is ignored if no discovery is being performed. The default setting is 1M characters.
    
    Parameters:
    
    count - the number of characters to use to determine the schema and/or file structure
  - getSchemaDiscovery
```
public TextRecordDiscoverer getSchemaDiscovery()
```
    Gets the schema discoverer to use on the JSON text source. If schema discovery is disabled, this will return null.
    
    Returns:
    
    the configured schema discoverer
  - setSchemaDiscovery
```
public void setSchemaDiscovery(TextRecordDiscoverer discoverer)
```
    Sets the schema discoverer to use against the JSON text source. Just prior to graph execution the source will be examined using the discoverer to determine the output schema for records. If reading multiple files, the schema is determined using the first file. Output records will have the discovered schema, adjusted accordingly for any configured field selection.
    By default, the schema will be discovered automatically. All fields are assumed to be strings and the field names are taken from the key values.
    Setting schema discovery overrides any previously configured schema.
    
    Parameters:
    
    discoverer - the schema discoverer to use.
    
    See Also:
    
    setSchema(RecordTextSchema), AbstractReader.setSelectedFields(java.util.List)
  - setSchemaDiscovery
```
public void setSchemaDiscovery(List<TypePattern> patterns)
```
    Enables schema discovery using the default discoverer extended with additional typing patterns. The additional patterns are in addition to, not in place of, the normal discovery typing patterns. If overriding default rules is desired, use setSchemaDiscovery(TextRecordDiscoverer) with an appropriately configured discoverer instead.
    
    Parameters:
    
    patterns - the additional patterns to apply at lower precedence than default patterns
    
    See Also:
    
    PatternBasedDiscovery
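
    The precedence rule stated above (additional patterns apply only where no default pattern matches) can be sketched in plain Java. Everything in this example is hypothetical: the `Rule` class merely stands in for a typing pattern, and the regular expressions are invented for illustration, not the library's actual default rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the DataRush API.
final class PatternPrecedenceSketch {

    // A stand-in for a typing pattern: a regex and the type name it implies.
    static final class Rule {
        final Pattern regex;
        final String typeName;

        Rule(String regex, String typeName) {
            this.regex = Pattern.compile(regex);
            this.typeName = typeName;
        }
    }

    // Default rules are tried first; additional rules are consulted only
    // when no default rule matches the whole text.
    static String inferType(String text, List<Rule> defaults, List<Rule> additional) {
        List<Rule> ordered = new ArrayList<>(defaults);
        ordered.addAll(additional);
        for (Rule rule : ordered) {
            if (rule.regex.matcher(text).matches()) {
                return rule.typeName;
            }
        }
        return "string"; // fallback when nothing matches
    }

    public static void main(String[] args) {
        List<Rule> defaults = List.of(new Rule("-?\\d+", "int"));
        List<Rule> additional = List.of(
                new Rule("\\d{4}", "year"),
                new Rule("\\d{4}-\\d{2}-\\d{2}", "date"));
        System.out.println(inferType("2024", defaults, additional));       // default rule wins over "year"
        System.out.println(inferType("2024-01-31", defaults, additional)); // only an additional rule matches
        System.out.println(inferType("abc", defaults, additional));        // nothing matches
    }
}
```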
  - getDiscoveryNullIndicator
```
public String getDiscoveryNullIndicator()
```
    Gets the text value used to represent null values by default in discovered schemas.
    
    Returns:
    
    the string indicating a null value
  - setDiscoveryNullIndicator
```
public void setDiscoveryNullIndicator(String value)
```
    Sets the text value used to represent null values by default in discovered schemas. By default, this is the empty string. If schema discovery is not enabled, this setting is ignored.
    
    Parameters:
    
    value - the string indicating a null value
  - getDiscoveryStringHandling
```
public TextTypes.StringConversion getDiscoveryStringHandling()
```
    Gets the default behavior for processing string-valued types in discovered schemas.
    
    Returns:
    
    how string-valued types should be converted from text
  - setDiscoveryStringHandling
```
public void setDiscoveryStringHandling(TextTypes.StringConversion behavior)
```
    Sets the default behavior for processing string-valued types in discovered schemas. By default, whitespace is not trimmed from values and the empty string is treated as null. If schema discovery is not enabled, this setting is ignored.
    
    Parameters:
    
    behavior - indicates how string-valued types should be converted from text
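
    The interaction between the null indicator and string handling can be pictured with a small sketch. It is an illustration under assumptions, not the library's conversion code: `StringConversionSketch` and `convert` are hypothetical, and applying trimming before the null comparison is an assumed ordering.

```java
// Hypothetical illustration only; not part of the DataRush API.
final class StringConversionSketch {

    // Converts raw field text to a string value.
    // Returns null when the (optionally trimmed) text equals the null indicator.
    static String convert(String raw, boolean trim, String nullIndicator) {
        String value = trim ? raw.trim() : raw;
        return value.equals(nullIndicator) ? null : value;
    }

    public static void main(String[] args) {
        // Documented defaults: no trimming, empty string treated as null.
        System.out.println(convert("", false, ""));
        System.out.println("[" + convert("  ", false, "") + "]");
        // With trimming enabled, whitespace-only text collapses to the null indicator.
        System.out.println(convert("  ", true, ""));
        // A custom null indicator.
        System.out.println(convert("NA", false, "NA"));
    }
}
```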
  - computeFormat
```
protected DataFormat computeFormat(CompositionContext ctx)
```
    Description copied from class: AbstractReader
    
    Determines the data format for the source. The returned format is used during composition to construct a ReadSource operator. If an implementation supports schema discovery, it must be performed in this method.
    
    Specified by:
    
    computeFormat in class AbstractReader
    
    Parameters:
    
    ctx - the composition context for the current invocation of AbstractReader.compose(CompositionContext)
    
    Returns:
    
    the source format to use
  - autoConfigure
```
public void autoConfigure(FileClient ctx)
                   throws IOException
```
    Performs any configured discovery on the operator using the current source and applies the result to configuration. After execution, the operator will be configured so as not to require any pre-execution analysis of the source. In particular:
    - A schema will be discovered, if necessary, and set; schema discovery will subsequently be disabled.
    
    Parameters:
    
    ctx - the authorization context to use for accessing the source
    
    Throws:
    
    IOException - if errors occur during discovery and analysis of the source
  - discoverSchema
```
public RecordTextSchema<?> discoverSchema(FileClient ctx)
```
    Run schema discovery using current configuration.
    
    Parameters:
    
    ctx - the authorization context to use for accessing the file
    
    Returns:
    
    the predicted schema of the source
