public class ParseTextFields extends AbstractExecutableRecordPipeline
This differs in
comparison to other operators such as ReadDelimitedText
which do not provide a flow of rejected records. This also
differs in that it processes an existing flow of text records
instead of reading from a source directly, meaning an upstream
operator for breaking data into individual fields is necessary.
Note that ReadDelimitedText
can be used to do this
syntactic parsing in the delimited text case by using a text
schema containing only "raw" string fields.
The parsed output will have the type specified by the schema. Output fields will contain the result of parsing the input field of the same name according to the type information in the provided schema.
Input fields referenced in the schema must either be string typed, in which case they are parsed according to the schema, or be of a type assignable to the output field type, in which case they are simply passed through. If a field is present in the schema, but not in the input, the output field is NULL. If an input value is NULL, the resulting output field is NULL.
The reject output has the same type as the input.
input, output
Constructor and Description |
---|
ParseTextFields()
Defines a parser which does no parsing.
|
ParseTextFields(RecordTextSchema<?> schema)
Defines a parser using the specified schema.
|
Modifier and Type | Method and Description |
---|---|
protected void |
computeMetadata(StreamingMetadataContext ctx)
Implementations must adhere to the following contracts
|
protected void |
execute(ExecutionContext ctx)
Executes the operator.
|
RecordPort |
getInput()
Gets the record port providing the input data to the operation.
|
RecordPort |
getOutput()
Gets the record port providing the output from the operation.
|
RecordPort |
getRejects()
Gets the port providing records which failed parsing.
|
RecordTextSchema<?> |
getSchema()
Gets the record schema to use for parsing.
|
void |
setSchema(RecordTextSchema<?> schema)
Sets the record schema to use for parsing.
|
cloneForExecution, getNumInputCopies, getPortSettings, handleInactiveOutput
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
disableParallelism, getInputPorts, getOutputPorts
public ParseTextFields()
public ParseTextFields(RecordTextSchema<?> schema)
schema
- the record schema for parsingpublic RecordPort getInput()
AbstractExecutableRecordPipeline
getInput
in interface PipelineOperator<RecordPort>
getInput
in class AbstractExecutableRecordPipeline
public RecordPort getOutput()
AbstractExecutableRecordPipeline
getOutput
in interface PipelineOperator<RecordPort>
getOutput
in class AbstractExecutableRecordPipeline
public RecordPort getRejects()
public RecordTextSchema<?> getSchema()
public void setSchema(RecordTextSchema<?> schema)
Input fields referenced in the schema must either be string typed, in which case they are parsed according to the schema, or be of a type assignable to the output field type, in which case they are simply passed through. Any schema fields not present in the input will be NULL in the output.
schema
- the record schema for parsingprotected void computeMetadata(StreamingMetadataContext ctx)
StreamingOperator
StreamingMetadataContext.parallelize(ParallelismStrategy)
.
RecordPort#setRequiredDataOrdering
, otherwise data may arrive in any order.
RecordPort#setRequiredDataDistribution
, otherwise data will arrive in an unspecified partial distribution
.
RecordPort#getSourceDataDistribution
and RecordPort#getSourceDataOrdering
. These should be
viewed as a hints to help chose a more efficient algorithm. In such cases, though, operators must
still declare data ordering and data distribution requirements; otherwise there is no guarantee that
data will arrive sorted/distributed as required.
RecordPort#setType
.RecordPort#setOutputDataOrdering
RecordPort#setOutputDataDistribution
AbstractModelPort#setMergeHandler
.MergeModel
is a convenient, re-usable model reducer, parameterized with
a merge-handler.
SimpleModelPort
's have no associated metadata and therefore there is
never any output metadata to declare. PMMLPort
's, on the other hand,
do have associated metadata. For all PMMLPorts, implementations must declare
the following:
PMMLPort.setPMMLModelSpec
.
computeMetadata
in class StreamingOperator
ctx
- the contextprotected void execute(ExecutionContext ctx)
ExecutableOperator
execute
in class ExecutableOperator
ctx
- context in which to lookup physical ports bound to logical portsCopyright © 2016 Actian Corporation. All rights reserved.