public class SplitField extends AbstractExecutableRecordPipeline
The SplitField operator has three properties: the split field, the split pattern, and the result mapping.
The contents of the split field will be split using the defined split pattern, resulting in an array of substrings. The key of the result mapping corresponds to an index within this array, and the associated value defines the output field in which to place the substring.
For example, if you had a record with a field named time containing times in the format of 18:30:00, you could use the following SplitField operator to split the time into hour, minute, and second fields.
HashMap<Integer,String> map = new HashMap<Integer,String>();
map.put(0,"hour");
map.put(1,"minute");
map.put(2,"second");
SplitField splitter = new SplitField("time",":",map);
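To make the index-to-field mapping concrete, the following plain-Java sketch mimics what the example above is described as producing for a single value. It does not use the DataRush API: the class SplitSketch and the method splitToFields are hypothetical names, and String.split is assumed to approximate the operator's splitting behavior.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the DataRush library.
class SplitSketch {

    // Splits the value with the given regex and copies each mapped
    // array index into the output field named by the mapping.
    static Map<String, String> splitToFields(String value, String pattern,
                                             Map<Integer, String> mapping) {
        String[] parts = value.split(pattern);
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : mapping.entrySet()) {
            int index = entry.getKey();
            // This sketch only handles indices that exist in the split.
            if (index >= 0 && index < parts.length) {
                fields.put(entry.getValue(), parts[index]);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(0, "hour");
        map.put(1, "minute");
        map.put(2, "second");
        // Prints the field names with the substrings taken from "18:30:00".
        System.out.println(splitToFields("18:30:00", ":", map));
    }
}
```

Here index 0 of the split array feeds the hour field, index 1 the minute field, and index 2 the second field.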
Constructor | Description |
---|---|
SplitField() | Construct the operator with no properties set. |
SplitField(String splitField, String splitPattern, Map<Integer,String> resultMapping) | Construct the operator while setting each property. |
Modifier and Type | Method | Description |
---|---|---|
void | computeMetadata(StreamingMetadataContext ctx) | Implementations must adhere to the following contracts |
protected void | execute(ExecutionContext ctx) | Executes the operator. |
RecordPort | getInput() | Gets the record port providing the input data to the operation. |
RecordPort | getOutput() | Gets the record port providing the output from the operation. |
Map<Integer,String> | getResultMapping() | Get the mapping of split indices to output field names. |
String | getSplitField() | Get the string field to be split. |
String | getSplitPattern() | Get the splitting pattern. |
void | setResultMapping(Map<Integer,String> resultMapping) | Set the mapping of split indices to output field names. |
void | setSplitField(String splitField) | Set the string field to be split. |
void | setSplitPattern(String splitPattern) | Set the splitting pattern. |
public SplitField()
The split pattern defaults to whitespace. The other properties (split field and result mapping) must be set manually.
public SplitField(String splitField, String splitPattern, Map<Integer,String> resultMapping)
Parameters:
splitField - The name of the field to be split.
splitPattern - The splitting pattern.
resultMapping - The mapping of split indices to output field names.
See Also:
setSplitField(String), setSplitPattern(String), setResultMapping(Map)
public RecordPort getInput()
Description copied from class: AbstractExecutableRecordPipeline
Gets the record port providing the input data to the operation.
getInput in interface PipelineOperator<RecordPort>
getInput in class AbstractExecutableRecordPipeline
public RecordPort getOutput()
Description copied from class: AbstractExecutableRecordPipeline
Gets the record port providing the output from the operation.
getOutput in interface PipelineOperator<RecordPort>
getOutput in class AbstractExecutableRecordPipeline
public void setSplitField(String splitField)
If this field does not exist in the input, or is not of type String, an exception will be thrown at composition time.
Parameters:
splitField - The name of the field to be split.
public String getSplitField()
public void setSplitPattern(String splitPattern)
The pattern should be expressed as a regular expression. The default value matches any whitespace.
Parameters:
splitPattern - The splitting pattern.
Throws:
com.pervasive.datarush.graphs.physical.InvalidPropertyValueException - If the given pattern is not a valid regular expression.
See Also:
String.split(String)
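Because the split pattern is a regular expression, characters such as . have special meaning and malformed patterns are rejected. The sketch below shows both effects with plain java.util.regex calls. The class SplitPatternSketch and the helper isValidPattern are hypothetical, and "\\s+" is only an assumed stand-in for a whitespace pattern; the operator's actual default regex is not stated here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration only; not part of the DataRush library.
class SplitPatternSketch {

    // Returns true when the pattern compiles as a Java regular expression.
    static boolean isValidPattern(String pattern) {
        try {
            Pattern.compile(pattern);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // An unclosed character class is not a valid regular expression.
        System.out.println(isValidPattern("["));

        // A literal dot must be escaped; an unescaped dot matches every
        // character, so no substrings remain after splitting.
        System.out.println("1.2.3".split("\\.").length);
        System.out.println("1.2.3".split(".").length);

        // Assumed whitespace pattern: runs of spaces or tabs act as one delimiter.
        System.out.println("a  b\tc".split("\\s+").length);
    }
}
```

Checking a pattern with Pattern.compile before configuring the operator is one way to surface a malformed expression early, although the operator reports its own exception type as documented above.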
public String getSplitPattern()
public void setResultMapping(Map<Integer,String> resultMapping)
The key of each entry represents an index in the array resulting from splitting the input string, and the value represents the name of the output field in which to store that substring.
It is not necessary for every array index to be mapped, or for every mapped index to exist in each split. If a value does not exist at a mapped index for a particular split, an empty string will be placed in the specified output field.
If an output field already exists in the input, or if a single output field is mapped to multiple indices, an exception will be thrown at composition time.
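The rules above for unmapped indices, missing indices, and duplicate output names can be sketched in plain Java. This is an illustration under stated assumptions, not the operator's implementation: ResultMappingSketch, checkNoDuplicateFields, and apply are hypothetical names, IllegalArgumentException stands in for whatever exception the library raises at composition time, and the check against fields already present in the input is not modeled.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of the DataRush library.
class ResultMappingSketch {

    // Rejects a mapping that assigns one output field name to several indices.
    static void checkNoDuplicateFields(Map<Integer, String> mapping) {
        Set<String> seen = new HashSet<>();
        for (String name : mapping.values()) {
            if (!seen.add(name)) {
                throw new IllegalArgumentException(
                        "Output field mapped more than once: " + name);
            }
        }
    }

    // Applies the mapping to one value. A mapped index that is missing
    // from the split yields an empty string; unmapped indices are ignored.
    static Map<String, String> apply(String value, String pattern,
                                     Map<Integer, String> mapping) {
        checkNoDuplicateFields(mapping);
        String[] parts = value.split(pattern);
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : mapping.entrySet()) {
            int index = entry.getKey();
            boolean present = index >= 0 && index < parts.length;
            fields.put(entry.getValue(), present ? parts[index] : "");
        }
        return fields;
    }
}
```

For instance, applying a mapping of 0 to hour and 2 to second to the value 18:30 with the pattern : leaves the second field empty, because the split contains only two substrings, and index 1 is simply not copied anywhere.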
Parameters:
resultMapping - The mapping of indices to field names.
public Map<Integer,String> getResultMapping()
See Also:
setResultMapping(Map)
public void computeMetadata(StreamingMetadataContext ctx)
Description copied from class: StreamingOperator
Implementations must adhere to the following contracts:
Parallelizability: declared by calling StreamingMetadataContext.parallelize(ParallelismStrategy).
Required data ordering: declared by calling RecordPort#setRequiredDataOrdering, otherwise data may arrive in any order.
Required data distribution: declared by calling RecordPort#setRequiredDataDistribution, otherwise data will arrive in an unspecified partial distribution.
The upstream distribution and ordering can be queried by calling RecordPort#getSourceDataDistribution and RecordPort#getSourceDataOrdering. These should be viewed as hints to help choose a more efficient algorithm. In such cases, though, operators must still declare data ordering and data distribution requirements; otherwise there is no guarantee that data will arrive sorted/distributed as required.
Output type: declared by calling RecordPort#setType.
Output data ordering and distribution: declared by calling RecordPort#setOutputDataOrdering and RecordPort#setOutputDataDistribution.
Merge handler: declared by calling AbstractModelPort#setMergeHandler. MergeModel is a convenient, re-usable model reducer, parameterized with a merge-handler.
SimpleModelPorts have no associated metadata and therefore there is never any output metadata to declare. PMMLPorts, on the other hand, do have associated metadata. For all PMMLPorts, implementations must declare the following:
PMML model spec: declared by calling PMMLPort.setPMMLModelSpec.
computeMetadata in class StreamingOperator
Parameters:
ctx - the context
protected void execute(ExecutionContext ctx)
Description copied from class: ExecutableOperator
Executes the operator.
execute in class ExecutableOperator
Parameters:
ctx - context in which to look up physical ports bound to logical ports
Copyright © 2020 Actian Corporation. All rights reserved.