public class SplitField extends AbstractExecutableRecordPipeline
The SplitField operator has three properties: the field to split, the split pattern, and the result mapping.
The contents of the split field will be split using the defined split pattern, resulting in an array of substrings. The key of the result mapping corresponds to an index within this array, and the associated value defines the output field in which to place the substring.
For example, if you had a record with a field named time containing times in the
format of 18:30:00, you could use the following SplitField operator to split the time
into hour, minute, and second fields.
// Map split indices to the names of the output fields to create.
HashMap<Integer,String> map = new HashMap<Integer,String>();
map.put(0,"hour");
map.put(1,"minute");
map.put(2,"second");
// Split the "time" field on ":" and write the pieces to the mapped fields.
SplitField splitter = new SplitField("time",":",map);
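The resulting record would contain hour, minute, and second fields with the values 18, 30, and 00. As a rough plain-JDK sketch of that split-and-map behavior (using only String.split and the map built above, not the operator itself):

// A JDK-only sketch of the behavior described above; it mirrors the documented
// split-and-map semantics but does not invoke the operator.
String[] parts = "18:30:00".split(":");
for (Map.Entry<Integer,String> entry : map.entrySet()) {
    int index = entry.getKey();
    String value = index < parts.length ? parts[index] : "";
    // Prints: hour = 18, minute = 30, second = 00
    System.out.println(entry.getValue() + " = " + value);
}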
Fields inherited from class AbstractExecutableRecordPipeline: input, output

| Constructor and Description |
|---|
| SplitField() - Construct the operator with no properties set. |
| SplitField(String splitField, String splitPattern, Map<Integer,String> resultMapping) - Construct the operator while setting each property. |
| Modifier and Type | Method and Description |
|---|---|
| void | computeMetadata(StreamingMetadataContext ctx) - Implementations must adhere to the following contracts. |
| protected void | execute(ExecutionContext ctx) - Executes the operator. |
| RecordPort | getInput() - Gets the record port providing the input data to the operation. |
| RecordPort | getOutput() - Gets the record port providing the output from the operation. |
| Map<Integer,String> | getResultMapping() - Get the mapping of split indices to output field names. |
| String | getSplitField() - Get the string field to be split. |
| String | getSplitPattern() - Get the splitting pattern. |
| void | setResultMapping(Map<Integer,String> resultMapping) - Set the mapping of split indices to output field names. |
| void | setSplitField(String splitField) - Set the string field to be split. |
| void | setSplitPattern(String splitPattern) - Set the splitting pattern. |
Methods inherited from superclasses: cloneForExecution, getNumInputCopies, getPortSettings, handleInactiveOutput, disableParallelism, getInputPorts, getOutputPorts, newInput, newOutput, newRecordInput, newRecordOutput, notifyError

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait

Methods inherited from implemented interfaces: disableParallelism, getInputPorts, getOutputPorts

public SplitField()
Construct the operator with no properties set.
The split pattern defaults to whitespace. The other properties (split field and result mapping) must be set manually.
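Because only the split pattern has a default, a no-argument construction is typically followed by the setters documented below; for example:

// Equivalent to new SplitField("time", ":", map), built property by property
// (map as defined in the class example above).
SplitField splitter = new SplitField();
splitter.setSplitField("time");
splitter.setSplitPattern(":");   // overrides the whitespace default
splitter.setResultMapping(map);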
public SplitField(String splitField, String splitPattern, Map<Integer,String> resultMapping)
Construct the operator while setting each property.
Parameters:
splitField - The name of the field to be split.
splitPattern - The splitting pattern.
resultMapping - The mapping of split indices to output field names.
See Also:
setSplitField(String), setSplitPattern(String), setResultMapping(Map)

public RecordPort getInput()
Description copied from class: AbstractExecutableRecordPipeline
Gets the record port providing the input data to the operation.
Specified by:
getInput in interface PipelineOperator<RecordPort>
Overrides:
getInput in class AbstractExecutableRecordPipeline

public RecordPort getOutput()
Description copied from class: AbstractExecutableRecordPipeline
Gets the record port providing the output from the operation.
Specified by:
getOutput in interface PipelineOperator<RecordPort>
Overrides:
getOutput in class AbstractExecutableRecordPipeline

public void setSplitField(String splitField)
Set the string field to be split.
If this field does not exist in the input, or is not of type String, an exception will be thrown at composition time.
Parameters:
splitField - The name of the field to be split.

public String getSplitField()
Get the string field to be split.
public void setSplitPattern(String splitPattern)
The pattern should be expressed as a regular expression. The default value matches any whitespace.
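Since the pattern has the same semantics as String.split(String), it can match more than a single literal character. A small JDK-only illustration of that kind of splitting (the exact default whitespace expression used by the operator is not shown on this page and is assumed only to match runs of whitespace):

// A comma followed by optional whitespace as the split pattern.
String[] parts = "alpha, beta,gamma".split(",\\s*");
// parts -> ["alpha", "beta", "gamma"]

// A whitespace-style pattern, similar in spirit to the default described above.
String[] words = "a  b\tc".split("\\s+");
// words -> ["a", "b", "c"]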
Parameters:
splitPattern - The splitting pattern.
Throws:
com.pervasive.datarush.graphs.physical.InvalidPropertyValueException - If the given pattern is not a valid regular expression.
See Also:
String.split(String)

public String getSplitPattern()
Get the splitting pattern.
public void setResultMapping(Map<Integer,String> resultMapping)
The key of each entry represents an index in the array resulting from splitting the input string, and the value represents the name of the output field in which to store that substring.
It is not necessary for every array index to be mapped, or for every mapped index to exist in each split. If a value does not exist at a mapped index for a particular split, an empty string will be placed in the specified output field.
If an output field already exists in the input, or if a single output field is mapped to multiple indices, an exception will be thrown at composition time.
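These rules can be sketched with plain JDK code (hypothetical values, not the operator itself): below, index 1 is intentionally left unmapped, and mapped index 2 has no corresponding substring, so the second output field receives an empty string.

Map<Integer,String> mapping = new HashMap<Integer,String>();
mapping.put(0, "hour");
mapping.put(2, "second");              // index 1 intentionally unmapped

String[] parts = "18:30".split(":");   // only indices 0 and 1 exist
for (Map.Entry<Integer,String> entry : mapping.entrySet()) {
    int index = entry.getKey();
    // A missing value at a mapped index yields an empty string.
    String value = index < parts.length ? parts[index] : "";
    System.out.println(entry.getValue() + " = " + value);
}
// hour = 18
// second =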
Parameters:
resultMapping - The mapping of indices to field names.

public Map<Integer,String> getResultMapping()
Get the mapping of split indices to output field names.
See Also:
setResultMapping(Map)

public void computeMetadata(StreamingMetadataContext ctx)
Description copied from class: StreamingOperator
Implementations must adhere to the following contracts:

- Parallelizability: implementations must declare parallelizability by calling StreamingMetadataContext.parallelize(ParallelismStrategy).
- Input record ports: implementations with data ordering requirements must declare them by calling RecordPort#setRequiredDataOrdering; otherwise data may arrive in any order. Implementations with data distribution requirements must declare them by calling RecordPort#setRequiredDataDistribution; otherwise data will arrive in an unspecified partial distribution. Operators may query the upstream output distribution and ordering by calling RecordPort#getSourceDataDistribution and RecordPort#getSourceDataOrdering. These should be viewed as hints to help choose a more efficient algorithm; in such cases, though, operators must still declare data ordering and data distribution requirements, otherwise there is no guarantee that data will arrive sorted/distributed as required.
- Output record ports: implementations must declare their output type by calling RecordPort#setType. Implementations that can make guarantees about their output may also declare them via RecordPort#setOutputDataOrdering and RecordPort#setOutputDataDistribution.
- Input model ports: model reducers must declare a merge handler by calling AbstractModelPort#setMergeHandler. MergeModel is a convenient, re-usable model reducer, parameterized with a merge-handler.
- Output model ports: SimpleModelPort's have no associated metadata and therefore there is never any output metadata to declare. PMMLPort's, on the other hand, do have associated metadata. For all PMMLPorts, implementations must declare the PMML model spec by calling PMMLPort.setPMMLModelSpec.
Specified by:
computeMetadata in class StreamingOperator
Parameters:
ctx - the context

protected void execute(ExecutionContext ctx)
Description copied from class: ExecutableOperator
Executes the operator.
Specified by:
execute in class ExecutableOperator
Parameters:
ctx - context in which to look up physical ports bound to logical ports

Copyright © 2020 Actian Corporation. All rights reserved.