public class GroupPairsSortedRows extends ExecutableOperator
The output is similar to an inner join of the data against itself, except that only distinct combinations are generated. These combinations are useful for comparing rows that may be duplicates, and for mining association rules.
For example, given three rows A, B, C in a key group, a join would generate all nine of the combinations shown in the matrix below. For efficiency, this operator generates only 4, 7, and 8: just the "strictly upper triangular" entries marked with a * in the matrix:
1 A with A | 4* B with A | 7* C with A |
2 A with B | 5 B with B | 8* C with B |
3 A with C | 6 B with C | 9 C with C |
Combinations 1, 5, and 9 are omitted because they join a row with itself, and are thus not useful candidates when looking for duplicate rows or mining association rules.
Combinations 2, 3, and 6 are omitted because they are the same as combinations 4, 7, and 8 respectively, with field order reversed.
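The selection rule above can be sketched in plain Java. This is only an illustration of the "each unordered pair once, no self-pairs" idea, not the operator's implementation; the class `PairSketch` and the method `distinctPairs` are hypothetical names, and the order in which the real operator emits pairs is not stated in this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the operator's API.
class PairSketch {
    // Pairs each row with every row that precedes it in the group,
    // so each unordered pair appears once and no row is paired with itself.
    static List<String[]> distinctPairs(List<String> group) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 1; i < group.size(); i++) {
            for (int j = 0; j < i; j++) {
                pairs.add(new String[] { group.get(i), group.get(j) });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints "B with A", "C with A", "C with B" (combinations 4, 7, and 8).
        for (String[] p : distinctPairs(List.of("A", "B", "C"))) {
            System.out.println(p[0] + " with " + p[1]);
        }
    }
}
```

A group of n rows yields n(n-1)/2 pairs under this rule, compared with n*n for a full self-join.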
As with GroupSortedRows and JoinSortedRows, the input for this operator must be sorted so that values in the same key group are consecutive, and the output is sorted by the same key.
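To show why consecutive key groups matter, the following sketch pairs rows in a single pass over sorted input, buffering only the current key group. It is a simplified model under stated assumptions: rows are hypothetical two-element arrays of {key, value} with non-null keys, and the names `SortedGroupPairsSketch` and `pairsByKey` are invented for illustration rather than taken from the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; rows are {key, value} arrays with non-null keys.
class SortedGroupPairsSketch {
    static List<String> pairsByKey(List<String[]> sortedRows) {
        List<String> out = new ArrayList<>();
        List<String> group = new ArrayList<>(); // values seen so far in the current key group
        String currentKey = null;
        for (String[] row : sortedRows) {
            String key = row[0];
            String value = row[1];
            if (!key.equals(currentKey)) {
                // A new key starts a new group; rows of the old key are never needed again.
                group.clear();
                currentKey = key;
            }
            // Pair the new row with every earlier row of the same key group.
            for (String earlier : group) {
                out.add(key + ": " + value + " with " + earlier);
            }
            group.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] { "k1", "A" }, new String[] { "k1", "B" }, new String[] { "k1", "C" },
            new String[] { "k2", "X" }, new String[] { "k2", "Y" });
        pairsByKey(rows).forEach(System.out::println);
    }
}
```

If rows with the same key were not consecutive, this single-pass approach would discard the buffer too early and miss pairs, which is why the sort requirement is needed for this style of processing. The output also comes out grouped by key in input order.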
Modifier and Type | Field and Description |
---|---|
protected RecordPort | input: The input control port. |
protected String[] | keys |
Constructor and Description |
---|
GroupPairsSortedRows(): Block records from a single source. |
Modifier and Type | Method and Description |
---|---|
protected void | computeMetadata(StreamingMetadataContext ctx): Implementations must adhere to the following contracts. |
protected void | endOfData(boolean emptyInput): Called once at the end of run. |
protected void | execute(ExecutionContext ctx): Executes the operator. |
RecordPort | getInput() |
String[] | getKeys() |
String | getLeftFieldPattern(): Gets the output naming pattern for fields on the left hand side of the pair. |
protected int | getNumInputCopies(LogicalPort inputPort): May be overridden to specify that multiple input copies are needed for a given input port. |
RecordPort | getOutput(): Gets the record port providing the results of the pair generation. |
String | getRightFieldPattern(): Gets the output naming pattern for fields on the right hand side of the pair. |
protected void | handleRow(boolean endOfGroup): Called once per input row. |
protected RecordInput | nextKey(ExecutionContext ctx) |
protected RecordInput | recordsIn(ExecutionContext ctx) |
void | setKeys(String[] keys) |
void | setLeftFieldPattern(String pattern): Sets the output naming pattern for fields on the left hand side of pairs. |
void | setRightFieldPattern(String pattern): Sets the output naming pattern for fields on the right hand side of pairs. |
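Because each output row carries the fields of both rows in a pair, the left and right naming patterns presumably serve to keep the two sets of field names distinct. The sketch below illustrates that idea only. It ASSUMES a `java.text.MessageFormat`-style pattern in which `{0}` stands for the input field name; this page does not specify the pattern syntax or defaults, and `FieldPatternSketch`, `apply`, and `outputNames` are hypothetical names.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the real pattern syntax is not documented on this page.
class FieldPatternSketch {
    // ASSUMPTION: "{0}" marks where the input field name is substituted.
    static String apply(String pattern, String fieldName) {
        return MessageFormat.format(pattern, fieldName);
    }

    // Builds the output field names: all left-side names, then all right-side names.
    static List<String> outputNames(List<String> fields, String leftPattern, String rightPattern) {
        List<String> names = new ArrayList<>();
        for (String f : fields) {
            names.add(apply(leftPattern, f));
        }
        for (String f : fields) {
            names.add(apply(rightPattern, f));
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [id_left, name_left, id_right, name_right]
        System.out.println(outputNames(List.of("id", "name"), "{0}_left", "{0}_right"));
    }
}
```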
cloneForExecution, getPortSettings, handleInactiveOutput
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
protected final RecordPort input
protected String[] keys
public GroupPairsSortedRows()
setKeys(String[])
public RecordPort getOutput()
public String getLeftFieldPattern()
public void setLeftFieldPattern(String pattern)
pattern - name pattern for the left hand side field names
public String getRightFieldPattern()
public void setRightFieldPattern(String pattern)
pattern - name pattern for the right hand side field names
protected void computeMetadata(StreamingMetadataContext ctx)
StreamingOperator
StreamingMetadataContext.parallelize(ParallelismStrategy).
RecordPort#setRequiredDataOrdering, otherwise data may arrive in any order.
RecordPort#setRequiredDataDistribution, otherwise data will arrive in an unspecified partial distribution.
RecordPort#getSourceDataDistribution and RecordPort#getSourceDataOrdering. These should be viewed as hints to help choose a more efficient algorithm. In such cases, though, operators must still declare data ordering and data distribution requirements; otherwise there is no guarantee that data will arrive sorted/distributed as required.
RecordPort#setType.
RecordPort#setOutputDataOrdering
RecordPort#setOutputDataDistribution
AbstractModelPort#setMergeHandler.
MergeModel is a convenient, re-usable model reducer, parameterized with a merge-handler.
SimpleModelPorts have no associated metadata and therefore there is never any output metadata to declare. PMMLPorts, on the other hand, do have associated metadata. For all PMMLPorts, implementations must declare the following:
PMMLPort.setPMMLModelSpec.
ctx - the context
protected void execute(ExecutionContext ctx)
ExecutableOperator
ctx - context in which to look up physical ports bound to logical ports
protected void handleRow(boolean endOfGroup)
endOfGroup - true iff the input row is the last in the key group
protected void endOfData(boolean emptyInput)
emptyInput - true iff handleRow was called zero times (no input rows to aggregate)
public RecordPort getInput()
public String[] getKeys()
public void setKeys(String[] keys)
protected final RecordInput recordsIn(ExecutionContext ctx)
protected final RecordInput nextKey(ExecutionContext ctx)
protected int getNumInputCopies(LogicalPort inputPort)
ExecutableOperator
getNumInputCopies in class ExecutableOperator
inputPort - the port
Copyright © 2020 Actian Corporation. All rights reserved.