public class ClusterLinks extends CompositeOperator
The output of the DiscoverDuplicates
operator is a stream of
record pairs. Each pair of records has passed the given qualifications for being
a potential match. This operator takes the record pair input and finds clusters
of records that are alike. For example, if one row contains records A and B and another
contains records B and C, this operator creates a single cluster for records A, B
and C, generates a unique cluster identifier for the grouping, and outputs a row
for each of records A, B and C with the generated cluster identifier.
A cluster may contain any number of records. Note that the original record pairings are lost, as are the scores.
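The grouping described above amounts to finding connected components over the record pairs. The following is a minimal sketch of that idea using a union-find structure; it is not the operator's actual implementation, and the class name `PairClustering`, the string record ids, and the integer cluster identifiers are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: groups linked record ids into clusters with union-find. */
final class PairClustering {
    // Maps each record id to its parent in the union-find forest.
    private final Map<String, String> parent = new HashMap<>();

    /** Returns the representative id of the set containing the given id. */
    String find(String id) {
        parent.putIfAbsent(id, id);
        String root = id;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every visited id directly at the root.
        String current = id;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    /** Merges the sets containing the two ids of a pair. */
    void union(String left, String right) {
        String a = find(left);
        String b = find(right);
        if (!a.equals(b)) {
            parent.put(a, b);
        }
    }

    /** Assigns a cluster id (hypothetical integer ids) to every record seen in the pairs. */
    static Map<String, Integer> cluster(List<String[]> pairs) {
        PairClustering sets = new PairClustering();
        for (String[] pair : pairs) {
            sets.union(pair[0], pair[1]);
        }
        Map<String, Integer> clusterOfRoot = new HashMap<>();
        Map<String, Integer> result = new TreeMap<>();
        // Iterate over a sorted copy so cluster ids are assigned deterministically.
        for (String id : new TreeSet<>(sets.parent.keySet())) {
            String root = sets.find(id);
            Integer clusterId = clusterOfRoot.get(root);
            if (clusterId == null) {
                clusterId = clusterOfRoot.size();
                clusterOfRoot.put(root, clusterId);
            }
            result.put(id, clusterId);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"A", "B"},
                new String[] {"B", "C"},
                new String[] {"D", "E"});
        // Records A, B and C share one cluster id; D and E share another.
        System.out.println(cluster(pairs));
    }
}
```

As in the operator's output, the sketch keeps only the record-to-cluster assignment; the individual pairings (and any scores) are not retained.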
Constructor and Description |
---|
ClusterLinks() Cluster record pairs using the default record id field name of "id" for both sides and the default left/right field patterns. |
ClusterLinks(String leftDataIdField, String rightDataIdField) Cluster record pairs using the specified record id field names and default left/right field patterns. |
Modifier and Type | Method and Description |
---|---|
protected void | compose(CompositionContext ctx) Compose the body of this operator. |
RecordPort | getInput() Gets the record port providing the input to the clustering operation. |
String | getLeftDataIdField() Gets the name of the field uniquely identifying records on the left hand side of the pairs. |
String | getLeftFieldPattern() Gets the naming pattern used to determine the actual name of the left hand id field. |
RecordPort | getOutput() Gets the record port providing the results of the clustering operation. |
String | getRightDataIdField() Gets the name of the field uniquely identifying records on the right hand side of the pairs. |
String | getRightFieldPattern() Gets the naming pattern used to determine the actual name of the right hand id field. |
void | setDataIdField(String name) Sets the name of the field uniquely identifying records on both sides of the pairs. |
void | setLeftDataIdField(String name) Sets the name of the field uniquely identifying records on the left hand side of the pairs. |
void | setLeftFieldPattern(String pattern) Sets the naming pattern used for fields from the left hand side record. |
void | setRightDataIdField(String name) Sets the name of the field uniquely identifying records on the right hand side of the pairs. |
void | setRightFieldPattern(String pattern) Sets the naming pattern used for fields from the right hand side record. |
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
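public ClusterLinks()
Cluster record pairs using the default record id field name of "id" and default left/right field patterns. Use setLeftDataIdField(String) and setRightDataIdField(String) to change these as necessary.
public ClusterLinks(String leftDataIdField, String rightDataIdField)
Cluster record pairs using the specified record id field names and default left/right field patterns.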
This name is the one used in the original record data producing the pairs, not the formatted name used in the input pair data.
Parameters:
leftDataIdField - field uniquely identifying records on left hand side
rightDataIdField - field uniquely identifying records on right hand side
public RecordPort getInput()
Gets the record port providing the input to the clustering operation.
public RecordPort getOutput()
Gets the record port providing the results of the clustering operation.
public void setDataIdField(String name)
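Sets the name of the field uniquely identifying records on both sides of the pairs, as is the case for pairs produced by DiscoverDuplicates.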
This name is the one used in the original record data producing the pairs, not the formatted name used in the input pair data.
Parameters:
name - the field uniquely identifying records on both the left and right hand side of pairs
public void setLeftFieldPattern(String pattern)
Sets the naming pattern used for fields from the left hand side record.
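Parameters:
pattern - name pattern for left hand side fields
See Also:
setLeftDataIdField(String)
public String getLeftFieldPattern()
Gets the naming pattern used to determine the actual name of the left hand id field.
See Also:
getLeftDataIdField()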
public void setLeftDataIdField(String name)
Sets the name of the field uniquely identifying records on the left hand side of the pairs.
This name is the one used in the original record data producing the pairs, not the formatted name used in the input pair data.
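Parameters:
name - the field uniquely identifying records on the left hand side of pairs
public String getLeftDataIdField()
Gets the name of the field uniquely identifying records on the left hand side of the pairs.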
This name is the one used in the original record data producing the pairs, not the formatted name used in the input pair data.
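public void setRightFieldPattern(String pattern)
Sets the naming pattern used for fields from the right hand side record.
Parameters:
pattern - name pattern for right hand side fields
See Also:
setRightDataIdField(String)
public String getRightFieldPattern()
Gets the naming pattern used to determine the actual name of the right hand id field.
See Also:
getRightDataIdField()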
public void setRightDataIdField(String name)
Sets the name of the field uniquely identifying records on the right hand side of the pairs.
This name is the one used in the original record data producing the pairs, not the formatted name used in the input pair data.
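Parameters:
name - the field uniquely identifying records on the right hand side of pairs
public String getRightDataIdField()
Gets the name of the field uniquely identifying records on the right hand side of the pairs.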
This name is the one used in the original record data producing the pairs, not the formatted name used in the input pair data.
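The difference between the original id field name and the formatted name that appears in the pair data can be sketched as follows. This is only an illustration: the pattern syntax is assumed here to be `java.text.MessageFormat` style, and the patterns `left_{0}` and `right_{0}` are hypothetical values chosen for the example, since this page does not state the actual pattern syntax or the defaults. The helper class `PairFieldNames` is likewise invented for illustration.

```java
import java.text.MessageFormat;

/** Illustrative only: derives pair-side field names from an original field name. */
final class PairFieldNames {
    /**
     * Applies a naming pattern to an original field name.
     * Assumption: the pattern uses MessageFormat syntax with {0} as the field name.
     */
    static String format(String pattern, String fieldName) {
        return MessageFormat.format(pattern, fieldName);
    }

    public static void main(String[] args) {
        String idField = "id"; // name in the original record data
        // Hypothetical patterns; the real defaults are not given on this page.
        System.out.println(format("left_{0}", idField));  // prints "left_id"
        System.out.println(format("right_{0}", idField)); // prints "right_id"
    }
}
```

Under these assumptions, a caller would pass "id" to the id field setters and leave it to the patterns to produce the formatted names found in the input pairs.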
protected void compose(CompositionContext ctx)
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations add sub-operators with OperatorComposable.add(O) and create connections with OperatorComposable.connect(P, P). This includes
connections from the composite's input ports to sub-operators, connections between sub-operators, and
connections from sub-operators' output ports to the composite's output ports.
Specified by:
compose in class CompositeOperator
Parameters:
ctx - the context
Copyright © 2020 Actian Corporation. All rights reserved.