- java.lang.Object
  - com.pervasive.datarush.operators.AbstractLogicalOperator
    - com.pervasive.datarush.operators.CompositeOperator
      - com.pervasive.datarush.matching.cluster.ClusterLinks

All Implemented Interfaces: LogicalOperator
 
public class ClusterLinks extends CompositeOperator

Transform record pairs into clusters of like records. The output of the DiscoverDuplicates operator is a stream of record pairs, each of which has passed the given qualifications for being a potential match. This operator takes the record pair input and finds clusters of records that are alike. For example, suppose one row contains records A and B and another contains records B and C. This operator creates a cluster for records A, B and C, generates a unique cluster identifier for the grouping, and outputs rows for records A, B and C with the generated cluster identifier.

A cluster may contain any number of records. Note that the original record pairings are lost, as are the scores.
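The grouping described above amounts to finding connected components over the record pairs. The following is a minimal, library-independent sketch of that idea using a union-find structure. It is not the operator's actual implementation: the class name `ClusterLinksSketch`, the use of string ids, the sequential integer cluster identifiers, and the one-entry-per-record result are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the DataRush API.
class ClusterLinksSketch {

    // Returns the representative id of the cluster containing the given id,
    // registering the id as its own cluster if it has not been seen before.
    static String find(Map<String, String> parent, String id) {
        parent.putIfAbsent(id, id);
        String root = id;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every id on the walked path at the root.
        String current = id;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    // Maps each record id appearing in any pair to a cluster identifier.
    // Identifiers here are sequential integers (an assumption of this sketch).
    static Map<String, Integer> cluster(List<String[]> pairs) {
        Map<String, String> parent = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            String leftRoot = find(parent, pair[0]);
            String rightRoot = find(parent, pair[1]);
            if (!leftRoot.equals(rightRoot)) {
                parent.put(rightRoot, leftRoot); // merge the two clusters
            }
        }
        Map<String, Integer> rootToCluster = new HashMap<>();
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String id : new ArrayList<>(parent.keySet())) {
            String root = find(parent, id);
            Integer clusterId = rootToCluster.get(root);
            if (clusterId == null) {
                clusterId = rootToCluster.size();
                rootToCluster.put(root, clusterId);
            }
            result.put(id, clusterId);
        }
        return result;
    }

    public static void main(String[] args) {
        // Pairs (A,B) and (B,C) link A, B and C; (D,E) forms a second cluster.
        List<String[]> pairs = Arrays.asList(
                new String[] {"A", "B"},
                new String[] {"B", "C"},
                new String[] {"D", "E"});
        cluster(pairs).forEach((id, clusterId) ->
                System.out.println(id + " -> cluster " + clusterId));
    }
}
```

As in the operator's description, the result keeps only cluster membership: which pairs linked the records, and with what score, is not retained.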
Constructor Summary

- ClusterLinks() - Cluster record pairs using default record id field names of "id" and default left/right field patterns.
- ClusterLinks(String leftDataIdField, String rightDataIdField) - Cluster record pairs using the specified record id field names and default left/right field patterns.
Method Summary

- protected void compose(CompositionContext ctx) - Compose the body of this operator.
- RecordPort getInput() - Gets the record port providing the input to the clustering operation.
- String getLeftDataIdField() - Gets the name of the field uniquely identifying records on the left hand side of the pairs.
- String getLeftFieldPattern() - Gets the naming pattern used to determine the actual name of the left hand id field.
- RecordPort getOutput() - Gets the record port providing the results of the clustering operation.
- String getRightDataIdField() - Gets the name of the field uniquely identifying records on the right hand side of the pairs.
- String getRightFieldPattern() - Gets the naming pattern used to determine the actual name of the right hand id field.
- void setDataIdField(String name) - Sets the name of the field uniquely identifying records on both sides of the pairs.
- void setLeftDataIdField(String name) - Sets the name of the field uniquely identifying records on the left hand side of the pairs.
- void setLeftFieldPattern(String pattern) - Sets the naming pattern used for fields from the left hand side record.
- void setRightDataIdField(String name) - Sets the name of the field uniquely identifying records on the right hand side of the pairs.
- void setRightFieldPattern(String pattern) - Sets the naming pattern used for fields from the right hand side record.
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator: disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
 
Constructor Detail

ClusterLinks

public ClusterLinks()

Cluster record pairs using default record id field names of "id" and default left/right field patterns. Use setLeftDataIdField(String) and setRightDataIdField(String) to change these as necessary.

ClusterLinks

public ClusterLinks(String leftDataIdField, String rightDataIdField)

Cluster record pairs using the specified record id field names and default left/right field patterns. These names are the ones used in the original record data producing the pairs, not the formatted names used in the input pair data.

- Parameters:
  - leftDataIdField - field uniquely identifying records on the left hand side
  - rightDataIdField - field uniquely identifying records on the right hand side
 
 
Method Detail

getInput

public RecordPort getInput()

Gets the record port providing the input to the clustering operation.

- Returns: the input port for the operation
 
getOutput

public RecordPort getOutput()

Gets the record port providing the results of the clustering operation.

- Returns: the output port for the operation
 
setDataIdField

public void setDataIdField(String name)

Sets the name of the field uniquely identifying records on both sides of the pairs. This is a convenience mechanism for when both sides use the same name, as is the case with the output from DiscoverDuplicates. This name is the one used in the original record data producing the pairs, not the formatted name used in the input pair data.

- Parameters:
  - name - the field uniquely identifying records on both the left and right hand side of pairs
 
setLeftFieldPattern

public void setLeftFieldPattern(String pattern)

Sets the naming pattern used for fields from the left hand side record. This will be used to determine the actual name of the left hand id field.

- Parameters:
  - pattern - name pattern for left hand side fields
- See Also: setLeftDataIdField(String)
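To illustrate how a naming pattern and a data id field name might combine into the actual field name in the pair data, here is a small standalone sketch. The pattern syntax is an assumption: this page does not state the pattern format or the default patterns, so the `{0}` placeholder (handled here with `java.text.MessageFormat`), the example patterns `left_{0}` and `right_{0}`, and the helper `formattedName` are all hypothetical and should be checked against the library's own documentation.

```java
import java.text.MessageFormat;

// Hypothetical illustration only; not part of the DataRush API.
class FieldPatternSketch {

    // Applies a naming pattern to the original data id field name.
    // Assumes "{0}" in the pattern marks where the original name goes.
    static String formattedName(String pattern, String dataIdField) {
        return MessageFormat.format(pattern, dataIdField);
    }

    public static void main(String[] args) {
        String leftPattern = "left_{0}";   // assumed pattern, not a documented default
        String rightPattern = "right_{0}"; // assumed pattern, not a documented default
        String dataIdField = "id";         // name used in the original record data

        System.out.println(formattedName(leftPattern, dataIdField));
        System.out.println(formattedName(rightPattern, dataIdField));
    }
}
```

Under these assumptions, the data id field name is what gets passed to setLeftDataIdField(String), while the pattern determines the name actually looked up in the input pair data.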
 
getLeftFieldPattern

public String getLeftFieldPattern()

Gets the naming pattern used to determine the actual name of the left hand id field.

- Returns: the name pattern for left hand side fields
- See Also: getLeftDataIdField()
 
setLeftDataIdField

public void setLeftDataIdField(String name)

Sets the name of the field uniquely identifying records on the left hand side of the pairs. This name will also be used to identify cluster members in the output. This name is the one used in the original record data producing the pairs, not the formatted name used in the input pair data.

- Parameters:
  - name - the field uniquely identifying records on the left hand side of pairs
 
getLeftDataIdField

public String getLeftDataIdField()

Gets the name of the field uniquely identifying records on the left hand side of the pairs. This name is also used to identify cluster members in the output. This name is the one used in the original record data producing the pairs, not the formatted name used in the input pair data.

- Returns: the field uniquely identifying records on the left hand side of pairs
 
setRightFieldPattern

public void setRightFieldPattern(String pattern)

Sets the naming pattern used for fields from the right hand side record. This will be used to determine the actual name of the right hand id field.

- Parameters:
  - pattern - name pattern for right hand side fields
- See Also: setRightDataIdField(String)
 
getRightFieldPattern

public String getRightFieldPattern()

Gets the naming pattern used to determine the actual name of the right hand id field.

- Returns: the name pattern for right hand side fields
- See Also: getRightDataIdField()
 
setRightDataIdField

public void setRightDataIdField(String name)

Sets the name of the field uniquely identifying records on the right hand side of the pairs. This name is the one used in the original record data producing the pairs, not the formatted name used in the input pair data.

- Parameters:
  - name - the field uniquely identifying records on the right hand side of pairs
 
getRightDataIdField

public String getRightDataIdField()

Gets the name of the field uniquely identifying records on the right hand side of the pairs. This name is the one used in the original record data producing the pairs, not the formatted name used in the input pair data.

- Returns: the field uniquely identifying records on the right hand side of pairs
 
compose

protected void compose(CompositionContext ctx)

Description copied from class: CompositeOperator

Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports

- Specified by: compose in class CompositeOperator
- Parameters:
  - ctx - the context
 
 