- All Implemented Interfaces:
LogicalOperator
The first step in a matching operation is to index the input data records into groups for processing by the configured phases of field comparisons, classifiers and filters. Indexing can reduce the number of record pairs that must be compared. The output of this step is a stream of record pairs to be compared, classified and filtered.
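
The indexing idea described above can be illustrated with a small, self-contained sketch. It does not use the DataRush API: `BlockingSketch`, `candidatePairs` and the first-letter blocking key are hypothetical names and choices made only for illustration. It shows how grouping records by a key limits comparisons to records in the same group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for illustration; not part of the DataRush API.
class BlockingSketch {

    /** Returns pairs {i, j} (i indexes left, j indexes right) whose records share a blocking key. */
    static List<int[]> candidatePairs(List<String> left, List<String> right,
                                      Function<String, String> blockingKey) {
        // Group the right-hand records by blocking key.
        Map<String, List<Integer>> blocks = new LinkedHashMap<>();
        for (int j = 0; j < right.size(); j++) {
            blocks.computeIfAbsent(blockingKey.apply(right.get(j)), k -> new ArrayList<>()).add(j);
        }
        // Pair each left-hand record only with right-hand records in the same block.
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < left.size(); i++) {
            List<Integer> block = blocks.get(blockingKey.apply(left.get(i)));
            if (block == null) {
                continue; // no right-hand record shares this key
            }
            for (int j : block) {
                pairs.add(new int[] {i, j});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> left = List.of("Smith", "Jones", "Brown");
        List<String> right = List.of("Smyth", "Johns", "Stone", "Adams");
        // Assumed blocking key: the first letter of the name.
        List<int[]> pairs = candidatePairs(left, right, s -> s.substring(0, 1));
        // Prints "3 candidate pairs instead of 12"
        System.out.println(pairs.size() + " candidate pairs instead of " + left.size() * right.size());
    }
}
```

A coarser key produces larger groups and more candidate pairs; a finer key produces fewer pairs but risks separating records that should have been compared.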
Record pair comparisons happen in configured phases. A matching operation may consist of a single phase. Each phase consists of a set of field comparisons, classifiers and a filter.
Field comparisons compare a field from each source using a fuzzy matching comparison operator. Each comparison outputs a field comparison score.
A classifier may be used to aggregate multiple field scores into a single composite score. A phase may use zero or more classifiers, and a classifier can aggregate the scores produced by other classifiers.
A filter is the last step of a phase. The filter ensures that record pairs are pushed to the output stream only if they meet the filter criteria.
The output of this matching operation is a stream of record pairs that are deemed to be likely matches. Each record pair contains a record score between 0 and 1 that indicates the strength of the match. A score approaching 0 is an unlikely match. A score approaching 1 is a very likely match.
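
The flow of a single phase (field comparisons, then a classifier, then a filter) can be sketched as follows. This is a toy model under stated assumptions, not the DataRush implementation: `PhaseSketch`, `similarity`, `classify` and `passes` are hypothetical names, the position-wise character comparison stands in for a real fuzzy comparison operator, and the weighted average stands in for a real classifier.

```java
// Hypothetical stand-ins for illustration; none of these names come from the DataRush API.
class PhaseSketch {

    /** Toy field comparison: fraction of positions holding the same character (case-insensitive). */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty fields are treated as identical
        }
        int shortest = Math.min(a.length(), b.length());
        int same = 0;
        for (int i = 0; i < shortest; i++) {
            if (Character.toLowerCase(a.charAt(i)) == Character.toLowerCase(b.charAt(i))) {
                same++;
            }
        }
        return (double) same / longest;
    }

    /** Toy classifier: weighted average of the field scores, giving one composite score. */
    static double classify(double[] scores, double[] weights) {
        double total = 0;
        double weightSum = 0;
        for (int i = 0; i < scores.length; i++) {
            total += scores[i] * weights[i];
            weightSum += weights[i];
        }
        return weightSum == 0 ? 0 : total / weightSum;
    }

    /** Toy filter: keep the record pair only if its score reaches the threshold. */
    static boolean passes(double recordScore, double threshold) {
        return recordScore >= threshold;
    }

    public static void main(String[] args) {
        double name = similarity("Smith", "Smyth");   // 4 of 5 positions agree
        double city = similarity("Austin", "Austin"); // identical fields
        // Assumed weights: the name counts twice as much as the city.
        double score = classify(new double[] {name, city}, new double[] {2, 1});
        System.out.println("record score " + score + ", kept: " + passes(score, 0.8));
    }
}
```

Because every field score lies between 0 and 1 and the weights are non-negative, the composite score also stays in that range, which matches the record score range described above.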
-
Constructor Summary
Constructors
| Constructor | Description |
| --- | --- |
| DiscoverLinks() | Discover linkages. |
| DiscoverLinks(Index index, List<Phase> phases) | Discover linkages between input using multiple phases of comparison, classifying and filtering. |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected void | compose(CompositionContext ctx) | Compose the body of this operator. |
| Index | getIndex() | Gets the pair generation method for determining initial candidate matches. |
| RecordPort | getLeft() | Gets the record port providing the left-hand input to the operation. |
| RecordPort | getOutput() | Gets the record port providing the output from the operation. |
| List<Phase> | getPhases() | Gets the phases of comparison, classifying and filtering used to determine matches. |
| RecordPort | getRight() | Gets the record port providing the right-hand input to the operation. |
| void | setIndex(Index index) | Sets the pair generation method for determining initial candidate matches. |
| void | setPhases(List<Phase> phases) | Sets the phases of comparison, classifying and filtering used to determine matches. |
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Constructor Details
-
DiscoverLinks
public DiscoverLinks()
Discover linkages. Both an indexing method and processing phases must be specified.
-
DiscoverLinks
public DiscoverLinks(Index index, List<Phase> phases)
Discover linkages between input using multiple phases of comparison, classifying and filtering.
- Parameters:
index - properties for the blocking operation
phases - configuration of phases to execute
-
-
Method Details
-
getLeft
Gets the record port providing the left-hand input to the operation.
- Returns:
the left input port for the operation
-
getRight
Gets the record port providing the right-hand input to the operation.
- Returns:
the right input port for the operation
-
setPhases
Sets the phases of comparison, classifying and filtering used to determine matches.
- Parameters:
phases - definition of phases for field comparisons
-
getOutput
Gets the record port providing the output from the operation.
- Returns:
the output port for the operation
-
setIndex
Sets the pair generation method for determining initial candidate matches.
- Parameters:
index - properties used to index the input data
-
getIndex
Gets the pair generation method for determining initial candidate matches.
- Returns:
properties used to index the input data
-
getPhases
Gets the phases of comparison, classifying and filtering used to determine matches.
- Returns:
definition of phases for field comparisons
-
compose
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
- Specified by:
compose in class CompositeOperator
- Parameters:
ctx - the context
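
The three responsibilities listed for compose (validate, add sub-operators, connect ports) can be sketched with a miniature stand-in framework. Everything here is an assumption made for illustration: `ComposeSketch`, `Port`, `Op`, `Context` and `Composite` are hypothetical types that only mimic the shape of an add/connect style context, and they are not the DataRush classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature framework invented for this sketch; not DataRush classes.
class ComposeSketch {

    /** A port that remembers which upstream port feeds it. */
    static final class Port {
        final String name;
        Port source; // set by Context.connect
        Port(String name) { this.name = name; }
    }

    /** An operator with one input port and one output port. */
    static class Op {
        final Port in;
        final Port out;
        Op(String name) {
            this.in = new Port(name + ".in");
            this.out = new Port(name + ".out");
        }
    }

    /** Collects sub-operators and records connections between ports. */
    static final class Context {
        final List<Op> operators = new ArrayList<>();
        <O extends Op> O add(O op) { operators.add(op); return op; }
        void connect(Port from, Port to) { to.source = from; }
    }

    /** A composite that chains a configurable number of stages. */
    static final class Composite extends Op {
        int stages = 2;
        Composite() { super("composite"); }

        void compose(Context ctx) {
            // Step 1: validate the configuration.
            if (stages < 1) {
                throw new IllegalStateException("at least one stage is required");
            }
            // Steps 2 and 3: add each sub-operator, then wire it to its upstream port.
            Port upstream = in; // start from the composite's own input port
            for (int i = 0; i < stages; i++) {
                Op stage = ctx.add(new Op("stage" + i));
                ctx.connect(upstream, stage.in);
                upstream = stage.out;
            }
            // Finally connect the last sub-operator to the composite's output port.
            ctx.connect(upstream, out);
        }
    }

    public static void main(String[] args) {
        Composite composite = new Composite();
        Context ctx = new Context();
        composite.compose(ctx);
        // Prints "2 sub-operators; output fed by stage1.out"
        System.out.println(ctx.operators.size() + " sub-operators; output fed by "
                + composite.out.source.name);
    }
}
```

Validating before adding anything to the context means a misconfigured composite fails without leaving a partially built graph behind.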
-