Class AnalyzeDuplicateKeys

All Implemented Interfaces:
LogicalOperator

public class AnalyzeDuplicateKeys extends CompositeOperator
Provides an analysis of the quality of a set of blocking keys over data to be deduplicated. As each record in a given block must be compared to every other record in the block during deduplication, the smaller the block sizes, the better the performance. This
  • Constructor Details

    • AnalyzeDuplicateKeys

      public AnalyzeDuplicateKeys()
      Analyzes the data without blocking keys, in which case the data is treated as one giant block. Set the blocking keys using setBlockingKeys(List).
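The effect of blocking keys on deduplication cost can be sketched outside the library. The example below is a minimal, self-contained illustration, not the operator's implementation: the class BlockComparisonCount, its methods, and the sample fields "zip" and "name" are all hypothetical. It assumes a block of n records costs n * (n - 1) / 2 pairwise comparisons, which follows from comparing each record to every other record in its block.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the library: estimates the number of
// pairwise comparisons implied by a candidate set of blocking keys.
final class BlockComparisonCount {

    // Groups records by their values for the given key fields and returns
    // the size of each block. An empty key list puts every record in one block.
    static Map<List<String>, Long> blockSizes(List<Map<String, String>> records,
                                              List<String> keys) {
        Map<List<String>, Long> sizes = new HashMap<>();
        for (Map<String, String> record : records) {
            List<String> blockKey = new ArrayList<>();
            for (String key : keys) {
                blockKey.add(record.get(key));
            }
            sizes.merge(blockKey, 1L, Long::sum);
        }
        return sizes;
    }

    // A block of n records needs n * (n - 1) / 2 pairwise comparisons.
    static long totalComparisons(Map<List<String>, Long> sizes) {
        long total = 0;
        for (long n : sizes.values()) {
            total += n * (n - 1) / 2;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
            Map.of("zip", "78701", "name", "Ann"),
            Map.of("zip", "78701", "name", "Anne"),
            Map.of("zip", "10001", "name", "Bob"),
            Map.of("zip", "10001", "name", "Rob"));

        // No keys: one block of 4 records, 4 * 3 / 2 = 6 comparisons.
        System.out.println(totalComparisons(blockSizes(records, List.of())));      // prints 6
        // Keyed on zip: two blocks of 2 records, 1 + 1 = 2 comparisons.
        System.out.println(totalComparisons(blockSizes(records, List.of("zip")))); // prints 2
    }
}
```

With four records the saving is small, but because the cost is quadratic in block size, the gap between one giant block and many small blocks grows quickly as the data grows.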
  • Method Details

    • getInput

      public RecordPort getInput()
      Gets the record port providing the input data to analyze.
      Returns:
      the input port for the operation
    • getBlockingKeys

      public List<String> getBlockingKeys()
      Gets the fields to use for key blocking.
      Returns:
      the blocking keys
    • setBlockingKeys

      public void setBlockingKeys(List<String> keys)
      Sets the fields to use for key blocking.
      Parameters:
      keys - the blocking keys
    • compose

      protected void compose(CompositionContext ctx)
      Description copied from class: CompositeOperator
      Compose the body of this operator. Implementations should do the following:
      1. Perform any validation of configuration, input types, etc.
      2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
      3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
      Specified by:
      compose in class CompositeOperator
      Parameters:
      ctx - the context
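The three steps listed for compose (validate, add sub-operators, connect ports) can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyContext, its add and connect methods, and the sub-operator names "first" and "second" are hypothetical stand-ins that use plain strings, not the library's CompositionContext, operator, or port types, and they say nothing about what AnalyzeDuplicateKeys builds internally.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins for illustration only; these are NOT the library's types.
final class ComposeSketch {

    // Hypothetical context that records added sub-operators and connections.
    static final class ToyContext {
        final List<String> operators = new ArrayList<>();
        final List<String> connections = new ArrayList<>();

        // Registers a sub-operator and returns it for further wiring.
        String add(String operatorName) {
            operators.add(operatorName);
            return operatorName;
        }

        // Records a connection from a source port to a target port.
        void connect(String fromPort, String toPort) {
            connections.add(fromPort + " -> " + toPort);
        }
    }

    // Hypothetical composite body following the three documented steps.
    static void compose(ToyContext ctx, List<String> blockingKeys) {
        // 1. Validate the configuration before building anything.
        if (blockingKeys == null) {
            throw new IllegalArgumentException("blocking keys must not be null");
        }
        // 2. Instantiate sub-operators and add them to the context.
        String first = ctx.add("first");
        String second = ctx.add("second");
        // 3. Wire composite input -> sub-operators -> composite output.
        ctx.connect("composite.input", first + ".input");
        ctx.connect(first + ".output", second + ".input");
        ctx.connect(second + ".output", "composite.output");
    }

    public static void main(String[] args) {
        ToyContext ctx = new ToyContext();
        compose(ctx, List.of("zip"));
        // Print each recorded connection, one per line.
        ctx.connections.forEach(System.out::println);
    }
}
```

Validating first means a misconfigured composite fails before any sub-operator is added, so the context is never left partially wired.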