Class AnalyzeDuplicateKeys

  • All Implemented Interfaces:
    LogicalOperator

    public class AnalyzeDuplicateKeys
    extends CompositeOperator
    Provides an analysis of the quality of a set of blocking keys over data to be deduplicated. As each record in a given block must be compared to every other record in the block during deduplication, the smaller the block sizes, the better the performance. This
    • Constructor Detail

      • AnalyzeDuplicateKeys

        public AnalyzeDuplicateKeys()
        Creates an analysis with no blocking keys. In this case, the data is treated as one giant block. Set the blocking keys using setBlockingKeys(List).
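To illustrate why a single giant block is costly, here is a minimal sketch of the pairwise comparison count, assuming every record in a block is compared once with every other record in that block. The class and method names (BlockComparisonCost, pairsInBlock, totalPairs) are hypothetical and are not part of this operator's API.

```java
import java.util.Collections;
import java.util.List;

// Illustrative only: these names are hypothetical and not part of the library.
final class BlockComparisonCost {

    // Number of distinct record pairs in a block of n records: n * (n - 1) / 2.
    static long pairsInBlock(long n) {
        return n * (n - 1) / 2;
    }

    // Total number of pairs to compare across all blocks.
    static long totalPairs(List<Long> blockSizes) {
        long total = 0;
        for (long size : blockSizes) {
            total += pairsInBlock(size);
        }
        return total;
    }

    public static void main(String[] args) {
        // 1,000 records with no blocking keys: one giant block.
        System.out.println(totalPairs(List.of(1000L)));                   // prints 499500
        // The same 1,000 records split evenly into 10 blocks of 100.
        System.out.println(totalPairs(Collections.nCopies(10, 100L)));    // prints 49500
    }
}
```

In this example, splitting the data into ten equal blocks cuts the comparison count by roughly a factor of ten, which is the effect good blocking keys aim for.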
    • Method Detail

      • getInput

        public RecordPort getInput()
        Gets the record port providing the input data to analyze.
        Returns:
        the input port for the operation
      • getBlockingKeys

        public List<String> getBlockingKeys()
        Gets the fields to use for key blocking.
        Returns:
        the blocking keys
      • setBlockingKeys

        public void setBlockingKeys(List<String> keys)
        Sets the fields to use for key blocking.
        Parameters:
        keys - the blocking keys
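As a rough picture of what key blocking means, the sketch below groups records by the values of the chosen key fields and reports the size of each resulting block. It is a self-contained stand-in, not the operator's implementation: the class KeyBlockingSketch, the map-based record representation, and the field names "zip" and "last" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in: not the operator's implementation or API.
final class KeyBlockingSketch {

    // Groups records by the values of the given key fields and returns
    // the number of records that fall into each block.
    static Map<List<String>, Integer> blockSizes(List<Map<String, String>> records,
                                                 List<String> keys) {
        Map<List<String>, Integer> sizes = new LinkedHashMap<>();
        for (Map<String, String> record : records) {
            // The block identifier is the tuple of key field values.
            List<String> keyValues = new ArrayList<>();
            for (String key : keys) {
                keyValues.add(record.get(key));
            }
            sizes.merge(keyValues, 1, Integer::sum);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Hypothetical records with hypothetical field names.
        List<Map<String, String>> records = List.of(
                Map.of("zip", "78701", "last", "Smith"),
                Map.of("zip", "78701", "last", "Smyth"),
                Map.of("zip", "10001", "last", "Jones"));

        // Blocking on "zip" yields two blocks.
        System.out.println(blockSizes(records, List.of("zip")));   // prints {[78701]=2, [10001]=1}
        // With no keys, every record lands in the same block.
        System.out.println(blockSizes(records, List.of()));        // prints {[]=3}
    }
}
```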
      • compose

        protected void compose(CompositionContext ctx)
        Description copied from class: CompositeOperator
        Compose the body of this operator. Implementations should do the following:
        1. Perform any validation of configuration, input types, etc.
        2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
        3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
        Specified by:
        compose in class CompositeOperator
        Parameters:
        ctx - the context
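The three steps listed for compose can be sketched with a toy model. Everything below is a hypothetical stand-in written for illustration: ToyPort, ToyOperator, ToyContext, and the sub-operator names "group" and "summarize" are assumptions, not library types, and they do not describe what this class actually builds internally. Only the validate, add, connect ordering is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the three compose steps. Every type here is a hypothetical
// stand-in; none of them is a class from the library.
final class ComposeStepsSketch {

    static final class ToyPort {
        final String name;
        ToyPort(String name) { this.name = name; }
    }

    static final class ToyOperator {
        final ToyPort input;
        final ToyPort output;
        ToyOperator(String name) {
            this.input = new ToyPort(name + ".in");
            this.output = new ToyPort(name + ".out");
        }
    }

    // Records what was added and connected, in order.
    static final class ToyContext {
        final List<ToyOperator> operators = new ArrayList<>();
        final List<String> connections = new ArrayList<>();

        ToyOperator add(ToyOperator op) {
            operators.add(op);
            return op;
        }

        void connect(ToyPort from, ToyPort to) {
            connections.add(from.name + " -> " + to.name);
        }
    }

    static void compose(ToyContext ctx, ToyPort compositeInput,
                        ToyPort compositeOutput, List<String> keys) {
        // 1. Validate configuration before building anything.
        if (keys == null) {
            throw new IllegalStateException("keys must not be null");
        }
        // 2. Instantiate sub-operators and add them to the context.
        ToyOperator group = ctx.add(new ToyOperator("group"));
        ToyOperator summarize = ctx.add(new ToyOperator("summarize"));
        // 3. Wire composite input -> sub-operators -> composite output.
        ctx.connect(compositeInput, group.input);
        ctx.connect(group.output, summarize.input);
        ctx.connect(summarize.output, compositeOutput);
    }

    public static void main(String[] args) {
        ToyContext ctx = new ToyContext();
        compose(ctx, new ToyPort("composite.in"), new ToyPort("composite.out"), List.of("zip"));
        // Print each recorded connection in the order it was made.
        ctx.connections.forEach(System.out::println);
    }
}
```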