Class BlockRecords

  • All Implemented Interfaces:
    LogicalOperator

    public class BlockRecords
    extends CompositeOperator
    Block records into groups of like records based on a set of key fields and generate record pairs from these groups. Blocking records in this way reduces the number of record pairs generated, which leads to far fewer comparison operations. This can dramatically speed up execution of record matching if the keys are trustworthy.
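The idea behind blocking can be sketched in plain Java. This is a conceptual illustration only, not the operator's implementation: the class `BlockingSketch`, its methods, the map-based record model, and the sample field names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of blocking in plain Java; not the BlockRecords implementation.
class BlockingSketch {

    // Pairs each left record only with right records that share its key value.
    // Records are modeled as field-name -> value maps (an assumption for illustration).
    static List<List<Map<String, String>>> blockPairs(
            List<Map<String, String>> left, String leftKey,
            List<Map<String, String>> right, String rightKey) {
        // Group the right-hand records into blocks by key value.
        Map<String, List<Map<String, String>>> blocks = new HashMap<>();
        for (Map<String, String> r : right) {
            blocks.computeIfAbsent(r.get(rightKey), k -> new ArrayList<>()).add(r);
        }
        // Emit a pair only when the left record's key selects a non-empty block.
        List<List<Map<String, String>>> pairs = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : blocks.getOrDefault(l.get(leftKey), List.of())) {
                pairs.add(List.of(l, r));
            }
        }
        return pairs;
    }

    // Hypothetical sample data.
    static List<Map<String, String>> sampleLeft() {
        return List.of(
                Map.of("name", "Ann", "zip", "10001"),
                Map.of("name", "Bob", "zip", "10001"),
                Map.of("name", "Cy", "zip", "94105"));
    }

    static List<Map<String, String>> sampleRight() {
        return List.of(
                Map.of("name", "Anne", "postcode", "10001"),
                Map.of("name", "Cyrus", "postcode", "94105"),
                Map.of("name", "Dee", "postcode", "60601"));
    }

    public static void main(String[] args) {
        int blocked = blockPairs(sampleLeft(), "zip", sampleRight(), "postcode").size();
        int unblocked = sampleLeft().size() * sampleRight().size();
        System.out.println(blocked + " blocked pairs versus " + unblocked + " unblocked pairs");
    }
}
```

With this sample data, blocking on the postal code yields 3 candidate pairs instead of the 9 a full cross product would produce. It also shows the risk of an untrustworthy key: a record whose key value is wrong is never paired with its true match.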

    The output is generated as a set of flows each containing record pairs generated from the input data source. Since the partitioning is based on key values, the output flows may be unbalanced in terms of the number of record pairs contained within each.
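To see why key-based partitioning can leave output flows unbalanced, it may help to count the pairs each key value produces. The sketch below is plain Java with hypothetical names (`BlockSkewSketch`, `pairsPerKey`) and made-up key values. How key values are assigned to flows is not described here, so the example stops at per-key counts.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: counts candidate pairs per key value to expose skew.
class BlockSkewSketch {

    // For each key present on both sides: pairs = (left count) x (right count).
    static Map<String, Long> pairsPerKey(List<String> leftKeys, List<String> rightKeys) {
        Map<String, Long> leftCounts = new TreeMap<>();
        for (String k : leftKeys) {
            leftCounts.merge(k, 1L, Long::sum);
        }
        Map<String, Long> rightCounts = new TreeMap<>();
        for (String k : rightKeys) {
            rightCounts.merge(k, 1L, Long::sum);
        }
        Map<String, Long> pairs = new TreeMap<>();
        for (Map.Entry<String, Long> e : leftCounts.entrySet()) {
            Long r = rightCounts.get(e.getKey());
            if (r != null) {
                pairs.put(e.getKey(), e.getValue() * r);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical key values: one common value and two rare ones.
        List<String> left = List.of("NY", "NY", "NY", "NY", "CA", "TX");
        List<String> right = List.of("NY", "NY", "NY", "CA", "TX", "TX");
        System.out.println(pairsPerKey(left, right)); // {CA=1, NY=12, TX=2}
    }
}
```

Because the pair count grows with the product of the group sizes, a single frequent key value can dominate the work, and whichever flow receives it carries most of the pairs.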

    • Method Detail

      • getLeftInput

        public RecordPort getLeftInput()
        Gets the record port providing the left hand input to the pair generation.
        Returns:
        the left input port for the operation
      • getRightInput

        public RecordPort getRightInput()
        Gets the record port providing the right hand input to the pair generation.
        Returns:
        the right input port for the operation
      • getOutput

        public RecordPort getOutput()
        Gets the record port providing the results of the pair generation.
        Returns:
        the output port for the operation
      • getLeftFieldPattern

        public String getLeftFieldPattern()
        Gets the output naming pattern for fields from the left hand input.
        Returns:
        the pattern for the left hand side field names in output.
      • setLeftFieldPattern

        public void setLeftFieldPattern(String pattern)
        Sets the output naming pattern for fields from the left hand input. This is used to ensure distinct names in the output pairs.
        Parameters:
        pattern - name pattern for the left hand side field names
      • getRightFieldPattern

        public String getRightFieldPattern()
        Gets the output naming pattern for fields from the right hand input.
        Returns:
        the pattern for the right hand side field names in output.
      • setRightFieldPattern

        public void setRightFieldPattern(String pattern)
        Sets the output naming pattern for fields from the right hand input. This is used to ensure distinct names in the output pairs.
        Parameters:
        pattern - name pattern for the right hand side field names
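The pattern syntax is not specified on this page. As a hedged illustration of why separate left and right patterns keep output names distinct, the sketch below assumes a `java.text.MessageFormat`-style pattern in which `{0}` stands for the source field name. That syntax, the class `FieldPatternSketch`, and the pattern strings are assumptions, not documented behavior of this operator.

```java
import java.text.MessageFormat;

// Illustrative only. ASSUMPTION: the naming pattern uses a MessageFormat-style
// {0} placeholder for the original field name; this page does not confirm that.
class FieldPatternSketch {

    // Applies a naming pattern to a source field name.
    static String rename(String pattern, String fieldName) {
        return MessageFormat.format(pattern, fieldName);
    }

    public static void main(String[] args) {
        // Both inputs have a field called "name"; distinct patterns avoid a clash.
        System.out.println(rename("left_{0}", "name"));
        System.out.println(rename("right_{0}", "name"));
    }
}
```

Under that assumption, a field called `name` on both inputs would surface as two separate output fields, so each record pair can carry both values side by side.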
      • getLeftKeys

        public String[] getLeftKeys()
        Gets the fields used as keys for data on the left hand side.
        Returns:
        the key fields on the left hand side
      • setLeftKeys

        public void setLeftKeys(String[] keys)
        Sets the fields used as keys for data on the left hand side. There must be an equal number of keys specified on the left and right sides. Only record pairs where these keys are equal will be output; key comparison is done by position.
        Parameters:
        keys - the key fields on the left hand side
      • getRightKeys

        public String[] getRightKeys()
        Gets the fields used as keys for data on the right hand side.
        Returns:
        the key fields on the right hand side
      • setRightKeys

        public void setRightKeys(String[] keys)
        Sets the fields used as keys for data on the right hand side. There must be an equal number of keys specified on the left and right sides. Only record pairs where these keys are equal will be output; key comparison is done by position.
        Parameters:
        keys - the key fields on the right hand side
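The positional key comparison described for setLeftKeys and setRightKeys can be sketched as follows. This is plain Java with hypothetical names (`PositionalKeySketch`, `keysMatch`) and a map-based record model. Treating two missing values as equal is an assumption of the sketch, not documented behavior.

```java
import java.util.Map;
import java.util.Objects;

// Illustrative only: the i-th left key is compared with the i-th right key.
class PositionalKeySketch {

    static boolean keysMatch(Map<String, String> left, String[] leftKeys,
                             Map<String, String> right, String[] rightKeys) {
        // Both sides must specify the same number of keys.
        if (leftKeys.length != rightKeys.length) {
            throw new IllegalArgumentException("left and right key counts differ");
        }
        for (int i = 0; i < leftKeys.length; i++) {
            // ASSUMPTION: Objects.equals semantics, so two missing values compare equal.
            if (!Objects.equals(left.get(leftKeys[i]), right.get(rightKeys[i]))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> left = Map.of("zip", "10001", "surname", "Lee");
        Map<String, String> right = Map.of("postcode", "10001", "last_name", "Lee");

        // zip is compared with postcode, surname with last_name.
        System.out.println(keysMatch(left, new String[] {"zip", "surname"},
                right, new String[] {"postcode", "last_name"}));

        // Reordering one side compares zip with last_name, so the pair is rejected.
        System.out.println(keysMatch(left, new String[] {"zip", "surname"},
                right, new String[] {"last_name", "postcode"}));
    }
}
```

Field names do not need to agree between the two sides; only the order of the key arrays determines which fields are compared.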
      • compose

        protected void compose(CompositionContext ctx)
        Description copied from class: CompositeOperator
        Compose the body of this operator. Implementations should do the following:
        1. Perform any validation of configuration, input types, etc.
        2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
        3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
        Specified by:
        compose in class CompositeOperator
        Parameters:
        ctx - the context
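The three composition steps listed for compose can be illustrated with a toy model. Everything in this sketch is hypothetical: `ToyContext` and `ToyComposite` are stand-ins written for the example and are not the library's CompositionContext or CompositeOperator, and the sub-operator names are invented. It only shows the order of work: validate, add sub-operators, then connect.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the validate / add / connect sequence. All types here are
// hypothetical stand-ins, not the library's classes.
class ComposeSketch {

    // Records which sub-operators were added and how they were wired.
    static class ToyContext {
        final List<String> operators = new ArrayList<>();
        final List<String> connections = new ArrayList<>();

        String add(String operatorName) {
            operators.add(operatorName);
            return operatorName;
        }

        void connect(String from, String to) {
            connections.add(from + " -> " + to);
        }
    }

    static class ToyComposite {
        String[] leftKeys = {};
        String[] rightKeys = {};

        void compose(ToyContext ctx) {
            // 1. Validate configuration before building anything.
            if (leftKeys.length != rightKeys.length) {
                throw new IllegalStateException("left and right key counts differ");
            }
            // 2. Instantiate sub-operators and register them with the context.
            String group = ctx.add("groupByKey");      // hypothetical sub-operator
            String pair = ctx.add("generatePairs");    // hypothetical sub-operator
            // 3. Connect composite inputs to sub-operators, sub-operators to each
            //    other, and the final sub-operator to the composite output.
            ctx.connect("composite.leftInput", group);
            ctx.connect("composite.rightInput", group);
            ctx.connect(group, pair);
            ctx.connect(pair, "composite.output");
        }
    }

    public static void main(String[] args) {
        ToyComposite op = new ToyComposite();
        op.leftKeys = new String[] {"zip"};
        op.rightKeys = new String[] {"postcode"};
        ToyContext ctx = new ToyContext();
        op.compose(ctx);
        ctx.connections.forEach(System.out::println);
    }
}
```

Validating first means a misconfigured operator fails before any sub-operator is added or connected, which keeps the context free of partially built graphs.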