Class BlockRecords

All Implemented Interfaces:
LogicalOperator

public class BlockRecords extends CompositeOperator
Blocks records into groups of like records based on a set of key fields and generates record pairs from these groups. Blocking records in this way reduces the number of record pairs generated, which leads to far fewer comparison operations. This can dramatically speed up record matching, provided the keys are trustworthy.

The output is generated as a set of flows, each containing record pairs generated from the input data source. Since the partitioning is based on key values, the output flows may be unbalanced in terms of the number of record pairs each contains.
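
The effect of blocking on the number of pairs can be illustrated with a small sketch in plain Java. It does not use this operator or any of its classes; the class name `BlockingSketch`, the method `blockedPairs`, and the sample records are hypothetical and exist only for illustration. It assumes each record is a `{name, zip}` array and that the zip code is the blocking key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BlockingSketch {
    // Hypothetical sample data: each record is {name, zip}.
    static final List<String[]> LEFT = Arrays.asList(
        new String[] {"Ann Lee", "78701"},
        new String[] {"Bob Ray", "78701"},
        new String[] {"Cy Dow", "10001"});
    static final List<String[]> RIGHT = Arrays.asList(
        new String[] {"Anne Lee", "78701"},
        new String[] {"Bob Rae", "78701"},
        new String[] {"Cy Dowe", "10001"},
        new String[] {"Di Fox", "60601"});

    /**
     * Groups the right-hand records by key value, then pairs each left-hand
     * record only with right-hand records in the block sharing its key.
     * Each pair is rendered as "leftName|rightName".
     */
    static List<String> blockedPairs(List<String[]> left, List<String[]> right,
                                     int leftKey, int rightKey) {
        Map<String, List<String[]>> blocks = new LinkedHashMap<>();
        for (String[] r : right) {
            blocks.computeIfAbsent(r[rightKey], k -> new ArrayList<>()).add(r);
        }
        List<String> pairs = new ArrayList<>();
        for (String[] l : left) {
            List<String[]> block =
                blocks.getOrDefault(l[leftKey], Collections.<String[]>emptyList());
            for (String[] r : block) {
                pairs.add(l[0] + "|" + r[0]);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Without blocking, every left record is compared with every right record.
        System.out.println("cross product: " + LEFT.size() * RIGHT.size()); // 12
        // With blocking on zip, only records in the same block are paired.
        System.out.println("blocked pairs: " + blockedPairs(LEFT, RIGHT, 1, 1).size()); // 5
    }
}
```

In this sample the block for zip 78701 yields four pairs while the block for 10001 yields one, which is the kind of imbalance between output flows described above.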

  • Constructor Details

  • Method Details

    • getLeftInput

      public RecordPort getLeftInput()
      Gets the record port providing the left hand input to the pair generation.
      Returns:
      the left input port for the operation
    • getRightInput

      public RecordPort getRightInput()
      Gets the record port providing the right hand input to the pair generation.
      Returns:
      the right input port for the operation
    • getOutput

      public RecordPort getOutput()
      Gets the record port providing the results of the pair generation.
      Returns:
      the output port for the operation
    • getLeftFieldPattern

      public String getLeftFieldPattern()
      Gets the output naming pattern for fields from the left hand input.
      Returns:
      the pattern for the left hand side field names in output
    • setLeftFieldPattern

      public void setLeftFieldPattern(String pattern)
      Sets the output naming pattern for fields from the left hand input. This is used to ensure distinct names in the output pairs.
      Parameters:
      pattern - name pattern for the left hand side field names
    • getRightFieldPattern

      public String getRightFieldPattern()
      Gets the output naming pattern for fields from the right hand input.
      Returns:
      the pattern for the right hand side field names in output
    • setRightFieldPattern

      public void setRightFieldPattern(String pattern)
      Sets the output naming pattern for fields from the right hand input. This is used to ensure distinct names in the output pairs.
      Parameters:
      pattern - name pattern for the right hand side field names
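
      Why the left and right patterns are needed can be sketched in plain Java. The pattern syntax is not stated in this page; the sketch below assumes a `{0}` placeholder in the style of `java.text.MessageFormat`, which is an assumption and may not match the syntax this operator accepts. The class `FieldPatternSketch`, the method `pairFieldNames`, and the pattern values are hypothetical.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FieldPatternSketch {
    /**
     * Builds the field names of an output pair by applying one pattern to the
     * left-hand field names and another to the right-hand field names.
     * ASSUMPTION: "{0}" stands for the original field name (MessageFormat style).
     */
    static List<String> pairFieldNames(List<String> leftFields, String leftPattern,
                                       List<String> rightFields, String rightPattern) {
        List<String> names = new ArrayList<>();
        for (String field : leftFields) {
            names.add(MessageFormat.format(leftPattern, field));
        }
        for (String field : rightFields) {
            names.add(MessageFormat.format(rightPattern, field));
        }
        return names;
    }

    public static void main(String[] args) {
        // Both inputs carry fields called "name" and "zip"; without distinct
        // patterns the output pair would contain duplicate field names.
        List<String> fields = Arrays.asList("name", "zip");
        System.out.println(pairFieldNames(fields, "left_{0}", fields, "right_{0}"));
    }
}
```

      When both inputs share a schema, as in deduplication of a single source, distinct patterns are what keep the two halves of each pair distinguishable.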
    • getLeftKeys

      public String[] getLeftKeys()
      Gets the fields used as keys for data on the left hand side.
      Returns:
      the key fields on the left hand side
    • setLeftKeys

      public void setLeftKeys(String[] keys)
      Sets the fields used as keys for data on the left hand side. There must be an equal number of keys specified on the left and right sides. Only record pairs where these keys are equal will be output; key comparison is done by position.
      Parameters:
      keys - the key fields on the left hand side
    • getRightKeys

      public String[] getRightKeys()
      Gets the fields used as keys for data on the right hand side.
      Returns:
      the key fields on the right hand side
    • setRightKeys

      public void setRightKeys(String[] keys)
      Sets the fields used as keys for data on the right hand side. There must be an equal number of keys specified on the left and right sides. Only record pairs where these keys are equal will be output; key comparison is done by position.
      Parameters:
      keys - the key fields on the right hand side
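
      The positional key comparison described for setLeftKeys and setRightKeys can be sketched in plain Java. This is an illustration only, not the operator's implementation: the class `PositionalKeysSketch`, the method `keysMatch`, and the field names are hypothetical, records are modeled as simple maps, and treating two null values as equal is a choice made by the sketch rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class PositionalKeysSketch {
    /**
     * Returns true when the i-th left key field equals the i-th right key
     * field for every position i. The key arrays must have the same length.
     */
    static boolean keysMatch(Map<String, String> left, String[] leftKeys,
                             Map<String, String> right, String[] rightKeys) {
        if (leftKeys.length != rightKeys.length) {
            throw new IllegalArgumentException(
                "left and right sides must specify the same number of keys");
        }
        for (int i = 0; i < leftKeys.length; i++) {
            // Sketch-only choice: Objects.equals treats null == null as a match.
            if (!Objects.equals(left.get(leftKeys[i]), right.get(rightKeys[i]))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> left = new HashMap<>();
        left.put("zip", "78701");
        left.put("last", "Lee");
        Map<String, String> right = new HashMap<>();
        right.put("postal", "78701");
        right.put("surname", "Lee");

        // zip is compared with postal, last with surname: the keys line up.
        System.out.println(keysMatch(left, new String[] {"zip", "last"},
                                     right, new String[] {"postal", "surname"}));
        // Reordering one side compares zip with surname, so the pair is rejected.
        System.out.println(keysMatch(left, new String[] {"zip", "last"},
                                     right, new String[] {"surname", "postal"}));
    }
}
```

      Because matching is by position rather than by name, the key fields may be named differently on each side, but they must be listed in the same order.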
    • compose

      protected void compose(CompositionContext ctx)
      Description copied from class: CompositeOperator
      Compose the body of this operator. Implementations should do the following:
      1. Perform any validation of configuration, input types, etc.
      2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
      3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from the sub-operators' output ports to the composite's output ports.
      Specified by:
      compose in class CompositeOperator
      Parameters:
      ctx - the context