Class FilterExistingRows

  • All Implemented Interfaces:
    LogicalOperator

    public final class FilterExistingRows
    extends AbstractRelationalJoin
    Filters records on the left based on the presence of matching records on the right.

    Row selection is controlled by checking whether the key values on the left can be found in the key fields of any record on the right. Records on the left whose keys match those of at least one record on the right are emitted on the output flow. A secondary flow consisting of those records which did not match is also produced; this output is the complement of the primary output with respect to the left input. In terms of relational algebra, this operator simultaneously performs a left semi-join and left anti-join on the two inputs.

    Depending on the value of AbstractRelationalJoin.getUseHashJoinHint() one of two procedures are used which affects the overall graph behavior.

    1. If hash join hint is false, input data will be sorted and hash partitioned by the specified keys (if not already sorted according to upstream metadata). Once sorted and partitioned, data is is them combined in a streaming fashion. Note that in the case that a join condition is specified, this will require buffering on the right-hand-side, increasing memory requirements if the right has a large number records with duplicate keys.
    2. If hash join hint is true, a full copy of the data from the right will be distributed to the cluster and loaded into memory within each node in the cluster. The left side will not be sorted or partitioned. Thus, the right side should always be small.
    • Constructor Detail

      • FilterExistingRows

        public FilterExistingRows()
        Default constructor. Prior to graph compilation, the following property must be set:
      • FilterExistingRows

        public FilterExistingRows​(JoinKey[] joinKeys)
        Performs a filter with the given set of join keys
        Parameters:
        joinKeys - the join keys
      • FilterExistingRows

        public FilterExistingRows​(List<JoinKey> joinKeys)
        Performs a filter with the given set of join keys
        Parameters:
        joinKeys - the join keys