All Implemented Interfaces:
LogicalOperator

@Deprecated public final class SemiJoin extends AbstractRelationalJoin
Deprecated.
This operator has been replaced by FilterExistingRows; use that operator instead, linking to the appropriate output port.
Performs a relational semi-join or anti-join on two input datasets by a specified set of keys. Depending on the value of AbstractRelationalJoin.getUseHashJoinHint(), one of two procedures is used:
  1. If the hash join hint is false, the input data is sorted and hash partitioned by the specified keys (unless it is already sorted according to upstream metadata). Once sorted and partitioned, the data is then combined in a streaming fashion. Note that if a join condition is specified, this requires buffering on the right-hand side, which increases memory requirements when the right side has a large number of records with duplicate keys.
  2. If the hash join hint is true, a full copy of the right-hand data is distributed to the cluster and loaded into memory on each node. The left side is not sorted or partitioned. The right side should therefore always be small.
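The two procedures can be illustrated with a minimal in-memory Java sketch (Java 16+, for records). It is not the DataFlow implementation: it runs in a single process, omits hash partitioning and cluster distribution, and uses a hypothetical Row record with a single integer key. The optional join condition is modeled as a hypothetical BiPredicate; passing (a, b) -> true means no condition.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Illustrative single-process sketch of the two SemiJoin procedures; not the DataFlow implementation.
// Hash partitioning and cluster distribution are not modeled.
class SemiJoinSketch {

    // Hypothetical row type: one int join key plus a payload.
    record Row(int key, String payload) {}

    // Hash procedure (hash join hint = true): load the whole right side into an
    // in-memory map keyed by join key, then stream the left side without sorting it.
    // Output preserves left-side order.
    static List<Row> hashJoin(List<Row> left, List<Row> right,
                              BiPredicate<Row, Row> condition, boolean anti) {
        Map<Integer, List<Row>> rightByKey = new HashMap<>();
        for (Row r : right) {
            rightByKey.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }
        List<Row> out = new ArrayList<>();
        for (Row l : left) {
            boolean matched = false;
            for (Row r : rightByKey.getOrDefault(l.key(), List.of())) {
                if (condition.test(l, r)) {
                    matched = true;
                    break;
                }
            }
            // Semi-join keeps matched left rows; anti-join keeps unmatched ones.
            if (matched != anti) {
                out.add(l);
            }
        }
        return out;
    }

    // Sort-merge procedure (hash join hint = false): sort both sides by key, then make
    // a single forward pass. Right rows sharing a key form a group that may have to be
    // scanned in full (i.e. buffered) when a join condition is present.
    static List<Row> sortMergeJoin(List<Row> left, List<Row> right,
                                   BiPredicate<Row, Row> condition, boolean anti) {
        List<Row> l = new ArrayList<>(left);
        List<Row> r = new ArrayList<>(right);
        l.sort(Comparator.comparingInt(Row::key));
        r.sort(Comparator.comparingInt(Row::key));

        List<Row> out = new ArrayList<>();
        int groupStart = 0;
        for (Row row : l) {
            // Advance past right keys smaller than the current left key.
            while (groupStart < r.size() && r.get(groupStart).key() < row.key()) {
                groupStart++;
            }
            boolean matched = false;
            for (int k = groupStart; k < r.size() && r.get(k).key() == row.key(); k++) {
                if (condition.test(row, r.get(k))) {
                    matched = true;
                    break;
                }
            }
            if (matched != anti) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = List.of(new Row(3, "c"), new Row(1, "a"), new Row(2, "b"));
        List<Row> right = List.of(new Row(2, "x"), new Row(3, "y"), new Row(3, "z"));
        BiPredicate<Row, Row> noCondition = (a, b) -> true;
        System.out.println(hashJoin(left, right, noCondition, false));
        System.out.println(sortMergeJoin(left, right, noCondition, true));
    }
}
```

With the sample data, both methods keep left rows 3 and 2 for the semi-join (the hash version in left-input order, the sort-merge version in key order) and only row 1 for the anti-join. Row 3 is emitted once even though it matches two right rows, since a semi-join never duplicates left rows.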
  • Constructor Details

    • SemiJoin

      public SemiJoin()
      Deprecated.
      Default constructor. Prior to graph compilation, the following property must be set:
    • SemiJoin

      public SemiJoin(JoinKey[] joinKeys)
      Deprecated.
      Creates a semi-join operator with the given set of join keys.
      Parameters:
      joinKeys - the join keys
    • SemiJoin

      public SemiJoin(List<JoinKey> joinKeys)
      Deprecated.
      Creates a semi-join operator with the given set of join keys.
      Parameters:
      joinKeys - the join keys
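The join keys passed to either constructor determine when a left row counts as matched. The sketch below assumes, as a hypothetical simplification, that each key pairs one left-side field with one right-side field and that a left row matches when some right row agrees with it on every pair. KeyPair, semiJoin, and the map-based rows are illustrative stand-ins, not the JoinKey class or the DataFlow API; the array and List overloads only mirror the two constructors. Java 16+ is assumed for records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of semi-join matching over several join keys; not the DataFlow API.
class CompositeKeySemiJoin {

    // Hypothetical stand-in for a join key: pairs a left-side field with a right-side field.
    record KeyPair(String leftField, String rightField) {}

    // Array overload, mirroring SemiJoin(JoinKey[]); delegates to the List version.
    static List<Map<String, Object>> semiJoin(KeyPair[] keys,
            List<Map<String, Object>> left, List<Map<String, Object>> right) {
        return semiJoin(Arrays.asList(keys), left, right);
    }

    // A left row is kept if at least one right row agrees with it on every key pair.
    static List<Map<String, Object>> semiJoin(List<KeyPair> keys,
            List<Map<String, Object>> left, List<Map<String, Object>> right) {
        Set<List<Object>> rightTuples = new HashSet<>();
        for (Map<String, Object> row : right) {
            rightTuples.add(tuple(row, keys, false));
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : left) {
            if (rightTuples.contains(tuple(row, keys, true))) {
                out.add(row);
            }
        }
        return out;
    }

    // Extracts a row's key values, in key-pair order, using the left or right field names.
    // Null handling is not modeled.
    private static List<Object> tuple(Map<String, Object> row, List<KeyPair> keys, boolean leftSide) {
        List<Object> values = new ArrayList<>(keys.size());
        for (KeyPair k : keys) {
            values.add(row.get(leftSide ? k.leftField() : k.rightField()));
        }
        return values;
    }

    public static void main(String[] args) {
        List<KeyPair> keys = List.of(new KeyPair("region", "region"), new KeyPair("custId", "id"));
        List<Map<String, Object>> orders = List.of(
                Map.of("region", "EU", "custId", 1, "order", "A"),
                Map.of("region", "US", "custId", 1, "order", "B"),
                Map.of("region", "EU", "custId", 2, "order", "C"));
        List<Map<String, Object>> customers = List.of(
                Map.of("region", "EU", "id", 1),
                Map.of("region", "EU", "id", 2));
        for (Map<String, Object> row : semiJoin(keys, orders, customers)) {
            System.out.println(row.get("order"));
        }
    }
}
```

With the sample data only orders A and C are kept: order B has a matching customer id but a different region, so it fails the composite match.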
  • Method Details