Class SemiJoin

  • All Implemented Interfaces:
    LogicalOperator

    @Deprecated
    public final class SemiJoin
    extends AbstractRelationalJoin
    Deprecated.
    This operator has been replaced by FilterExistingRows; use that operator instead, linking to the appropriate output port.
    Performs a relational semi-join or anti-join on two input datasets by a specified set of keys. Depending on the value of AbstractRelationalJoin.getUseHashJoinHint(), one of two procedures is used.
    1. If hash join hint is false, input data will be sorted and hash partitioned by the specified keys (if not already sorted according to upstream metadata). Once sorted and partitioned, data is then combined in a streaming fashion. Note that if a join condition is specified, this will require buffering on the right-hand side, increasing memory requirements if the right has a large number of records with duplicate keys.
    2. If hash join hint is true, a full copy of the data from the right will be distributed to the cluster and loaded into memory within each node in the cluster. The left side will not be sorted or partitioned. Thus, the right side should always be small.
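    The semantics described above can be illustrated with a minimal, library-independent sketch. It does not use this operator's API: the class SemiJoinSketch and the method filterByExistence are hypothetical names introduced only for illustration. The sketch loosely mirrors the hash-join procedure, holding the right-side keys in memory and streaming the left side unsorted; a semi-join keeps left rows whose key exists on the right, and an anti-join keeps those whose key does not.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration only; not part of the documented operator API.
public class SemiJoinSketch {

    // Returns the left rows whose key is present on the right (semi-join),
    // or the left rows whose key is absent from the right when anti is true.
    public static <L, R, K> List<L> filterByExistence(
            List<L> left, List<R> right,
            Function<L, K> leftKey, Function<R, K> rightKey,
            boolean anti) {
        // Load every right-side key into memory; duplicates collapse in the set.
        Set<K> rightKeys = new HashSet<>();
        for (R row : right) {
            rightKeys.add(rightKey.apply(row));
        }
        // Stream the left side in its original order, with no sorting required.
        List<L> out = new ArrayList<>();
        for (L row : left) {
            boolean matched = rightKeys.contains(leftKey.apply(row));
            if (matched != anti) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(1, 2, 3, 4);
        List<Integer> right = List.of(2, 4, 4, 6);
        System.out.println(filterByExistence(left, right, x -> x, x -> x, false)); // [2, 4]
        System.out.println(filterByExistence(left, right, x -> x, x -> x, true));  // [1, 3]
    }
}
```

    Each left row appears at most once in the output regardless of how many right rows share its key, which is what distinguishes a semi-join from an inner join. Because the whole right-side key set is held in memory, the sketch also shows why the right side should be small when the hash join hint is true.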
    • Constructor Detail

      • SemiJoin

        public SemiJoin()
        Deprecated.
        Default constructor. Prior to graph compilation, the following property must be set:
      • SemiJoin

        public SemiJoin(JoinKey[] joinKeys)
        Deprecated.
        Performs a semi-join with the given set of join keys.
        Parameters:
        joinKeys - the join keys
      • SemiJoin

        public SemiJoin(List<JoinKey> joinKeys)
        Deprecated.
        Performs a semi-join with the given set of join keys.
        Parameters:
        joinKeys - the join keys