Class RemoveDuplicates

  • All Implemented Interfaces:
    LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

    public class RemoveDuplicates
    extends CompositeOperator
    implements RecordPipelineOperator
    Removes duplicate rows based on a specified set of group keys. For each distinct combination of group key values, the "first" record is pushed to the output and all other records with the same key values are discarded. The "first" record of a key group is determined by sorting the rows of each group by the specified sortKeys. If sortKeys is unspecified, an arbitrary row from each group is output.
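    The selection rule above can be illustrated with a small, self-contained Java sketch that does not use the operator's library API. Rows are modeled as Map<String, Object>, the sort keys are represented by an ordinary java.util.Comparator, and the names DedupSketch and removeDuplicates are hypothetical. It models only the documented behavior, not how RemoveDuplicates is actually implemented; in particular, on a tie this sketch keeps the first row it sees, whereas the operator makes no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {

    // Hypothetical helper: keeps one row per distinct combination of group key values.
    // The kept row is the one that sorts first under sortOrder (the "sort keys").
    public static List<Map<String, Object>> removeDuplicates(
            List<Map<String, Object>> rows,
            List<String> groupKeys,
            Comparator<Map<String, Object>> sortOrder) {
        // Mirrors the documented constraint on group keys.
        if (groupKeys == null || groupKeys.isEmpty()) {
            throw new IllegalArgumentException("groupKeys must not be null or empty");
        }
        // Group key tuple -> best row so far; LinkedHashMap keeps groups in first-seen order.
        Map<List<Object>, Map<String, Object>> best = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            List<Object> key = new ArrayList<>();
            for (String field : groupKeys) {
                key.add(row.get(field));
            }
            // Replace the current winner only if the candidate sorts strictly earlier,
            // so on a tie this sketch keeps the first row it encountered.
            best.merge(key, row, (current, candidate) ->
                    sortOrder.compare(candidate, current) < 0 ? candidate : current);
        }
        return new ArrayList<>(best.values());
    }
}
```

    For example, with groupKeys ["id"] and a comparator that orders rows by ascending "ts", the rows {id=1, ts=30}, {id=1, ts=10}, {id=2, ts=20} reduce to {id=1, ts=10} and {id=2, ts=20}.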
    • Constructor Detail

      • RemoveDuplicates

        public RemoveDuplicates()
        Default constructor. Prior to graph compilation, the group keys must be set with setGroupKeys; sort keys are optional.
      • RemoveDuplicates

        public RemoveDuplicates(List<String> groupKeys)
        Creates an operator that removes duplicates using the specified group keys.
        Parameters:
        groupKeys - the names of the key fields. Must not be empty or null.
    • Method Detail

      • compose

        protected void compose(CompositionContext ctx)
        Description copied from class: CompositeOperator
        Compose the body of this operator. Implementations should do the following:
        1. Perform any validation of configuration, input types, etc.
        2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
        3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
        Specified by:
        compose in class CompositeOperator
        Parameters:
        ctx - the context
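        As an illustration of the three compose steps, the sketch below assembles a de-duplicator from two smaller stages: a sort on the group key followed by the optional sort keys, then a scan that keeps a row only when the group key changes. This sort-then-scan design is an assumption chosen for illustration; the documentation does not say which sub-operators RemoveDuplicates actually uses. ComposedDedup, sortStage, firstOfGroupStage, and compose(...) are hypothetical plain-Java stand-ins, with java.util.function.Function playing the role of an operator and Function.andThen playing the role of a port connection.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

public class ComposedDedup {

    // Hypothetical "sub-operator": returns a copy of its input sorted by the given order.
    public static <T> Function<List<T>, List<T>> sortStage(Comparator<T> order) {
        return rows -> {
            List<T> copy = new ArrayList<>(rows);
            copy.sort(order); // List.sort is stable
            return copy;
        };
    }

    // Hypothetical "sub-operator": expects input sorted by group key, so duplicates are
    // adjacent; emits a row only when its group key differs from the previous row's.
    public static <T, K> Function<List<T>, List<T>> firstOfGroupStage(Function<T, K> groupKey) {
        return rows -> {
            List<T> out = new ArrayList<>();
            K previous = null;
            boolean seenAny = false;
            for (T row : rows) {
                K key = groupKey.apply(row);
                if (!seenAny || !Objects.equals(key, previous)) {
                    out.add(row);
                }
                previous = key;
                seenAny = true;
            }
            return out;
        };
    }

    // Follows the three compose steps: validate, instantiate sub-stages, connect them.
    public static <T, K extends Comparable<? super K>> Function<List<T>, List<T>> compose(
            Function<T, K> groupKey, Comparator<T> sortKeys) {
        // 1. Validate configuration.
        Objects.requireNonNull(groupKey, "groupKey must not be null");
        // 2. Instantiate sub-stages: sort by group key, then by the optional sort keys.
        Comparator<T> order = Comparator.comparing(groupKey);
        if (sortKeys != null) {
            order = order.thenComparing(sortKeys);
        }
        Function<List<T>, List<T>> sort = sortStage(order);
        Function<List<T>, List<T>> firstOfGroup = firstOfGroupStage(groupKey);
        // 3. Connect: the sort stage's output feeds the scan stage.
        return sort.andThen(firstOfGroup);
    }
}
```

        Sorting by the group key first makes duplicates adjacent, so the scan only has to remember the previous key instead of every key seen so far; that is the usual reason to prefer a sort-based design in a streaming dataflow.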
      • getGroupKeys

        public String[] getGroupKeys()
        Returns the keys by which to de-duplicate.
        Returns:
        the keys by which to de-duplicate.
      • setGroupKeys

        public void setGroupKeys(String[] groupKeys)
        Sets the keys by which to de-duplicate.
        Parameters:
        groupKeys - the keys by which to de-duplicate.
      • getSortKeys

        public SortKey[] getSortKeys()
        Returns the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
        Returns:
        the additional keys by which to sort the data
      • setSortKeys

        public void setSortKeys(SortKey[] sortKeys)
        Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
        Parameters:
        sortKeys - the additional keys by which to sort the data
      • setSortKeys

        public void setSortKeys(String... keys)
        Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
        Parameters:
        keys - the additional keys by which to sort the data
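        The following sketch shows how an ordered list of sort keys might be combined into a single comparator that decides which row of a group counts as "first": earlier keys take precedence and later keys only break ties. SortKeyComparator, KeySpec, and fromKeys are hypothetical names; KeySpec is a simplified stand-in for SortKey that carries only a field name and a direction, and the real SortKey class may expose different members. Field values are assumed to be non-null and mutually Comparable.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SortKeyComparator {

    // Hypothetical, simplified stand-in for a sort key: a field name and a direction.
    public static final class KeySpec {
        final String field;
        final boolean ascending;

        public KeySpec(String field, boolean ascending) {
            this.field = field;
            this.ascending = ascending;
        }
    }

    // Combines sort keys into one comparator. Earlier keys take precedence; later keys
    // only break ties. Values are assumed to be non-null and mutually Comparable.
    @SuppressWarnings("unchecked")
    public static Comparator<Map<String, Object>> fromKeys(List<KeySpec> keys) {
        // With no keys, every pair of rows ties, so any row may be chosen.
        Comparator<Map<String, Object>> result = (a, b) -> 0;
        for (KeySpec key : keys) {
            Comparator<Map<String, Object>> byField = (a, b) ->
                    ((Comparable<Object>) a.get(key.field)).compareTo(b.get(key.field));
            result = result.thenComparing(key.ascending ? byField : byField.reversed());
        }
        return result;
    }
}
```

        With a single descending key on a timestamp field, the "first" row of each group is the most recent one. With no keys at all, every pair of rows compares equal, which corresponds to the documented case where an arbitrary row is output.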