Class DistinctValues

All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

public final class DistinctValues extends AbstractRecordCompositeOperator
Calculates the distinct values of the given input field. For each distinct value, this produces a record consisting of the following fields:
  • inputField: the value from the original dataset
  • count_inputField: the number of occurrences of that value
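As a rough illustration of the output shape described above, the following plain-Java sketch counts occurrences per distinct value. It does not use the DataFlow API and is not the operator's implementation; the class name DistinctCountSketch, the method countDistinct, and the example field name color are hypothetical, and the count_color label assumes the count field is named by prefixing count_ to the input field name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; not the DistinctValues implementation.
class DistinctCountSketch {

    // Returns one entry per distinct value, mapped to its number of occurrences.
    static Map<String, Long> countDistinct(List<String> values) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String value : values) {
            // Start at 1 for a new value, otherwise add 1 to the running count.
            counts.merge(value, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input field "color".
        List<String> colors = List.of("red", "blue", "red", "green", "blue", "red");
        // Each map entry stands in for one output record: (color, count_color).
        countDistinct(colors).forEach((value, count) ->
                System.out.println("color=" + value + ", count_color=" + count));
    }
}
```

The sketch keeps first-seen order only for readability; the operator itself makes no ordering promise unless sorting by count is enabled.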
  • Constructor Details

    • DistinctValues

      public DistinctValues()
      Default constructor. Prior to graph compilation, the inputField property must be set (see setInputField).
    • DistinctValues

      public DistinctValues(String inputField)
      Creates an operator that computes distinct values for the given input field.
      Parameters:
      inputField - the input field for which we calculate distinct values
  • Method Details

    • getInput

      public RecordPort getInput()
      Description copied from interface: PipelineOperator
      Returns the input port
      Specified by:
      getInput in interface PipelineOperator<RecordPort>
      Overrides:
      getInput in class AbstractRecordCompositeOperator
      Returns:
      the input port
    • getOutput

      public RecordPort getOutput()
      Description copied from interface: PipelineOperator
      Returns the output port
      Specified by:
      getOutput in interface PipelineOperator<RecordPort>
      Overrides:
      getOutput in class AbstractRecordCompositeOperator
      Returns:
      the output port
    • getInputField

      public String getInputField()
      Returns the input field for which we calculate distinct values
      Returns:
      the input field for which we calculate distinct values
    • setInputField

      public void setInputField(String inputField)
      Sets the input field for which we calculate distinct values
      Parameters:
      inputField - the input field for which we calculate distinct values
    • isSortByCount

      public boolean isSortByCount()
      Returns whether to sort by value count. Defaults to false, in which case the output order is unspecified.
      Returns:
      whether to sort by value count
    • setSortByCount

      public void setSortByCount(boolean sortByCount)
      Sets whether to sort by value count. Defaults to false, in which case the output order is unspecified.
      Parameters:
      sortByCount - whether to sort by value count
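      To make the effect of this flag concrete, here is a plain-Java sketch that orders (value, count) pairs by count. It is not DataFlow code. The documentation above does not state the sort direction, so the descending order used here is an assumption, and the names SortByCountSketch and sortByCount are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; the sort direction is an assumption.
class SortByCountSketch {

    // Returns the (value, count) pairs ordered by count, largest first (assumed).
    static List<Map.Entry<String, Long>> sortByCount(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> rows = new ArrayList<>(counts.entrySet());
        rows.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = Map.of("red", 3L, "blue", 2L, "green", 1L);
        for (Map.Entry<String, Long> row : sortByCount(counts)) {
            System.out.println(row.getKey() + " -> " + row.getValue());
        }
    }
}
```

      Values with equal counts may appear in any relative order in this sketch.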
    • isFewDistinctValuesHint

      public boolean isFewDistinctValuesHint()
      Returns a hint as to whether a small number of distinct values is expected. If few distinct values are not expected, the operator eagerly sorts up front.
      Returns:
      whether few distinct values are expected
    • setFewDistinctValuesHint

      public void setFewDistinctValuesHint(boolean fewDistinctValuesHint)
      Sets a hint as to whether a small number of distinct values is expected. If few distinct values are not expected, the operator eagerly sorts up front.
      Parameters:
      fewDistinctValuesHint - whether few distinct values are expected
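      One way to picture why such a hint matters is to contrast two counting strategies in plain Java. This sketch is not taken from the DataFlow implementation: the documentation only says that sorting happens up front when few distinct values are not expected, so the in-memory hash aggregation used for the "few values" case is an assumption, and all names here (CountStrategySketch, hashCount, sortThenCount, count) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; the strategy choice is an assumption.
class CountStrategySketch {

    // Hash aggregation: memory grows with the number of distinct values.
    static Map<String, Long> hashCount(List<String> values) {
        Map<String, Long> counts = new HashMap<>();
        for (String value : values) {
            counts.merge(value, 1L, Long::sum);
        }
        return counts;
    }

    // Sort first, then count runs of equal adjacent values.
    static Map<String, Long> sortThenCount(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        Map<String, Long> counts = new LinkedHashMap<>();
        int start = 0;
        while (start < sorted.size()) {
            int end = start;
            // Advance to the end of the current run of equal values.
            while (end < sorted.size() && sorted.get(end).equals(sorted.get(start))) {
                end++;
            }
            counts.put(sorted.get(start), (long) (end - start));
            start = end;
        }
        return counts;
    }

    // Picks a strategy from the hint; both produce the same counts.
    static Map<String, Long> count(List<String> values, boolean fewDistinctValuesHint) {
        return fewDistinctValuesHint ? hashCount(values) : sortThenCount(values);
    }

    public static void main(String[] args) {
        List<String> values = List.of("b", "a", "b", "c", "a", "b");
        System.out.println(count(values, true));
        System.out.println(count(values, false));
    }
}
```

      In this sketch the hint changes only how the counts are computed, not which counts are produced.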
    • compose

      protected void compose(CompositionContext ctx)
      Description copied from class: CompositeOperator
      Compose the body of this operator. Implementations should do the following:
      1. Perform any validation of configuration, input types, etc.
      2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
      3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
      Specified by:
      compose in class CompositeOperator
      Parameters:
      ctx - the context