Class MostFrequentValues

All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

public class MostFrequentValues extends AbstractRecordCompositeOperator
Computes the most frequent values within the given fields. A maximum should be specified to cap how many of the most frequent values are output for each selected field.

The output of this operator is the set of frequent items. Two fields are output for each selected field: the value taken from the input field and the frequency count of that value.
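To make the output shape concrete, here is a minimal sketch in plain Java, not the operator's actual implementation. The class `FrequentValuesSketch`, the method `mostFrequent`, and the representation of rows as `Map<String, String>` are all hypothetical names chosen for illustration. For each selected field it produces (value, count) pairs, most frequent first, limited to a maximum.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the operator's behavior; not part of the library.
final class FrequentValuesSketch {

    /** For each named field, returns up to topN (value, count) pairs, highest count first. */
    static Map<String, List<Map.Entry<String, Long>>> mostFrequent(
            List<Map<String, String>> rows, List<String> fieldNames, int topN) {
        Map<String, List<Map.Entry<String, Long>>> result = new LinkedHashMap<>();
        for (String field : fieldNames) {
            // Tally how often each distinct value occurs in this field.
            Map<String, Long> counts = new HashMap<>();
            for (Map<String, String> row : rows) {
                String value = row.get(field);
                if (value != null) { // assumption of this sketch: missing values are skipped
                    counts.merge(value, 1L, Long::sum);
                }
            }
            // Order by descending count; ties are broken by value to keep the result deterministic.
            List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
            entries.sort((a, b) -> {
                int byCount = Long.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            });
            result.put(field, new ArrayList<>(entries.subList(0, Math.min(topN, entries.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("color", "red", "size", "S"),
                Map.of("color", "blue", "size", "M"),
                Map.of("color", "red", "size", "M"),
                Map.of("color", "red", "size", "L"));
        // Prints the top two values and their counts for each field.
        System.out.println(mostFrequent(rows, List.of("color", "size"), 2));
    }
}
```

Each list entry corresponds to one output row for that field: the key plays the role of the value field and the entry value plays the role of the frequency count.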

  • Constructor Details

    • MostFrequentValues

      public MostFrequentValues()
  • Method Details

    • getFieldNames

      public List<String> getFieldNames()
      Gets the names of fields to which the operation is applied.
      Returns:
      the fields which will have the most frequent values discovered
    • setFieldNames

      public void setFieldNames(List<String> fields)
      Sets the names of fields to which the operation is applied. If no fields are specified all fields are selected by default.
      Parameters:
      fields - the fields which will have the most frequent values discovered
    • setFieldNames

      public void setFieldNames(String... fields)
      Sets the names of fields to which the operation is applied. If no fields are specified all fields are selected by default.
      Parameters:
      fields - the fields which will have the most frequent values discovered
    • getShowTopHowMany

      public int getShowTopHowMany()
      Provides a cap on the number of value frequencies to calculate. The default is 25. Memory usage is proportional to the number of distinct values; thus only the top n values are calculated in order to avoid excessive memory consumption in the event that the number of distinct values for a given field is large.
      Returns:
      the cap on the number of values to calculate
    • setShowTopHowMany

      public void setShowTopHowMany(int showTopHowMany)
      Sets a cap on the number of value frequencies to calculate. The default is 25. Memory usage is proportional to the number of distinct values.
      Parameters:
      showTopHowMany - the cap on the number of values to calculate
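One common way to honor a cap of this kind is a bounded min-heap that never retains more than the requested number of candidates. The sketch below is plain Java under that assumption; it is not taken from the operator's source, and `BoundedTopN` and `select` are hypothetical names. Note that it only bounds the selection step: the `counts` map passed in still holds one entry per distinct value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper illustrating a capped top-n selection; not part of the library.
final class BoundedTopN {

    /** Scans counts while retaining at most cap entries; returns them most frequent first. */
    static List<Map.Entry<String, Long>> select(Map<String, Long> counts, int cap) {
        // Min-heap on count: the head is always the weakest candidate kept so far.
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(Map.entry(e.getKey(), e.getValue()));
            if (heap.size() > cap) {
                heap.poll(); // evict the least frequent candidate
            }
        }
        List<Map.Entry<String, Long>> out = new ArrayList<>(heap);
        out.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return out;
    }
}
```

With this approach the heap holds at most `cap + 1` entries at any moment, so the selection cost grows with the cap rather than with the number of distinct values. Which of several equally frequent values survives is unspecified in this sketch.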
    • isFewDistinctValuesHint

      public boolean isFewDistinctValuesHint()
      Returns a hint as to whether there are expected to be a small number of distinct values. If not, we eagerly sort each column up-front and perform a parallelized computation of frequent items.
      Returns:
      whether few distinct values are expected
    • setFewDistinctValuesHint

      public void setFewDistinctValuesHint(boolean fewDistinctValuesHint)
      Sets a hint as to whether there are expected to be a small number of distinct values. If not, we eagerly sort each column up-front and perform a parallelized computation of frequent items.
      Parameters:
      fewDistinctValuesHint - whether few distinct values are expected
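The hint chooses between two strategies. Why sorting helps when there are many distinct values can be seen in a small plain-Java sketch: once a column is sorted, equal values sit next to each other, so counting needs only a running position instead of a table with one entry per distinct value. This is an illustration of the general technique, not the operator's code, and it shows only the sequential counting step, not the parallelized computation. `SortedRunCounter` and `countRuns` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of sort-based counting; not part of the library.
final class SortedRunCounter {

    /** Counts values by sorting a copy of the column and measuring each run of equal neighbors. */
    static List<Map.Entry<String, Long>> countRuns(List<String> column) {
        List<String> sorted = new ArrayList<>(column);
        Collections.sort(sorted); // natural ordering groups equal values together
        List<Map.Entry<String, Long>> runs = new ArrayList<>();
        int start = 0; // index where the current run begins
        for (int i = 1; i <= sorted.size(); i++) {
            boolean runEnds = i == sorted.size() || !sorted.get(i).equals(sorted.get(start));
            if (runEnds) {
                runs.add(Map.entry(sorted.get(start), (long) (i - start)));
                start = i;
            }
        }
        return runs;
    }
}
```

The result lists each distinct value once, in sorted order, with its count. When few distinct values are expected, a hash-based tally is usually simpler and avoids the cost of the sort.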
    • compose

      protected void compose(CompositionContext ctx)
      Description copied from class: CompositeOperator
      Compose the body of this operator. Implementations should do the following:
      1. Perform any validation of configuration, input types, etc.
      2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
      3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
      Specified by:
      compose in class CompositeOperator
      Parameters:
      ctx - the context