Class MostFrequentValues

  • All Implemented Interfaces:
    LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

    public class MostFrequentValues
    extends AbstractRecordCompositeOperator
    Computes the most frequent values within the given fields. A maximum should be specified to limit how many of the most frequent values are output for each selected field.

    The output of this operator is the set of frequent items. Two fields are output for each selected field: the value field from the input and the frequency count of each top value.
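
    As a rough picture of the result for a single field, the following plain-Java sketch counts the values in one column and keeps the top entries as (value, count) pairs. It is only an illustration: the class TopValuesSketch and the method topValues are hypothetical names, the tie-breaking rule is an assumption, and none of this is taken from the operator's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is not the operator's implementation.
class TopValuesSketch {

    // Returns up to `limit` (value, count) pairs, highest count first.
    // Assumes non-null values. Ties are broken by value so the result is
    // deterministic; that rule is an assumption of this sketch.
    static List<Map.Entry<String, Long>> topValues(List<String> column, int limit) {
        // One map entry per distinct value, so memory grows with the
        // number of distinct values in the column.
        Map<String, Long> counts = new HashMap<>();
        for (String value : column) {
            counts.merge(value, 1L, Long::sum);
        }

        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        int keep = Math.max(0, Math.min(limit, entries.size()));
        return new ArrayList<>(entries.subList(0, keep));
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("TX", "CA", "TX", "NY", "CA", "TX");
        for (Map.Entry<String, Long> e : topValues(states, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints:
        // TX 3
        // CA 2
    }
}
```

    Each returned pair corresponds to the two output fields described above: the value and its frequency count.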

    • Constructor Detail

      • MostFrequentValues

        public MostFrequentValues()
    • Method Detail

      • getFieldNames

        public List<String> getFieldNames()
        Gets the names of fields to which the operation is applied.
        Returns:
        the fields which will have the most frequent values discovered
      • setFieldNames

        public void setFieldNames(List<String> fields)
        Sets the names of fields to which the operation is applied. If no fields are specified all fields are selected by default.
        Parameters:
        fields - the fields which will have the most frequent values discovered
        See Also:
        setFieldNames(String...)
      • setFieldNames

        public void setFieldNames(String... fields)
        Sets the names of fields to which the operation is applied. If no fields are specified all fields are selected by default.
        Parameters:
        fields - the fields which will have the most frequent values discovered
        See Also:
        setFieldNames(List)
      • getShowTopHowMany

        public int getShowTopHowMany()
        Provides a cap on the number of value frequencies to calculate. The default is 25. Memory usage is proportional to the number of distinct values; thus only the top n values are calculated in order to avoid excessive memory consumption in the event that the number of distinct values for a given field is large.
        Returns:
        the cap on the number of values to calculate
      • setShowTopHowMany

        public void setShowTopHowMany(int showTopHowMany)
        Sets a cap on the number of value frequencies to calculate. The default is 25. Memory usage is proportional to the number of distinct values.
        Parameters:
        showTopHowMany - the cap on the number of values to calculate
      • isFewDistinctValuesHint

        public boolean isFewDistinctValuesHint()
        Returns a hint as to whether there are expected to be a small number of distinct values. If not, we eagerly sort each column up-front and perform a parallelized computation of frequent items.
        Returns:
        whether few distinct values are expected
      • setFewDistinctValuesHint

        public void setFewDistinctValuesHint(boolean fewDistinctValuesHint)
        Sets a hint as to whether there are expected to be a small number of distinct values. If not, we eagerly sort each column up-front and perform a parallelized computation of frequent items.
        Parameters:
        fewDistinctValuesHint - whether few distinct values are expected
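
        To illustrate why sorting first can help when a field has many distinct values, the following plain-Java sketch sorts a column so that equal values become adjacent, counts each run, and keeps only the current top entries in a bounded heap. This is a single-threaded illustration under stated assumptions: the class SortedRunCountSketch and its method are hypothetical names, and the operator's actual sort and parallelization strategy is not described in this documentation.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Plain-Java illustration only; this is not the operator's implementation.
class SortedRunCountSketch {

    // Returns up to `limit` (value, count) pairs, highest count first.
    // Assumes non-null values. Which of several equally frequent values
    // survives the cap is unspecified in this sketch.
    static List<Map.Entry<String, Long>> topValues(List<String> column, int limit) {
        List<String> sorted = new ArrayList<>(column);
        Collections.sort(sorted); // equal values are now adjacent

        // Min-heap on count: the root is the weakest of the current top values,
        // so the heap never holds more than `limit` entries after each step.
        PriorityQueue<Map.Entry<String, Long>> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.getValue(), b.getValue()));

        int i = 0;
        while (i < sorted.size()) {
            // Advance j to the end of the run of values equal to sorted.get(i).
            int j = i;
            while (j < sorted.size() && sorted.get(j).equals(sorted.get(i))) {
                j++;
            }
            heap.add(new AbstractMap.SimpleImmutableEntry<String, Long>(
                sorted.get(i), (long) (j - i)));
            if (heap.size() > limit) {
                heap.poll(); // drop the least frequent candidate
            }
            i = j;
        }

        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return result;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("b", "a", "b", "c", "b", "a");
        for (Map.Entry<String, Long> e : topValues(column, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints:
        // b 3
        // a 2
    }
}
```

        Unlike a hash-based count, the counting pass here holds only the current run and at most the capped number of candidates, at the cost of sorting the column first.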
      • compose

        protected void compose(CompositionContext ctx)
        Description copied from class: CompositeOperator
        Compose the body of this operator. Implementations should do the following:
        1. Perform any validation of configuration, input types, etc.
        2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
        3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
        Specified by:
        compose in class CompositeOperator
        Parameters:
        ctx - the context