java.lang.Object
  com.pervasive.datarush.operators.AbstractLogicalOperator
    com.pervasive.datarush.operators.CompositeOperator
      com.pervasive.datarush.operators.AbstractRecordCompositeOperator
        com.pervasive.datarush.analytics.stats.MostFrequentValues
-
- All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator
public class MostFrequentValues extends AbstractRecordCompositeOperator
Computes the most frequent values within the given fields. A maximum should be specified to indicate how many of the most frequent values are output for each selected field. The output of this operator is the set of frequent items. Two fields are output for each selected field: the value field from the input and the frequency count of each top value.
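To make the output shape concrete, the following is a minimal plain-Java sketch of a top-n frequency count over a single column. It does not use the DataRush API and is not the operator's implementation. The class `TopValuesSketch`, the method `topValues`, the assumption of non-null values, and the tie-breaking rule are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the DataRush API.
class TopValuesSketch {

    /**
     * Counts every distinct value in the column, then keeps the
     * showTopHowMany most frequent (value, count) pairs.
     * Assumes non-null values. Ties are broken by value so the result is
     * deterministic; that rule is an assumption of this sketch.
     */
    static List<Map.Entry<String, Long>> topValues(List<String> column, int showTopHowMany) {
        // One map entry per distinct value: memory grows with distinct values.
        Map<String, Long> counts = new HashMap<>();
        for (String value : column) {
            counts.merge(value, 1L, Long::sum);
        }

        List<Map.Entry<String, Long>> result = new ArrayList<>();
        counts.forEach((value, count) -> result.add(Map.entry(value, count)));

        result.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue()); // descending count
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        int limit = Math.min(showTopHowMany, result.size());
        return new ArrayList<>(result.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> colors = List.of("red", "blue", "red", "green", "blue", "red");
        for (Map.Entry<String, Long> entry : topValues(colors, 2)) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
        // prints:
        // red 3
        // blue 2
    }
}
```

Each returned pair corresponds to the two output fields described above: the value taken from the input field and its frequency count.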
-
-
Field Summary
-
Fields inherited from class com.pervasive.datarush.operators.AbstractRecordCompositeOperator
input, output
-
-
Constructor Summary
MostFrequentValues()
-
Method Summary
protected void compose(CompositionContext ctx)
Compose the body of this operator.
List<String> getFieldNames()
Gets the names of fields to which the operation is applied.
int getShowTopHowMany()
Provides a cap on the number of value frequencies to calculate.
boolean isFewDistinctValuesHint()
Returns a hint as to whether there are expected to be a small number of distinct values.
void setFewDistinctValuesHint(boolean fewDistinctValuesHint)
Sets a hint as to whether there are expected to be a small number of distinct values.
void setFieldNames(String... fields)
Sets the names of fields to which the operation is applied.
void setFieldNames(List<String> fields)
Sets the names of fields to which the operation is applied.
void setShowTopHowMany(int showTopHowMany)
Sets a cap on the number of value frequencies to calculate.
-
Methods inherited from class com.pervasive.datarush.operators.AbstractRecordCompositeOperator
getInput, getOutput
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Method Detail
-
getFieldNames
public List<String> getFieldNames()
Gets the names of fields to which the operation is applied.
- Returns:
- the fields whose most frequent values will be computed
-
setFieldNames
public void setFieldNames(List<String> fields)
Sets the names of fields to which the operation is applied. If no fields are specified, all fields are selected by default.
- Parameters:
fields
- the fields whose most frequent values will be computed
- See Also:
setFieldNames(String...)
-
setFieldNames
public void setFieldNames(String... fields)
Sets the names of fields to which the operation is applied. If no fields are specified, all fields are selected by default.
- Parameters:
fields
- the fields whose most frequent values will be computed
- See Also:
setFieldNames(List)
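The default described above ("no fields specified" selects all fields) can be pictured with a small plain-Java sketch. It does not call the DataRush API. The class `FieldSelectionSketch` and the method `resolveFields` are hypothetical, and treating `null` the same as an empty list is an assumption of this sketch, not documented behavior.

```java
import java.util.List;

// Hypothetical illustration class; not part of the DataRush API.
class FieldSelectionSketch {

    /**
     * Returns the fields to analyze: the requested names, or every input
     * field when none were requested. Treating null like an empty list is
     * an assumption of this sketch.
     */
    static List<String> resolveFields(List<String> requested, List<String> inputFields) {
        if (requested == null || requested.isEmpty()) {
            return inputFields;
        }
        return requested;
    }

    public static void main(String[] args) {
        List<String> inputFields = List.of("id", "color", "size");
        System.out.println(resolveFields(List.of(), inputFields));        // [id, color, size]
        System.out.println(resolveFields(List.of("color"), inputFields)); // [color]
    }
}
```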
-
getShowTopHowMany
public int getShowTopHowMany()
Provides a cap on the number of value frequencies to calculate. The default is 25. Memory usage is proportional to the number of distinct values; thus only the top n values are calculated, to avoid excessive memory consumption when a field has a large number of distinct values.
- Returns:
- the cap on the number of values to calculate
-
setShowTopHowMany
public void setShowTopHowMany(int showTopHowMany)
Sets a cap on the number of value frequencies to calculate. The default is 25. Memory usage is proportional to the number of distinct values.
- Parameters:
showTopHowMany
- the cap on the number of values to calculate
-
isFewDistinctValuesHint
public boolean isFewDistinctValuesHint()
Returns a hint as to whether there are expected to be a small number of distinct values. If few distinct values are not expected, each column is eagerly sorted up front and the frequent items are computed in parallel.
- Returns:
- whether few distinct values are expected
-
setFewDistinctValuesHint
public void setFewDistinctValuesHint(boolean fewDistinctValuesHint)
Sets a hint as to whether there are expected to be a small number of distinct values. If few distinct values are not expected, each column is eagerly sorted up front and the frequent items are computed in parallel.
- Parameters:
fewDistinctValuesHint
- whether few distinct values are expected
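One way to see why sorting up front can help when many distinct values are expected: once a column is sorted, equal values are adjacent, so frequencies can be counted in a single pass that tracks only the current value and its running count. The plain-Java sketch below illustrates that general technique. It is not the operator's implementation, it is not parallelized, and the class `SortedRunCountSketch` and method `countBySorting` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the DataRush API.
class SortedRunCountSketch {

    /**
     * Sorts a copy of the column, then counts each run of equal neighbours
     * in one pass. While scanning, only the current value and its running
     * count are held, rather than a map with one entry per distinct value.
     * Assumes non-null values.
     */
    static List<Map.Entry<String, Long>> countBySorting(List<String> column) {
        List<String> sorted = new ArrayList<>(column);
        Collections.sort(sorted);

        List<Map.Entry<String, Long>> runs = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String current = sorted.get(i);
            long count = 0;
            // Advance through the run of values equal to 'current'.
            while (i < sorted.size() && sorted.get(i).equals(current)) {
                count++;
                i++;
            }
            runs.add(Map.entry(current, count));
        }
        return runs;
    }

    public static void main(String[] args) {
        List<String> column = List.of("b", "a", "b", "c", "a", "b");
        for (Map.Entry<String, Long> run : countBySorting(column)) {
            System.out.println(run.getKey() + " " + run.getValue());
        }
        // prints:
        // a 2
        // b 3
        // c 1
    }
}
```

In this sketch the runs come out in value order; selecting the most frequent ones would still require a bounded top-n step afterwards.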
-
compose
protected void compose(CompositionContext ctx)
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
- Specified by:
compose in class CompositeOperator
- Parameters:
ctx
- the context
-
-