- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.CompositeOperator
-
- com.pervasive.datarush.analytics.stats.SummaryStatistics
-
- All Implemented Interfaces:
LogicalOperator, RecordSinkOperator, SinkOperator<RecordPort>
public final class SummaryStatistics extends CompositeOperator implements RecordSinkOperator
Discovers various metrics of an input dataset, based on the configured detail level. The types of the fields, combined with the DetailLevel, determine the set of metrics that are calculated.
- See Also:
DetailLevel
-
-
Constructor Summary
Constructor | Description
SummaryStatistics() | Discover summary statistics.
-
Method Summary
Modifier and Type | Method | Description
protected void | compose(CompositionContext ctx) | Compose the body of this operator.
DetailLevel | getDetailLevel() | Returns the detail level that we use to compute statistics.
List<String> | getIncludedFields() | Gets the fields from the input dataset for which we are collecting statistics.
RecordPort | getInput() | Returns an input port for the input dataset.
PMMLPort | getOutput() | Returns an output port that will produce a PMMLSummaryStatisticsModel.
List<BigDecimal> | getQuantilesToCalculate() | Gets the quantiles to calculate for each numeric field.
int | getRangeCount() | Returns the number of intervalCounts to calculate for each numeric field.
int | getShowTopHowMany() | Provides a cap on the number of valueCounts to calculate.
boolean | isFewDistinctValuesHint() | Returns a hint as to whether there are expected to be a small number of distinct values.
void | setDetailLevel(DetailLevel detailLevel) | Sets the detail level that we use to compute statistics.
void | setFewDistinctValuesHint(boolean fewDistinctValuesHint) | Sets a hint as to whether there are expected to be a small number of distinct values.
void | setIncludedFields(List<String> includedFields) | Sets the fields from the input dataset for which we are collecting statistics.
void | setQuantilesToCalculate(List<BigDecimal> quantilesToCalculate) | Sets the quantiles to calculate for each numeric field.
void | setRangeCount(int rangeCount) | Sets the number of intervalCounts to calculate for each numeric field.
void | setShowTopHowMany(int showTopHowMany) | Sets a cap on the number of valueCounts to calculate.
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Constructor Detail
-
SummaryStatistics
public SummaryStatistics()
Discover summary statistics. By default we discover singlePass statistics; configure detailLevel to provide more or less detail.
-
-
Method Detail
-
getInput
public RecordPort getInput()
Returns an input port for the input dataset. This dataset is used to build the summary model.
- Specified by:
getInput in interface RecordSinkOperator
- Specified by:
getInput in interface SinkOperator<RecordPort>
- Returns:
- an input port for the input dataset
-
getOutput
public PMMLPort getOutput()
Returns an output port that will produce a PMMLSummaryStatisticsModel.
- Returns:
- an output port that will produce a PMMLSummaryStatisticsModel.
-
getDetailLevel
public DetailLevel getDetailLevel()
Returns the detail level that we use to compute statistics. The default value is DetailLevel.SINGLE_PASS_ONLY.
- Returns:
- the detail level
-
setDetailLevel
public void setDetailLevel(DetailLevel detailLevel)
Sets the detail level that we use to compute statistics. The default value is DetailLevel.SINGLE_PASS_ONLY.
- Parameters:
detailLevel - the detail level
-
getShowTopHowMany
public int getShowTopHowMany()
Provides a cap on the number of valueCounts to calculate. The default is 25. Memory usage is proportional to the number of distinct values, so only the top n values are calculated; this avoids excessive memory consumption when a field has a large number of distinct values. This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
- Returns:
- the cap on the number of valueCounts to calculate.
-
setShowTopHowMany
public void setShowTopHowMany(int showTopHowMany)
Sets a cap on the number of valueCounts to calculate. The default is 25. Memory usage is proportional to the number of distinct values, so only the top n values are calculated; this avoids excessive memory consumption when a field has a large number of distinct values. This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
- Parameters:
showTopHowMany - the cap on the number of valueCounts to calculate.
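
To make the cap concrete, here is a minimal plain-Java sketch of what a "top n value counts" result looks like. It is not the operator's implementation: the class TopValueCounts and the method topN are hypothetical names, and the sketch counts every distinct value before truncating, so it shows only the shape of the result, not the memory-bounded strategy described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the DataRush API.
class TopValueCounts {

    // Counts occurrences of each value, then keeps only the n most frequent.
    static List<Map.Entry<String, Long>> topN(List<String> values, int n) {
        Map<String, Long> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken by value so the order is deterministic.
        entries.sort((x, y) -> {
            int byCount = Long.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        List<String> column = List.of("a", "b", "a", "c", "a", "b");
        // Keep only the two most frequent values of the column.
        for (Map.Entry<String, Long> e : topN(column, 2)) {
            System.out.println(e.getKey() + " occurs " + e.getValue() + " time(s)");
        }
    }
}
```

In this sketch the cap plays the role of showTopHowMany: values outside the top n are simply dropped from the result.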
-
getRangeCount
public int getRangeCount()
Returns the number of intervalCounts to calculate for each numeric field. The default value is 10. This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
- Returns:
- the number of intervalCounts to calculate for each numeric field.
-
setRangeCount
public void setRangeCount(int rangeCount)
Sets the number of intervalCounts to calculate for each numeric field. The default value is 10. This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
- Parameters:
rangeCount - the number of intervalCounts to calculate for each numeric field.
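
As an illustration of what a set of interval counts might look like, the following plain-Java sketch splits the observed range of a numeric column into rangeCount intervals and counts the values in each. The equal-width binning between the minimum and maximum is an assumption made for this example; this page does not state how the operator chooses interval boundaries. The class IntervalCounts and the method count are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the DataRush API.
class IntervalCounts {

    // Splits [min, max] into rangeCount equal-width intervals (an assumption)
    // and counts how many values fall into each interval.
    static long[] count(double[] values, int rangeCount) {
        if (rangeCount <= 0) {
            throw new IllegalArgumentException("rangeCount must be positive");
        }
        long[] counts = new long[rangeCount];
        if (values.length == 0) {
            return counts;
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / rangeCount;
        for (double v : values) {
            // If all values are equal the width is 0; put everything in the first interval.
            int idx = width == 0 ? 0 : (int) ((v - min) / width);
            // The maximum lands exactly on the upper edge; fold it into the last interval.
            counts[Math.min(idx, rangeCount - 1)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] column = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(Arrays.toString(count(column, 5)));
    }
}
```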
-
getQuantilesToCalculate
public List<BigDecimal> getQuantilesToCalculate()
Gets the quantiles to calculate for each numeric field. By default this is 0.25, 0.50, and 0.75 (the 25th, 50th, and 75th percentiles). This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
- Returns:
- the quantiles to calculate for each numeric field.
-
setQuantilesToCalculate
public void setQuantilesToCalculate(List<BigDecimal> quantilesToCalculate)
Sets the quantiles to calculate for each numeric field. By default this is 0.25, 0.50, and 0.75 (the 25th, 50th, and 75th percentiles). This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
- Parameters:
quantilesToCalculate - the quantiles to calculate for each numeric field.
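
The sketch below shows one way to build a List<BigDecimal> matching the documented defaults and what those quantiles mean on a small sorted column. It uses the nearest-rank definition of a quantile, which is an assumption chosen for simplicity; this page does not say which quantile definition the operator applies. The class QuantileSketch and the method quantile are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the DataRush API.
class QuantileSketch {

    // Nearest-rank quantile (an assumption): the value at 1-based rank ceil(q * n)
    // of the sorted input, with rank 0 mapped to the first element.
    static double quantile(double[] sorted, BigDecimal q) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        if (q.signum() < 0 || q.compareTo(BigDecimal.ONE) > 0) {
            throw new IllegalArgumentException("quantile must be between 0 and 1");
        }
        int rank = q.multiply(BigDecimal.valueOf(sorted.length))
                .setScale(0, RoundingMode.CEILING)
                .intValueExact();
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Constructing from strings avoids binary floating-point rounding in the quantile values.
        List<BigDecimal> quantiles = List.of(
                new BigDecimal("0.25"), new BigDecimal("0.50"), new BigDecimal("0.75"));
        double[] column = {7, 1, 5, 3, 8, 2, 6, 4};
        Arrays.sort(column);
        for (BigDecimal q : quantiles) {
            System.out.println(q + " -> " + quantile(column, q));
        }
    }
}
```

A list built this way has the type that setQuantilesToCalculate accepts.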
-
getIncludedFields
public List<String> getIncludedFields()
Gets the fields from the input dataset for which we are collecting statistics. The default value of "empty list" implies "all fields".
- Returns:
- the fields from the input dataset for which we are collecting statistics.
-
setIncludedFields
public void setIncludedFields(List<String> includedFields)
Sets the fields from the input dataset for which we are collecting statistics. The default value of "empty list" implies "all fields".
- Parameters:
includedFields - the fields from the input dataset for which we are collecting statistics.
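
The "empty list implies all fields" convention can be sketched in plain Java as follows. The class IncludedFieldsSketch and the method resolve are hypothetical names, and rejecting unknown field names is an assumption of this example; this page does not describe how the operator treats names that are absent from the input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the DataRush API.
class IncludedFieldsSketch {

    // Returns the fields to summarize: every input field when includedFields
    // is empty, otherwise exactly the requested fields.
    static List<String> resolve(List<String> inputFields, List<String> includedFields) {
        if (includedFields.isEmpty()) {
            return new ArrayList<>(inputFields);
        }
        List<String> selected = new ArrayList<>();
        for (String name : includedFields) {
            // Assumption: an unknown field name is treated as a configuration error.
            if (!inputFields.contains(name)) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            selected.add(name);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> input = List.of("id", "age", "city");
        System.out.println(resolve(input, List.of()));
        System.out.println(resolve(input, List.of("age")));
    }
}
```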
-
isFewDistinctValuesHint
public boolean isFewDistinctValuesHint()
Returns a hint as to whether there are expected to be a small number of distinct values. If a small number is not expected, we eagerly sort each column up-front and perform a parallelized computation of quantiles and frequent items. This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
- Returns:
- whether few distinct values are expected
-
setFewDistinctValuesHint
public void setFewDistinctValuesHint(boolean fewDistinctValuesHint)
Sets a hint as to whether there are expected to be a small number of distinct values. If a small number is not expected, we eagerly sort each column up-front and perform a parallelized computation of quantiles and frequent items. This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
- Parameters:
fewDistinctValuesHint - whether few distinct values are expected
-
compose
protected void compose(CompositionContext ctx)
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
- Specified by:
compose in class CompositeOperator
- Parameters:
ctx - the context
-
-