public final class SummaryStatistics extends CompositeOperator implements RecordSinkOperator
The DetailLevel setting determines the set of metrics that are calculated.
See Also: DetailLevel
Constructor and Description |
---|
SummaryStatistics(): Discover summary statistics. |
Modifier and Type | Method and Description |
---|---|
protected void | compose(CompositionContext ctx): Compose the body of this operator. |
DetailLevel | getDetailLevel(): Returns the detail level that we use to compute statistics. |
List<String> | getIncludedFields(): Gets the fields from the input dataset for which we are collecting statistics. |
RecordPort | getInput(): Returns an input port for the input dataset. |
PMMLPort | getOutput(): Returns an output port that will produce a PMMLSummaryStatisticsModel. |
List<BigDecimal> | getQuantilesToCalculate(): Gets the quantiles to calculate for each numeric field. |
int | getRangeCount(): Returns the number of intervalCounts to calculate for each numeric field. |
int | getShowTopHowMany(): Provides a cap on the number of valueCounts to calculate. |
boolean | isFewDistinctValuesHint(): Returns a hint as to whether there are expected to be a small number of distinct values. |
void | setDetailLevel(DetailLevel detailLevel): Sets the detail level that we use to compute statistics. |
void | setFewDistinctValuesHint(boolean fewDistinctValuesHint): Sets a hint as to whether there are expected to be a small number of distinct values. |
void | setIncludedFields(List<String> includedFields): Sets the fields from the input dataset for which we are collecting statistics. |
void | setQuantilesToCalculate(List<BigDecimal> quantilesToCalculate): Sets the quantiles to calculate for each numeric field. |
void | setRangeCount(int rangeCount): Sets the number of intervalCounts to calculate for each numeric field. |
void | setShowTopHowMany(int showTopHowMany): Sets a cap on the number of valueCounts to calculate. |
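
To make the showTopHowMany cap on valueCounts concrete, the following standalone sketch counts distinct values and keeps only the most frequent ones. It does not use the DataFlow API: the class name TopValueCounts, the helper topValueCounts, and the tie-breaking rule (alphabetical order among equal counts) are assumptions made for illustration, and the operator's actual algorithm may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: a hypothetical stand-in, not part of the SummaryStatistics API.
final class TopValueCounts {

    // Counts each distinct value, then keeps at most showTopHowMany entries,
    // ordered by descending count (ties broken alphabetically; an assumption).
    static List<Map.Entry<String, Long>> topValueCounts(List<String> values, int showTopHowMany) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(showTopHowMany)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> field = Arrays.asList("red", "blue", "red", "green", "blue", "red");
        // With a cap of 2, only the two most frequent values are reported.
        System.out.println(topValueCounts(field, 2));
    }
}
```

Note that this sketch still builds a count for every distinct value before truncating, so it illustrates the shape of the capped result rather than a memory-saving strategy.
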
Methods inherited from superclasses: disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from implemented interfaces: disableParallelism, getInputPorts, getOutputPorts
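
The rangeCount and quantilesToCalculate settings both apply to numeric fields. As a rough illustration of what they control, the standalone sketch below computes interval counts and quantiles for an array of values. It does not use the DataFlow API: the class NumericFieldSketch, the equal-width split between the minimum and maximum, and the nearest-rank quantile rule are all assumptions chosen for simplicity, since this page does not state how the operator defines intervals or interpolates quantiles.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical helpers, not part of the SummaryStatistics API.
final class NumericFieldSketch {

    // Splits [min, max] into rangeCount equal-width intervals (an assumption)
    // and counts how many values fall into each one.
    static long[] intervalCounts(double[] values, int rangeCount) {
        if (rangeCount <= 0) {
            throw new IllegalArgumentException("rangeCount must be positive");
        }
        long[] counts = new long[rangeCount];
        if (values.length == 0) {
            return counts;
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / rangeCount;
        for (double v : values) {
            // When all values are equal the width is 0; use the first interval.
            int index = width == 0 ? 0 : (int) ((v - min) / width);
            // The maximum falls on the upper edge; keep it in the last interval.
            counts[Math.min(index, rangeCount - 1)]++;
        }
        return counts;
    }

    // Nearest-rank quantiles on a sorted copy (an assumption about the method).
    static Map<BigDecimal, Double> quantiles(double[] values, List<BigDecimal> quantilesToCalculate) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        Map<BigDecimal, Double> result = new LinkedHashMap<>();
        for (BigDecimal q : quantilesToCalculate) {
            // Rank of the smallest value with at least q * n values at or below it.
            int rank = (int) Math.ceil(q.doubleValue() * sorted.length);
            int clamped = Math.min(Math.max(rank, 1), sorted.length);
            result.put(q, sorted[clamped - 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] field = {1, 2, 3, 4, 5, 6, 7, 8};
        List<BigDecimal> defaults = Arrays.asList(
                new BigDecimal("0.25"), new BigDecimal("0.50"), new BigDecimal("0.75"));
        System.out.println(Arrays.toString(intervalCounts(field, 10)));
        System.out.println(quantiles(field, defaults));
    }
}
```

The quantile list mirrors the documented defaults of 0.25, 0.50, and 0.75. Because BigDecimal.equals compares scale as well as value, a lookup in the returned map should use the same textual form (for example "0.50" rather than "0.5") that was passed in.
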
public SummaryStatistics()
Discover summary statistics. By default, singlePass statistics are calculated; configure detailLevel
to provide more or less detail.

public RecordPort getInput()
Returns an input port for the input dataset.
Specified by:
getInput in interface RecordSinkOperator
getInput in interface SinkOperator<RecordPort>

public PMMLPort getOutput()
Returns an output port that will produce a PMMLSummaryStatisticsModel.
Returns:
an output port that will produce a PMMLSummaryStatisticsModel.

public DetailLevel getDetailLevel()
Returns the detail level that we use to compute statistics. The default is DetailLevel.SINGLE_PASS_ONLY.

public void setDetailLevel(DetailLevel detailLevel)
Sets the detail level that we use to compute statistics. The default is DetailLevel.SINGLE_PASS_ONLY.
Parameters:
detailLevel - the detail level

public int getShowTopHowMany()
Provides a cap on the number of valueCounts to calculate. The default is 25.
Memory usage is proportional to the number of distinct values, so only the top n values are calculated. This avoids excessive memory consumption when a field has a large number of distinct values. This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
Returns:
the cap on the number of valueCounts to calculate.

public void setShowTopHowMany(int showTopHowMany)
Sets a cap on the number of valueCounts to calculate. The default is 25.
Memory usage is proportional to the number of distinct values, so only the top n values are calculated. This avoids excessive memory consumption when a field has a large number of distinct values. This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
Parameters:
showTopHowMany - the cap on the number of valueCounts to calculate.

public int getRangeCount()
Returns the number of intervalCounts to calculate for each numeric field. The default value is 10. This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
Returns:
the number of intervalCounts to calculate for each numeric field.

public void setRangeCount(int rangeCount)
Sets the number of intervalCounts to calculate for each numeric field. The default value is 10. This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
Parameters:
rangeCount - the number of intervalCounts to calculate for each numeric field.

public List<BigDecimal> getQuantilesToCalculate()
Gets the quantiles to calculate for each numeric field.
By default this is 0.25, 0.50, and 0.75 (the 25th, 50th, and 75th percentiles).
This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
Returns:
the quantiles to calculate for each numeric field.

public void setQuantilesToCalculate(List<BigDecimal> quantilesToCalculate)
Sets the quantiles to calculate for each numeric field.
By default this is 0.25, 0.50, and 0.75 (the 25th, 50th, and 75th percentiles).
This setting is ignored if the detail level is not DetailLevel.MULTI_PASS.
Parameters:
quantilesToCalculate - the quantiles to calculate for each numeric field.

public List<String> getIncludedFields()
Gets the fields from the input dataset for which we are collecting statistics.

public void setIncludedFields(List<String> includedFields)
Sets the fields from the input dataset for which we are collecting statistics.
Parameters:
includedFields - the fields from the input dataset for which we are collecting statistics.

public boolean isFewDistinctValuesHint()
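Returns a hint as to whether there are expected to be a small number of distinct values. This hint applies to the DetailLevel.MULTI_PASS detail level.

public void setFewDistinctValuesHint(boolean fewDistinctValuesHint)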
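Sets a hint as to whether there are expected to be a small number of distinct values. This hint applies to the DetailLevel.MULTI_PASS detail level.
Parameters:
fewDistinctValuesHint - whether few distinct values are expected

protected void compose(CompositionContext ctx)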
Description copied from class: CompositeOperator
Compose the body of this operator. Sub-operators are added via OperatorComposable.add(O), and connections are created via OperatorComposable.connect(P, P). This includes
connections from the composite's input ports to sub-operators, connections between sub-operators, and
connections from sub-operators' output ports to the composite's output ports.
compose in class CompositeOperator
Parameters:
ctx - the context

Copyright © 2021 Actian Corporation. All rights reserved.