Module datarush.analytics
Class ReplaceMissingValues
- java.lang.Object
  - com.pervasive.datarush.operators.AbstractLogicalOperator
    - com.pervasive.datarush.operators.CompositeOperator
      - com.pervasive.datarush.analytics.cleansing.ReplaceMissingValues
- All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator
public class ReplaceMissingValues extends CompositeOperator implements RecordPipelineOperator
Replaces missing values in the input data according to the given replacement specifications. Each specification provides an action to take and specifies the affected fields. Some actions require a first pass through the data to calculate needed column values such as the minimum value, maximum value, mean, or most frequent value. If any of these actions is specified, the data is read once to calculate the required values. A second pass then applies the specified replacements using the calculated values.
The order of the input data is preserved where possible. However, when the action to skip records with missing data is used, records may be reordered because of how the data is partitioned for parallelization.
A PMML model is created that contains statistics about the number of records skipped and the number of field values replaced. This model is similar to the one created by the SummaryStatistics operator.
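The two-pass behavior described above can be illustrated with a minimal sketch in plain Java. It does not use the DataRush API: the class `TwoPassMeanReplacementSketch` and its methods are hypothetical names chosen for illustration, and a missing value is modeled as `null` in a single column of doubles. The first pass computes the mean of the present values, and the second pass substitutes that mean for each missing value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the DataRush library.
final class TwoPassMeanReplacementSketch {

    /** Pass 1: mean of the non-missing values, or null if every value is missing. */
    static Double meanOfPresent(List<Double> column) {
        double sum = 0.0;
        int count = 0;
        for (Double value : column) {
            if (value != null) {
                sum += value;
                count++;
            }
        }
        return count == 0 ? null : sum / count;
    }

    /** Pass 2: copy the column, substituting the replacement for each missing value. */
    static List<Double> replaceMissing(List<Double> column, Double replacement) {
        List<Double> result = new ArrayList<>(column.size());
        for (Double value : column) {
            result.add(value != null ? value : replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(1.0, null, 3.0, null, 5.0);
        Double mean = meanOfPresent(column);              // first pass over the data
        List<Double> cleaned = replaceMissing(column, mean); // second pass
        System.out.println(cleaned);
    }
}
```

Actions that replace with a constant need no statistics, so in this model they would skip the first pass entirely and only the second pass would run.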
-
Constructor Summary
Constructor
ReplaceMissingValues()
Defines a replacement with an empty specification.
-
Method Summary
protected void compose(CompositionContext ctx)
Compose the body of this operator.
RecordPort getInput()
Gets the record port providing the input data to the operation.
PMMLPort getModel()
Returns a port that will output a PMMLSummaryStatisticsModel.
RecordPort getOutput()
Gets the record port providing the output from the operation.
List<ReplaceSpecification> getReplaceSpecifications()
Gets the specifications currently configured for the operation.
PMMLPort getStatisticsInput()
Gets the optional model port providing statistics for replace specifications based on column statistics.
void setReplaceSpecifications(List<ReplaceSpecification> specifications)
Sets the replacement specifications to apply to the input data.
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
Method Detail
-
getInput
public RecordPort getInput()
Gets the record port providing the input data to the operation.
- Specified by:
getInput in interface PipelineOperator<RecordPort>
- Returns:
- the input port for the operation
-
getStatisticsInput
public PMMLPort getStatisticsInput()
Gets the optional model port providing statistics for replace specifications based on column statistics. If the port is not connected and some specification depends on statistics, the statistics are calculated automatically as part of the operation.
- Returns:
- the statistics port for the operation
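The fallback described for this port (use supplied statistics when present, otherwise compute them) can be sketched in plain Java. This is only an illustration under stated assumptions: `StatisticsFallbackSketch`, `computeMeans`, and `resolveMeans` are hypothetical names, statistics are modeled as a map from column name to mean, and an unconnected port is modeled as a `null` argument. None of this reflects the actual DataRush port types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the DataRush library.
final class StatisticsFallbackSketch {

    /** Computes the mean of the non-missing (non-null) values of each column. */
    static Map<String, Double> computeMeans(Map<String, List<Double>> columns) {
        Map<String, Double> means = new HashMap<>();
        for (Map.Entry<String, List<Double>> entry : columns.entrySet()) {
            double sum = 0.0;
            int count = 0;
            for (Double value : entry.getValue()) {
                if (value != null) {
                    sum += value;
                    count++;
                }
            }
            if (count > 0) {
                means.put(entry.getKey(), sum / count);
            }
        }
        return means;
    }

    /**
     * Returns the supplied statistics when present (the "connected" case);
     * otherwise computes them from the data (the "not connected" case).
     */
    static Map<String, Double> resolveMeans(Map<String, Double> supplied,
                                            Map<String, List<Double>> columns) {
        return supplied != null ? supplied : computeMeans(columns);
    }
}
```

Supplying precomputed statistics in this way avoids an extra read of the data when the same statistics are already available from an earlier step.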
-
getOutput
public RecordPort getOutput()
Gets the record port providing the output from the operation. This will be the input data with null values replaced as specified.
- Specified by:
getOutput in interface PipelineOperator<RecordPort>
- Returns:
- the output port for the operation
-
getModel
public PMMLPort getModel()
Returns a port that will output a PMMLSummaryStatisticsModel. The model will be populated with the following information:
- totalFrequency: total number of rows
- invalidFrequency: total number of rows for which at least one field with a skip condition was found
- missingFrequency: total number of rows for which at least one field with a replace condition was found
- testFailureCounts: per-test failure counts for each condition involving the given field
- Returns:
- a port that will output a PMMLSummaryStatisticsModel.
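As a rough illustration of how the row-level counts listed above relate to the data, the following plain-Java sketch tallies them for a small table. It is not the DataRush implementation: `ModelCountsSketch` is a hypothetical class, rows are modeled as `Object[]` with `null` for a missing value, and fields are identified by index. The sketch assumes the skip and replace conditions are counted independently, which the description above does not confirm, and it omits `testFailureCounts`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the DataRush library.
final class ModelCountsSketch {
    final long totalFrequency;   // total number of rows
    final long invalidFrequency; // rows with a missing value in a "skip" field
    final long missingFrequency; // rows with a missing value in a "replace" field

    private ModelCountsSketch(long total, long invalid, long missing) {
        this.totalFrequency = total;
        this.invalidFrequency = invalid;
        this.missingFrequency = missing;
    }

    /** True if the row has a null in at least one of the given field indexes. */
    static boolean hasMissing(Object[] row, Set<Integer> fields) {
        for (int field : fields) {
            if (row[field] == null) {
                return true;
            }
        }
        return false;
    }

    /** Assumption: the two conditions are tallied independently per row. */
    static ModelCountsSketch count(List<Object[]> rows,
                                   Set<Integer> skipFields,
                                   Set<Integer> replaceFields) {
        long invalid = 0;
        long missing = 0;
        for (Object[] row : rows) {
            if (hasMissing(row, skipFields)) {
                invalid++;
            }
            if (hasMissing(row, replaceFields)) {
                missing++;
            }
        }
        return new ModelCountsSketch(rows.size(), invalid, missing);
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[] {null, 1},
                new Object[] {2, null},
                new Object[] {3, 4},
                new Object[] {null, null});
        Set<Integer> skipFields = new HashSet<>(Arrays.asList(0));
        Set<Integer> replaceFields = new HashSet<>(Arrays.asList(1));
        ModelCountsSketch counts = count(rows, skipFields, replaceFields);
        System.out.println(counts.totalFrequency + " " + counts.invalidFrequency
                + " " + counts.missingFrequency);
    }
}
```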
-
getReplaceSpecifications
public List<ReplaceSpecification> getReplaceSpecifications()
Gets the specifications currently configured for the operation.
- Returns:
- the replacement specifications being applied to the input data
-
setReplaceSpecifications
public void setReplaceSpecifications(List<ReplaceSpecification> specifications)
Sets the replacement specifications to apply to the input data.
- Parameters:
specifications - the value replacement specifications to apply
-
compose
protected void compose(CompositionContext ctx)
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
- Specified by:
compose in class CompositeOperator
- Parameters:
ctx - the context