Module datarush.analytics
Class ReplaceMissingValues
java.lang.Object
com.pervasive.datarush.operators.AbstractLogicalOperator
com.pervasive.datarush.operators.CompositeOperator
com.pervasive.datarush.analytics.cleansing.ReplaceMissingValues
- All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator
Replace missing values in the input data according to the given replacement specifications.
Each specification provides an action to take and specifies the affected fields.
Some actions require a first pass through the data to calculate needed column values, such
as the minimum value, maximum value, mean, or most frequent value. If any of these actions
is specified, the data is first read to calculate the required values. A second pass over
the data then applies the specified replacements using the calculated values.
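The two-pass behavior described above can be illustrated with a minimal, library-independent sketch. It is not the operator's implementation: the class `TwoPassMeanFill` and its methods are hypothetical names introduced here, and the example assumes a single numeric column whose missing values are replaced with the column mean.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the DataRush API.
class TwoPassMeanFill {

    // Pass 1: scan the column once and compute the mean of the non-null values.
    static double meanOfPresent(Double[] column) {
        double sum = 0.0;
        int count = 0;
        for (Double v : column) {
            if (v != null) {
                sum += v;
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("column has no non-null values");
        }
        return sum / count;
    }

    // Pass 2: copy the column, substituting the precomputed value for each null.
    static double[] fillMissing(Double[] column, double replacement) {
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = (column[i] != null) ? column[i] : replacement;
        }
        return out;
    }

    public static void main(String[] args) {
        Double[] column = {1.0, null, 3.0, null, 5.0};
        double mean = meanOfPresent(column);               // first pass
        double[] filled = fillMissing(column, mean);       // second pass
        System.out.println(Arrays.toString(filled));       // prints [1.0, 3.0, 3.0, 3.0, 5.0]
    }
}
```

A replacement that needs no column statistic, such as a fixed constant, would only need the second step, which is why the first pass is required only for some actions.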
The order of the input data is preserved where possible. However, when using the action to skip records with missing data, records may be reordered. This is due to how the data is partitioned for parallelization.
A PMML model is created that contains statistics about the number of records skipped and
the number of field values replaced. This model is similar to the one created by the
SummaryStatistics operator.
-
Constructor Summary
Constructors
- ReplaceMissingValues(): Defines a replacement with an empty specification.
Method Summary
Modifier and Type, Method, Description
- protected void compose: Compose the body of this operator.
- getInput(): Gets the record port providing the input data to the operation.
- getModel(): Returns a port that will output a PMMLSummaryStatisticsModel.
- getOutput(): Gets the record port providing the output from the operation.
- getReplaceSpecifications(): Gets the specifications currently configured for the operation.
- getStatisticsInput(): Gets the optional model port providing statistics for replace specifications based on column statistics.
- void setReplaceSpecifications(List<ReplaceSpecification> specifications): Sets the replacement specifications to apply to the input data.
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
Constructor Details
-
ReplaceMissingValues
public ReplaceMissingValues()
Defines a replacement with an empty specification. That is, no missing input values are replaced.
-
-
Method Details
-
getInput
Gets the record port providing the input data to the operation.
- Specified by:
getInput in interface PipelineOperator<RecordPort>
- Returns:
- the input port for the operation
-
getStatisticsInput
Gets the optional model port providing statistics for replace specifications based on column statistics. If not connected and some specification depends on statistics, statistics will automatically be calculated as part of the operation.
- Returns:
- the statistics port for the operation
-
getOutput
Gets the record port providing the output from the operation. This will be the input data with null values replaced as specified.
- Specified by:
getOutput in interface PipelineOperator<RecordPort>
- Returns:
- the output port for the operation
-
getModel
Returns a port that will output a PMMLSummaryStatisticsModel. The model will be populated with the following information:
- totalFrequency: total number of rows
- invalidFrequency: total number of rows for which at least one field with a skip condition was found
- missingFrequency: total number of rows for which at least one field with a replace condition was found
- testFailureCounts: per-test failure counts for each condition involving the given field
- Returns:
- a port that will output a PMMLSummaryStatisticsModel.
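To make the row-level counters concrete, the following library-independent sketch tallies them over a small in-memory data set. It is an assumption-laden illustration, not the operator's code: the class `MissingValueCounts` and its `tally` method are hypothetical, rows are plain `Object[]` arrays, fields are addressed by column index, and the definitions are read literally so that a row with nulls in both a skip field and a replace field is counted in both totals. The per-test `testFailureCounts` are not modeled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the DataRush API.
class MissingValueCounts {
    long totalFrequency;    // every row seen
    long invalidFrequency;  // rows with a null in at least one "skip" field
    long missingFrequency;  // rows with a null in at least one "replace" field

    // True if the row holds a null in any of the given column indexes.
    private static boolean anyNull(Object[] row, int[] fields) {
        for (int f : fields) {
            if (row[f] == null) {
                return true;
            }
        }
        return false;
    }

    // Counts rows once per category, regardless of how many fields are null.
    static MissingValueCounts tally(List<Object[]> rows, int[] skipFields, int[] replaceFields) {
        MissingValueCounts c = new MissingValueCounts();
        for (Object[] row : rows) {
            c.totalFrequency++;
            if (anyNull(row, skipFields)) {
                c.invalidFrequency++;
            }
            if (anyNull(row, replaceFields)) {
                c.missingFrequency++;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] {"a", 1.0},
            new Object[] {null, 2.0},
            new Object[] {"c", null},
            new Object[] {null, null});
        // Column 0 carries a skip condition, column 1 a replace condition.
        MissingValueCounts c = tally(rows, new int[] {0}, new int[] {1});
        System.out.println(c.totalFrequency + " " + c.invalidFrequency + " " + c.missingFrequency);
        // prints 4 2 2
    }
}
```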
-
getReplaceSpecifications
Gets the specifications currently configured for the operation.
- Returns:
- the replacement specifications being applied to the input data
-
setReplaceSpecifications
Sets the replacement specifications to apply to the input data.
- Parameters:
specifications - the value replacement specifications to apply
-
compose
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
- Specified by:
compose in class CompositeOperator
- Parameters:
ctx - the context
-