Class ReplaceMissingValues

  • All Implemented Interfaces:
    LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

    public class ReplaceMissingValues
    extends CompositeOperator
    implements RecordPipelineOperator
    Replaces missing values in the input data according to the given replacement specifications. Each specification names an action to take and the fields it affects. Some actions require a first pass through the data to calculate needed column values such as the minimum value, maximum value, mean, or most frequent value. If any of these actions are specified, the data will be read to calculate the required values. The next pass over the data then applies the specified replacements using the calculated values.
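
    The two-pass behavior described above can be illustrated with a minimal sketch in plain Java. It does not use this operator or any of its classes: the class name TwoPassMeanReplacement and both helper methods are hypothetical, missing values are modeled as null, and only the mean-based action is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a hypothetical class, not part of the operator library.
public class TwoPassMeanReplacement {

    /** First pass: mean of the non-missing (non-null) values, or null if all are missing. */
    public static Double meanOfPresent(List<Double> column) {
        double sum = 0.0;
        int count = 0;
        for (Double v : column) {
            if (v != null) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? null : sum / count;
    }

    /** Second pass: copy the column, substituting the replacement for each missing value. */
    public static List<Double> replaceMissing(List<Double> column, Double replacement) {
        List<Double> out = new ArrayList<>(column.size());
        for (Double v : column) {
            out.add(v != null ? v : replacement);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(1.0, null, 3.0, null, 8.0);
        Double mean = meanOfPresent(column);               // pass 1: (1 + 3 + 8) / 3 = 4.0
        System.out.println(replaceMissing(column, mean));  // pass 2: prints [1.0, 4.0, 3.0, 4.0, 8.0]
    }
}
```

    The statistic is computed only from the values that are present, which is why it has to be known before any replacement can be written.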

    The order of the input data is preserved where possible. However, when using the action to skip records with missing data, records may be reordered. This is due to how the data is partitioned for parallelization.

    A PMML model is created that contains statistics about the number of records skipped and the number of field values replaced. This model is similar to the one created by the SummaryStatistics operator.

    • Constructor Detail

      • ReplaceMissingValues

        public ReplaceMissingValues()
        Defines a replacement with an empty specification. That is, no missing input values are replaced.
    • Method Detail

      • getStatisticsInput

        public PMMLPort getStatisticsInput()
        Gets the optional model port that supplies statistics for replacement specifications based on column statistics. If the port is not connected and some specification depends on statistics, the statistics are calculated automatically as part of the operation.
        Returns:
        the statistics port for the operation
      • getOutput

        public RecordPort getOutput()
        Gets the record port providing the output from the operation. This will be the input data with null values replaced as specified.
        Specified by:
        getOutput in interface PipelineOperator<RecordPort>
        Returns:
        the output port for the operation
      • getReplaceSpecifications

        public List<ReplaceSpecification> getReplaceSpecifications()
        Gets the specifications currently configured for the operation.
        Returns:
        the replacement specifications being applied to the input data
      • setReplaceSpecifications

        public void setReplaceSpecifications(List<ReplaceSpecification> specifications)
        Sets the replacement specifications to apply to the input data.
        Parameters:
        specifications - the value replacement specifications to apply
      • compose

        protected void compose(CompositionContext ctx)
        Description copied from class: CompositeOperator
        Compose the body of this operator. Implementations should do the following:
        1. Perform any validation of configuration, input types, etc.
        2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
        3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
        Specified by:
        compose in class CompositeOperator
        Parameters:
        ctx - the context
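
        The three composition steps listed above can be sketched with toy stand-ins. Everything in this example is hypothetical: Port, Step, and Context are simplified placeholders written for illustration, not the library's port, operator, or CompositionContext types, and the two chained steps do not reflect how this operator is actually composed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: all types here are hypothetical stand-ins.
public class ComposeSketch {

    /** Toy port identified only by a name. */
    public static final class Port {
        public final String name;
        public Port(String name) { this.name = name; }
    }

    /** Toy sub-operator with one input port and one output port. */
    public static final class Step {
        public final Port input;
        public final Port output;
        public Step(String name) {
            this.input = new Port(name + ".in");
            this.output = new Port(name + ".out");
        }
    }

    /** Toy composition context that records added steps and connections. */
    public static final class Context {
        public final List<Step> steps = new ArrayList<>();
        public final List<String> connections = new ArrayList<>();

        public Step add(Step step) {
            steps.add(step);
            return step;
        }

        public void connect(Port from, Port to) {
            connections.add(from.name + " -> " + to.name);
        }
    }

    /** Validates, adds two sub-steps, then wires input -> first -> second -> output. */
    public static void compose(Context ctx, Port compositeInput, Port compositeOutput) {
        // 1. Validate configuration before building anything.
        if (compositeInput == null || compositeOutput == null) {
            throw new IllegalArgumentException("composite ports must be set");
        }
        // 2. Instantiate sub-operators and register them with the context.
        Step first = ctx.add(new Step("first"));
        Step second = ctx.add(new Step("second"));
        // 3. Connect composite input to sub-operators, sub-operators to each
        //    other, and the last sub-operator to the composite output.
        ctx.connect(compositeInput, first.input);
        ctx.connect(first.output, second.input);
        ctx.connect(second.output, compositeOutput);
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        compose(ctx, new Port("input"), new Port("output"));
        ctx.connections.forEach(System.out::println);
    }
}
```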