All Implemented Interfaces:
LogicalOperator

public final class NaiveBayesPredictor extends AbstractPredictor
Operator responsible for predicting outcomes based on a Naive Bayes PMML model. The base algorithm used is specified here, with the following differences:
  1. Provides the ability to predict based on numerical data. For numerical data, we compute probability based on the assumption of a Gaussian distribution.
  2. We use Laplace smoothing in place of the "threshold" parameter.
  3. We provide an option to count missing values. If selected, missing values are treated like any other single distinct value. Probability is calculated in terms of the ratio of missing to non-missing.
  4. Calculation is performed in terms of log-likelihood rather than likelihood.
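
The Gaussian and log-likelihood points above can be illustrated with a minimal sketch. It is not the operator's actual implementation: the class and method names are hypothetical, and the normalization step (log-sum-exp) is an assumption about how per-class log scores might be turned back into probabilities.

```java
/** Illustrative sketch only; names are hypothetical, not part of the library. */
final class GaussianLogLikelihoodSketch {

    /** Log of the Gaussian density N(x; mean, variance). Assumes variance > 0. */
    static double gaussianLogDensity(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * Math.log(2.0 * Math.PI * variance) - (diff * diff) / (2.0 * variance);
    }

    /** Class score in log space: log prior plus the sum of per-field log-likelihoods. */
    static double classLogScore(double logPrior, double[] fieldLogLikelihoods) {
        double score = logPrior;
        for (double ll : fieldLogLikelihoods) {
            score += ll;
        }
        return score;
    }

    /**
     * Converts per-class log scores to probabilities that sum to 1.
     * Subtracting the maximum first avoids underflow (log-sum-exp).
     * Assumes at least one score is finite.
     */
    static double[] toProbabilities(double[] logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (double s : logScores) {
            sum += Math.exp(s - max);
        }
        double[] probabilities = new double[logScores.length];
        for (int i = 0; i < logScores.length; i++) {
            probabilities[i] = Math.exp(logScores[i] - max) / sum;
        }
        return probabilities;
    }

    public static void main(String[] args) {
        double[] scores = {
            classLogScore(Math.log(0.5), new double[] {gaussianLogDensity(1.0, 0.0, 1.0)}),
            classLogScore(Math.log(0.5), new double[] {gaussianLogDensity(1.0, 2.0, 1.0)})
        };
        System.out.println(java.util.Arrays.toString(toProbabilities(scores)));
    }
}
```

Working in log space turns a product of many small probabilities into a sum, which avoids floating-point underflow when a record has many fields.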
  • Constructor Details

    • NaiveBayesPredictor

      public NaiveBayesPredictor()
      Predicts an output based on a model and a set of training data. Following construction via the default constructor, no additional configuration is required.
  • Method Details

    • getOutput

      public RecordPort getOutput()
      Returns a record port consisting of the input with the predicted values appended. The appended fields are:
      1. winner: The target value of highest probability. The name "winner" is the default; this is configurable via the property winnerField.
      2. probability_targetValue (optional): The probability of the named targetValue. The prefix "probability_" is the default; this is configurable via the property probabilityPrefix.
        Overrides:
        getOutput in class AbstractPredictor
        Returns:
        a record flow of predicted values and their probabilities.
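
As a rough illustration of the field naming described for getOutput, the following sketch derives the appended field names from the winnerField and probabilityPrefix settings. The class, the method, and the ordering of the probability fields are assumptions made for illustration; they are not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the operator's actual schema logic. */
final class OutputFieldNamesSketch {

    /**
     * Returns the names of the fields appended to the input record:
     * the winner field, then (optionally) one probability field per target value.
     */
    static List<String> appendedFieldNames(String winnerField,
                                           String probabilityPrefix,
                                           List<String> targetValues,
                                           boolean appendProbabilities) {
        List<String> names = new ArrayList<>();
        names.add(winnerField);
        if (appendProbabilities) {
            for (String target : targetValues) {
                names.add(probabilityPrefix + target);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(appendedFieldNames("winner", "probability_", List.of("yes", "no"), true));
    }
}
```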
    • getWinnerField

        public String getWinnerField()
        Gets the name of the winner field to output. This is "winner" by default.
        Returns:
        the name of the winner field to output.
    • setWinnerField

        public void setWinnerField(String winnerField)
        Sets the name of the winner field to output. This is "winner" by default.
        Parameters:
        winnerField - the name of the winner field to output.
    • getProbabilityPrefix

        public String getProbabilityPrefix()
        Gets the field name prefix to use for probabilities. This is "probability_" by default.
        Returns:
        the field name prefix to use for probabilities.
    • setProbabilityPrefix

        public void setProbabilityPrefix(String probabilityPrefix)
        Sets the field name prefix to use for probabilities. This is "probability_" by default.
        Parameters:
        probabilityPrefix - the field name prefix to use for probabilities.
    • setIgnoreMissingValues

        public void setIgnoreMissingValues(boolean ignoreMissingValues)
        Sets whether to ignore missing values. If set to true, missing values are ignored for the purposes of prediction; otherwise, missing values are considered when calculating the probability distribution. Defaults to true.
        Parameters:
        ignoreMissingValues - whether to ignore missing values
    • isIgnoreMissingValues

        public boolean isIgnoreMissingValues()
        Returns whether to ignore missing values. If true, missing values are ignored for the purposes of prediction; otherwise, missing values are considered when calculating the probability distribution. Defaults to true.
        Returns:
        whether to ignore missing values
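
The effect of the ignoreMissingValues flag can be pictured with a small sketch. This is an assumption about the general approach, not the operator's code: the MISSING sentinel, the method name, and the choice of denominator (all training rows of the class, including rows where the field was missing) are hypothetical.

```java
import java.util.Map;

/** Illustrative sketch only; names and the exact ratio used are assumptions. */
final class MissingValueSketch {

    /** Hypothetical key under which missing values were counted during training. */
    static final String MISSING = "<missing>";

    /**
     * Log-likelihood contribution of one categorical field for one target class.
     * A null value stands for a missing input value.
     */
    static double fieldLogLikelihood(String value,
                                     Map<String, Long> countsForClass,
                                     long classTotal,
                                     boolean ignoreMissingValues) {
        if (value == null) {
            if (ignoreMissingValues) {
                return 0.0; // log(1): the field does not influence the prediction
            }
            value = MISSING; // treat "missing" like any other distinct value
        }
        long count = countsForClass.getOrDefault(value, 0L);
        return Math.log((double) count / classTotal);
    }

    public static void main(String[] args) {
        Map<String, Long> counts = Map.of("a", 8L, MISSING, 2L);
        System.out.println(fieldLogLikelihood(null, counts, 10, true));
        System.out.println(fieldLogLikelihood(null, counts, 10, false));
    }
}
```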
    • getLaplaceCorrector

        public final double getLaplaceCorrector()
        Returns the Laplace corrector to be used. The Laplace corrector handles "zero" counts in the training data: without it, a value that was never observed in the training data results in zero probability. The default of 0.0 means no correction.

        NOTE: The "threshold" value specified in the PMML model will always be ignored in favor of the Laplace corrector specified on NaiveBayesPredictor.

        Returns:
        the Laplace corrector to be used.
    • setLaplaceCorrector

        public final void setLaplaceCorrector(double laplaceCorrector)
        Sets the Laplace corrector to be used. The Laplace corrector handles "zero" counts in the training data: without it, a value that was never observed in the training data results in zero probability. The default of 0.0 means no correction.

        NOTE: The "threshold" value specified in the PMML model will always be ignored in favor of the Laplace corrector specified on NaiveBayesPredictor.

        Parameters:
        laplaceCorrector - the Laplace corrector to be used.
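
To show why a non-zero corrector matters, here is a minimal sketch of additive (Laplace) smoothing. The formula below is the textbook form and is an assumption; the source does not state the exact expression the operator uses, and the class and method names are hypothetical.

```java
/** Illustrative sketch only; textbook additive smoothing, not the library's code. */
final class LaplaceCorrectorSketch {

    /**
     * Smoothed estimate of P(value | class):
     * (valueCount + k) / (classCount + k * distinctValues), where k is the corrector.
     * With k = 0.0 this reduces to the raw relative frequency.
     */
    static double conditionalProbability(long valueCount,
                                         long classCount,
                                         int distinctValues,
                                         double laplaceCorrector) {
        return (valueCount + laplaceCorrector)
                / (classCount + laplaceCorrector * distinctValues);
    }

    public static void main(String[] args) {
        // A value never seen with this class: zero without correction, non-zero with it.
        System.out.println(conditionalProbability(0, 10, 3, 0.0));
        System.out.println(conditionalProbability(0, 10, 3, 1.0));
    }
}
```

Because prediction multiplies the per-field probabilities (or sums their logs), a single zero estimate would eliminate a target class regardless of the other fields; the corrector prevents that.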
    • isAppendProbabilities

        public boolean isAppendProbabilities()
        Returns whether to include probabilities in the prediction. This is true by default.
        Returns:
        whether to include probabilities in the prediction
    • setAppendProbabilities

        public void setAppendProbabilities(boolean appendProbabilities)
        Sets whether to include probabilities in the prediction. This is true by default.
        Parameters:
        appendProbabilities - whether to include probabilities in the prediction
    • computeMetadata

        protected void computeMetadata(StreamingMetadataContext ctx)
        Description copied from class: AbstractPredictor
        Default implementation of computeMetadata.
        1. Output type is set to input type plus predictedType
        2. Input data ordering (if ordered) is preserved
        3. Input data partitioning (if partitioned) is preserved
        Overrides:
        computeMetadata in class AbstractPredictor
        Parameters:
        ctx - the context
    • predictedType

        protected RecordTokenType predictedType(PMMLModelSpec spec)
        Description copied from class: AbstractPredictor
        Given the model spec, returns the predicted type. This should not include the input type (the input is automatically prepended to the type that is returned).
        Specified by:
        predictedType in class AbstractPredictor
        Parameters:
        spec - the model metadata
        Returns:
        the predicted type
    • execute

        protected void execute(PMMLModel pmml, RecordValued input, ScalarSettable[] predictedFields)
        Description copied from class: AbstractPredictor
        Called to perform prediction. Subclasses are expected to loop over the input by calling AbstractPredictor.stepNext(). For each row of input, subclasses should first set the predicted values in the predictedFields array and then invoke AbstractPredictor.pushPrediction(). Subclasses should not invoke pushEndOfData since that is automatically handled by the base class.
        Specified by:
        execute in class AbstractPredictor
        Parameters:
        pmml - The input PMML model
        input - The input data
        predictedFields - An array of fields that reference the predicted field locations. The array positionally corresponds to the type returned by AbstractPredictor.predictedType(PMMLModelSpec).
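
The loop contract described for execute can be sketched with stand-in classes. Everything below is hypothetical: PredictorLoopSketch is NOT the real AbstractPredictor, the boolean return of stepNext() is an assumed contract, and a plain String field stands in for the predictedFields array. The sketch only shows the shape of the loop: step, set predicted values, push.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the base class; not the real AbstractPredictor. */
abstract class PredictorLoopSketch {
    private final Iterator<double[]> rows;
    private final List<String> pushed = new ArrayList<>();
    private double[] current;
    /** Stands in for the predictedFields array. */
    protected String predicted;

    PredictorLoopSketch(List<double[]> input) {
        this.rows = input.iterator();
    }

    /** Advances to the next input row; false once the input is exhausted (assumed). */
    protected final boolean stepNext() {
        if (!rows.hasNext()) {
            return false;
        }
        current = rows.next();
        return true;
    }

    protected final double[] currentRow() {
        return current;
    }

    /** Emits the current prediction. */
    protected final void pushPrediction() {
        pushed.add(predicted);
    }

    protected abstract void execute();

    /** The base class drives execution and handles end of data itself. */
    final List<String> run() {
        execute();
        return pushed;
    }
}

/** Toy subclass: predicts "yes" when the first field exceeds 0.5. */
final class ThresholdPredictorSketch extends PredictorLoopSketch {
    ThresholdPredictorSketch(List<double[]> input) {
        super(input);
    }

    @Override
    protected void execute() {
        while (stepNext()) {
            predicted = currentRow()[0] > 0.5 ? "yes" : "no"; // set predicted value first
            pushPrediction();                                 // then push it
        }
        // No end-of-data call here: the base class is responsible for it.
    }
}
```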