Class NaiveBayesPredictor

  • All Implemented Interfaces:
    LogicalOperator

    public final class NaiveBayesPredictor
    extends AbstractPredictor
    Operator responsible for predicting outcomes based on a Naive Bayes PMML model. The base algorithm used is specified here, with the following differences:
    1. Provides the ability to predict based on numerical data. For numerical data, we compute probability based on the assumption of a Gaussian distribution.
    2. We use Laplace smoothing in place of the "threshold" parameter.
    3. We provide an option to count missing values. If selected, missing values are treated like any other single distinct value. Probability is calculated in terms of the ratio of missing to non-missing values.
    4. Calculation is performed in terms of log-likelihood rather than likelihood.
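Differences 1 and 4 can be illustrated with a minimal, standalone sketch. It is not the operator's implementation: the class name GaussianLogLikelihoodSketch, its methods, and the two-class example data are hypothetical, and the normalization step (subtracting the maximum log score before exponentiating) is an assumed, common way to turn log-likelihoods back into probabilities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; not the NaiveBayesPredictor implementation.
final class GaussianLogLikelihoodSketch {

    // Log of the Gaussian density N(x; mean, variance), used for numerical fields.
    static double logGaussian(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * Math.log(2.0 * Math.PI * variance) - (diff * diff) / (2.0 * variance);
    }

    // Converts per-target log scores into probabilities that sum to 1.
    // The maximum is subtracted first so that exp() does not underflow.
    static Map<String, Double> normalize(Map<String, Double> logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores.values()) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (double s : logScores.values()) {
            sum += Math.exp(s - max);
        }
        Map<String, Double> probabilities = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            probabilities.put(e.getKey(), Math.exp(e.getValue() - max) / sum);
        }
        return probabilities;
    }

    public static void main(String[] args) {
        // Hypothetical model: two targets with equal priors and one numerical field.
        Map<String, Double> logScores = new LinkedHashMap<>();
        logScores.put("yes", Math.log(0.5) + logGaussian(1.0, 0.0, 1.0));
        logScores.put("no", Math.log(0.5) + logGaussian(1.0, 3.0, 1.0));
        System.out.println(normalize(logScores));
    }
}
```

Summing logs instead of multiplying raw likelihoods avoids underflow when many fields each contribute a small probability.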
    • Constructor Detail

      • NaiveBayesPredictor

        public NaiveBayesPredictor()
        Predicts an output based on a model and a set of training data. Following construction via the default constructor, no additional configuration is required.
    • Method Detail

      • getOutput

        public RecordPort getOutput()
        Returns a record port consisting of the input plus predicted values appended. The record has the following fields:
        1. winner: The target value of highest probability. The name "winner" is the default; this is configurable via the property winnerField.
        2. probability_targetValue (optional): The probability of the named targetValue. The prefix "probability_" is the default; this is configurable via the property probabilityPrefix.
          Overrides:
          getOutput in class AbstractPredictor
          Returns:
          a record flow of predicted values and their probabilities.
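The naming of the appended fields described above can be sketched as follows. This is a standalone illustration, not library code: OutputFieldNamesSketch and appendedFieldNames are hypothetical names, the target values "yes" and "no" are made up, and the link between the "(optional)" probability fields and the appendProbabilities property is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; mirrors the documented naming rules for appended fields.
final class OutputFieldNamesSketch {

    // Builds the names of the fields appended to the input record:
    // the winner field first, then one probability field per target value.
    static List<String> appendedFieldNames(String winnerField,
                                           String probabilityPrefix,
                                           List<String> targetValues,
                                           boolean appendProbabilities) {
        List<String> names = new ArrayList<>();
        names.add(winnerField);
        if (appendProbabilities) {
            for (String target : targetValues) {
                names.add(probabilityPrefix + target);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Documented defaults: "winner" and "probability_".
        List<String> names = appendedFieldNames(
                "winner", "probability_", Arrays.asList("yes", "no"), true);
        System.out.println(names); // prints [winner, probability_yes, probability_no]
    }
}
```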
        • getWinnerField

          public String getWinnerField()
          Gets the name of the winner field to output. This is "winner" by default.
          Returns:
          the name of the winner field to output.
        • setWinnerField

          public void setWinnerField(String winnerField)
          Sets the name of the winner field to output. This is "winner" by default.
          Parameters:
          winnerField - the name of the winner field to output.
        • getProbabilityPrefix

          public String getProbabilityPrefix()
          Gets the field name prefix to use for probabilities. This is "probability_" by default.
          Returns:
          the field name prefix to use for probabilities.
        • setProbabilityPrefix

          public void setProbabilityPrefix(String probabilityPrefix)
          Sets the field name prefix to use for probabilities. This is "probability_" by default.
          Parameters:
          probabilityPrefix - the field name prefix to use for probabilities.
        • setIgnoreMissingValues

          public void setIgnoreMissingValues(boolean ignoreMissingValues)
          Sets whether to ignore missing values. If set to true, missing values are ignored for the purposes of prediction; otherwise missing values are considered when calculating the probability distribution. Defaults to true.
          Parameters:
          ignoreMissingValues - whether to ignore missing values
        • isIgnoreMissingValues

          public boolean isIgnoreMissingValues()
          Returns whether to ignore missing values. If set to true, missing values are ignored for the purposes of prediction; otherwise missing values are considered when calculating the probability distribution. Defaults to true.
          Returns:
          whether to ignore missing values
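One way to read the ignoreMissingValues setting is sketched below. This is an interpretation, not the operator's code: MissingValueSketch, fieldLogLikelihood, and the missingProbability parameter are hypothetical, and the sketch assumes that an ignored missing value simply contributes nothing to the log-likelihood, while a counted one contributes like any other distinct value.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; one possible reading of the ignoreMissingValues option.
final class MissingValueSketch {

    // Log-likelihood contribution of a single categorical field for one target value.
    // A null value stands for "missing".
    static double fieldLogLikelihood(String value,
                                     Map<String, Double> conditionalProbabilities,
                                     double missingProbability,
                                     boolean ignoreMissingValues) {
        if (value == null) {
            // Ignored: add 0 to the log-likelihood (multiply the likelihood by 1).
            // Counted: treat "missing" like any other distinct value.
            return ignoreMissingValues ? 0.0 : Math.log(missingProbability);
        }
        // Values absent from the map get probability 0, that is, a log-likelihood of -Infinity.
        return Math.log(conditionalProbabilities.getOrDefault(value, 0.0));
    }

    public static void main(String[] args) {
        // Hypothetical conditional probabilities for a field "color" given one target.
        Map<String, Double> color = new HashMap<>();
        color.put("red", 0.25);
        color.put("blue", 0.65);
        System.out.println(fieldLogLikelihood(null, color, 0.10, true));
        System.out.println(fieldLogLikelihood(null, color, 0.10, false));
    }
}
```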
        • getLaplaceCorrector

          public final double getLaplaceCorrector()
          Returns the Laplace corrector to be used. The Laplace corrector is a way to handle "zero" counts in the training data: without it, a value that was never observed in the training data results in zero probability. The default of 0.0 means no correction.

          NOTE: The "threshold" value specified in the PMML model will always be ignored in favor of the Laplace corrector specified on NaiveBayesPredictor.

          Returns:
          the Laplace corrector to be used.
        • setLaplaceCorrector

          public final void setLaplaceCorrector(double laplaceCorrector)
          Sets the Laplace corrector to be used. The Laplace corrector is a way to handle "zero" counts in the training data: without it, a value that was never observed in the training data results in zero probability. The default of 0.0 means no correction.

          NOTE: The "threshold" value specified in the PMML model will always be ignored in favor of the Laplace corrector specified on NaiveBayesPredictor.

          Parameters:
          laplaceCorrector - the Laplace corrector to be used.
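The effect of the corrector on a zero count can be sketched with the textbook additive-smoothing formula. The exact formula used by the operator is not stated here, so treat the one below as an assumption; LaplaceCorrectorSketch and smoothedProbability are hypothetical names.

```java
// Illustrative sketch only; uses standard additive (Laplace) smoothing,
// which may differ in detail from what NaiveBayesPredictor computes.
final class LaplaceCorrectorSketch {

    // Smoothed estimate of P(value | target):
    // (count + corrector) / (total + corrector * distinctValues).
    static double smoothedProbability(long count, long total,
                                      int distinctValues, double corrector) {
        return (count + corrector) / (total + corrector * distinctValues);
    }

    public static void main(String[] args) {
        // A value never observed among 10 training records, field with 3 distinct values.
        System.out.println(smoothedProbability(0, 10, 3, 0.0)); // prints 0.0
        System.out.println(smoothedProbability(0, 10, 3, 1.0)); // 1/13, no longer zero
    }
}
```

With a corrector of 0.0 the estimate reduces to the raw frequency, which matches the documented meaning of the default.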
        • isAppendProbabilities

          public boolean isAppendProbabilities()
          Returns whether to include probabilities in the prediction. This is true by default.
          Returns:
          whether to include probabilities in the prediction
        • setAppendProbabilities

          public void setAppendProbabilities(boolean appendProbabilities)
          Sets whether to include probabilities in the prediction. This is true by default.
          Parameters:
          appendProbabilities - whether to include probabilities in the prediction
        • predictedType

          protected RecordTokenType predictedType(PMMLModelSpec spec)
          Description copied from class: AbstractPredictor
          Given the model spec, returns the predicted type. This should not include the input type (the input is automatically prepended to the type that is returned).
          Specified by:
          predictedType in class AbstractPredictor
          Parameters:
          spec - the model metadata
          Returns:
          the predicted type