All Implemented Interfaces:
LogicalOperator

public final class NaiveBayesLearner extends CompositeOperator
Operator responsible for building a Naive Bayes PMML model from input data. The base algorithm used is specified here, with the following differences:
  1. We provide the ability to predict based on numerical data. For numerical data, the probability is computed under the assumption of a Gaussian distribution.
  2. We use Laplace smoothing in place of the "threshold" parameter.
  3. We provide an option to count missing values. If selected, missing values are treated like any other single distinct value. Probability is calculated in terms of the ratio of missing to non-missing.
  4. Calculation is performed in terms of log-likelihood rather than likelihood.
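The differences listed above can be illustrated with a minimal sketch. It is not the operator's implementation: the class NaiveBayesSketch, its method names, and the add-one form of Laplace smoothing are assumptions made for illustration. It shows a smoothed categorical term, a Gaussian term for numerical data, and scoring by summing logarithms instead of multiplying probabilities.

```java
/** Hypothetical sketch of the scoring terms; not the DataFlow implementation. */
final class NaiveBayesSketch {

    /**
     * Laplace (add-one) smoothed log P(value | class) for a categorical field.
     * If missing values are counted, "missing" is simply one more distinct value.
     */
    static double logCategorical(long valueCount, long classCount, int distinctValues) {
        return Math.log((valueCount + 1.0) / (classCount + distinctValues));
    }

    /** Log of the Gaussian density for a numerical field; variance must be positive. */
    static double logGaussian(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * Math.log(2.0 * Math.PI * variance) - (diff * diff) / (2.0 * variance);
    }

    /** Log-likelihood score: the log prior plus the sum of per-field log terms. */
    static double logScore(double logPrior, double... fieldLogTerms) {
        double sum = logPrior;
        for (double term : fieldLogTerms) {
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up training statistics for two classes, for illustration only.
        double yes = logScore(Math.log(0.6), logCategorical(3, 6, 3), logGaussian(21.0, 20.0, 4.0));
        double no = logScore(Math.log(0.4), logCategorical(0, 4, 3), logGaussian(21.0, 28.0, 9.0));
        System.out.println("predicted class: " + (yes > no ? "yes" : "no"));
    }
}
```

Working in log space avoids the numeric underflow that a product of many small probabilities would cause, and smoothing keeps a value never seen with a class from producing log(0).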
  • Constructor Details

    • NaiveBayesLearner

      public NaiveBayesLearner()
      The default constructor. Prior to graph compilation, the following required properties must be specified or an exception will be raised: targetColumn
    • NaiveBayesLearner

      public NaiveBayesLearner(String targetColumn)
      Creates a new instance of NaiveBayesLearner, specifying the minimal set of required parameters.
      Parameters:
      targetColumn - the target column to predict. Must be of type StringValued.
  • Method Details

    • getInput

      public RecordPort getInput()
      The input data. String fields are assumed to be categorical. Double fields are assumed to be numerical. All other fields are ignored.
      Returns:
      the input data
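The field-handling rule above can be sketched as follows. This is only an illustration: FieldRoleSketch and its Role enum are hypothetical, and plain Java classes stand in for the library's own field types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of how input fields are treated; not the DataFlow implementation. */
final class FieldRoleSketch {

    enum Role { CATEGORICAL, NUMERICAL, IGNORED }

    /** String fields are categorical, double fields are numerical, all others are ignored. */
    static Role roleOf(Class<?> fieldType) {
        if (fieldType == String.class) {
            return Role.CATEGORICAL;
        }
        if (fieldType == Double.class || fieldType == double.class) {
            return Role.NUMERICAL;
        }
        return Role.IGNORED;
    }

    /** Assigns a role to every field of a schema given as field name to Java type. */
    static Map<String, Role> classify(Map<String, Class<?>> schema) {
        Map<String, Role> roles = new LinkedHashMap<>();
        for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
            roles.put(field.getKey(), roleOf(field.getValue()));
        }
        return roles;
    }
}
```

Under this reading, an integer field would have to be converted to a string or a double upstream before it can contribute to the model.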
    • getModel

      public PMMLPort getModel()
      Returns the output PMML model port.
      Returns:
      the output PMML model port.
    • getLearningColumns

      public final List<String> getLearningColumns()
      Returns the list of columns to be used to predict the output value. Default of empty list means "everything but targetColumn".
      Returns:
      The list of columns to be used to predict the output value.
    • setLearningColumns

      public final void setLearningColumns(List<String> learningColumns)
      Sets the list of columns to be used to predict the output value. Default of empty list means "everything but targetColumn".
      Parameters:
      learningColumns - The list of columns to be used to predict the output value.
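The default described above (an empty list meaning "everything but targetColumn") could be resolved roughly as in the sketch below. The class LearningColumnsSketch and its resolve method are hypothetical helpers written for illustration; they are not part of the operator's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the empty-list default; not the DataFlow implementation. */
final class LearningColumnsSketch {

    /**
     * Returns the columns actually used for prediction: the configured list if it is
     * non-empty, otherwise every input column except the target column.
     */
    static List<String> resolve(List<String> learningColumns,
                                List<String> inputColumns,
                                String targetColumn) {
        if (!learningColumns.isEmpty()) {
            return new ArrayList<>(learningColumns);
        }
        List<String> result = new ArrayList<>(inputColumns);
        result.removeIf(column -> column.equals(targetColumn));
        return result;
    }
}
```

For example, with input columns color, weight, and label and a target column of label, an empty configured list resolves to color and weight.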
    • setTargetColumn

      public void setTargetColumn(String targetColumn)
      Sets the column to be predicted. Must be of type string.
      Parameters:
      targetColumn - the column to be predicted
    • getTargetColumn

      public String getTargetColumn()
      Gets the column to be predicted. Must be of type string.
      Returns:
      the column to be predicted
    • compose

      protected void compose(CompositionContext ctx)
      Description copied from class: CompositeOperator
      Compose the body of this operator. Implementations should do the following:
      1. Perform any validation of configuration, input types, etc.
      2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
      3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
      Specified by:
      compose in class CompositeOperator
      Parameters:
      ctx - the context