Class NaiveBayesLearner

  • All Implemented Interfaces:
    LogicalOperator

    public final class NaiveBayesLearner
    extends CompositeOperator
    Operator responsible for building a Naive Bayes PMML model from input data. The base algorithm used is specified here, with the following differences:
    1. Provides the ability to predict based on numerical data. For numerical data, we compute probability based on the assumption of a Gaussian distribution.
    2. We use Laplace smoothing in place of the "threshold" parameter.
    3. We provide an option to count missing values. If selected, missing values are treated like any other single distinct value. Probability is calculated in terms of the ratio of missing to non-missing.
    4. Calculation is performed in terms of log-likelihood rather than likelihood.
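    The differences listed above can be illustrated with a minimal, self-contained sketch. It is not the operator's implementation: the class name, the method names, and the smoothing constant alpha are assumptions made for illustration, and the exact smoothing formula used by NaiveBayesLearner is not stated here. The sketch shows a Laplace-smoothed categorical probability, a Gaussian density for numerical values, and the summation of log-likelihoods.

```java
/** Illustrative sketch only; names and the alpha parameter are hypothetical. */
final class NaiveBayesScoringSketch {

    /**
     * Laplace-smoothed log P(value | class). With alpha > 0, a value never seen
     * with this class still gets a small non-zero probability. If missing values
     * were counted, "missing" would simply be one more distinct value.
     */
    static double smoothedLogProbability(long count, long classTotal,
                                         int distinctValues, double alpha) {
        return Math.log((count + alpha) / (classTotal + alpha * distinctValues));
    }

    /** Log density of x under a Gaussian; variance must be positive. */
    static double gaussianLogDensity(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * Math.log(2.0 * Math.PI * variance)
                - (diff * diff) / (2.0 * variance);
    }

    /** Adds the log prior and the per-feature log-likelihoods for one class. */
    static double logScore(double logPrior, double... featureLogLikelihoods) {
        double total = logPrior;
        for (double logLikelihood : featureLogLikelihoods) {
            total += logLikelihood;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up counts: the class covers 6 of 10 training rows.
        double logPrior = Math.log(6.0 / 10.0);
        // A categorical value seen 3 times in the class, 3 distinct values overall.
        double categorical = smoothedLogProbability(3, 6, 3, 1.0);
        // A numerical value of 5.0 where the class mean is 4.0 and variance 1.0.
        double numerical = gaussianLogDensity(5.0, 4.0, 1.0);
        System.out.println(logScore(logPrior, categorical, numerical));
    }
}
```

    Summing logarithms instead of multiplying probabilities avoids floating-point underflow when many features are involved; the class with the highest score is the predicted one.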
    • Constructor Detail

      • NaiveBayesLearner

        public NaiveBayesLearner()
        The default constructor. Prior to graph compilation the following required property must be specified or an exception will be raised: targetColumn.
      • NaiveBayesLearner

        public NaiveBayesLearner​(String targetColumn)
        Creates a new instance of NaiveBayesLearner, specifying the minimal set of required parameters.
        Parameters:
        targetColumn - the target column to predict. Must be of type StringValued.
    • Method Detail

      • getInput

        public RecordPort getInput()
        The input data. String fields are assumed to be categorical. Double fields are assumed to be numerical. All other fields are ignored.
        Returns:
        the input data
      • getModel

        public PMMLPort getModel()
        Returns the output PMML model port.
        Returns:
        the output PMML model port.
      • getLearningColumns

        public final List<String> getLearningColumns()
        Returns the list of columns to be used to predict the output value. Default of empty list means "everything but targetColumn".
        Returns:
        The list of columns to be used to predict the output value.
      • setLearningColumns

        public final void setLearningColumns​(List<String> learningColumns)
        Sets the list of columns to be used to predict the output value. Default of empty list means "everything but targetColumn".
        Parameters:
        learningColumns - The list of columns to be used to predict the output value.
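        The "empty list means everything but targetColumn" default can be pictured with the following sketch. The helper and its parameters are hypothetical and only illustrate the rule as documented; how the operator resolves the columns internally is not described here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; this helper is not part of the operator's API. */
final class LearningColumnsSketch {

    /**
     * Returns the explicitly configured learning columns, or, when none are
     * configured, every input field except the target column.
     */
    static List<String> effectiveLearningColumns(List<String> inputFields,
                                                 List<String> learningColumns,
                                                 String targetColumn) {
        if (learningColumns != null && !learningColumns.isEmpty()) {
            return new ArrayList<>(learningColumns);
        }
        List<String> result = new ArrayList<>(inputFields);
        result.remove(targetColumn); // drop the column being predicted
        return result;
    }
}
```

        Note that, per getInput, fields that are neither string nor double would still be ignored even when they fall under this default.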
      • setTargetColumn

        public void setTargetColumn​(String targetColumn)
        Sets the column to be predicted. Must be of type string.
        Parameters:
        targetColumn - the column to be predicted
      • getTargetColumn

        public String getTargetColumn()
        Gets the column to be predicted. Must be of type string.
        Returns:
        the column to be predicted
      • compose

        protected void compose​(CompositionContext ctx)
        Description copied from class: CompositeOperator
        Compose the body of this operator. Implementations should do the following:
        1. Perform any validation of configuration, input types, etc.
        2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
        3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
        Specified by:
        compose in class CompositeOperator
        Parameters:
        ctx - the context