Module datarush.analytics
Class NaiveBayesPredictor
java.lang.Object
com.pervasive.datarush.operators.AbstractLogicalOperator
com.pervasive.datarush.operators.StreamingOperator
com.pervasive.datarush.operators.ExecutableOperator
com.pervasive.datarush.analytics.util.AbstractPredictor
com.pervasive.datarush.analytics.naivebayes.predictor.NaiveBayesPredictor
- All Implemented Interfaces:
LogicalOperator
Operator responsible for predicting outcomes based on a Naive Bayes PMML model.
The base algorithm used is specified here, with the following differences:
- Provides the ability to predict based on numerical data. For numerical data, we compute probability based on the assumption of a Gaussian distribution.
- We use Laplace smoothing in place of the "threshold" parameter.
- We provide an option to count missing values. If selected, missing values are treated like any other single distinct value. Probability is calculated in terms of the ratio of missing to non-missing.
- Calculation is performed in terms of log-likelihood rather than likelihood.
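The Gaussian and log-likelihood points above can be illustrated with a minimal, self-contained sketch. It is not the operator's actual implementation: the class name GaussianLogLikelihoodSketch, its method names, and the use of per-feature means and variances as inputs are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical, not DataRush API. */
final class GaussianLogLikelihoodSketch {

    /** Log of the Gaussian density N(x; mean, variance). */
    static double logGaussian(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * Math.log(2.0 * Math.PI * variance) - (diff * diff) / (2.0 * variance);
    }

    /** Log prior plus the sum of per-feature log densities for one target value. */
    static double logScore(double prior, double[] x, double[] means, double[] variances) {
        double score = Math.log(prior);
        for (int i = 0; i < x.length; i++) {
            score += logGaussian(x[i], means[i], variances[i]);
        }
        return score;
    }

    /** Target value with the highest log score (the "winner"). */
    static String winner(Map<String, Double> logScores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            if (best == null || e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    /** Converts log scores to probabilities, subtracting the maximum for numerical stability. */
    static Map<String, Double> normalize(Map<String, Double> logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores.values()) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (double s : logScores.values()) {
            sum += Math.exp(s - max);
        }
        Map<String, Double> probabilities = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            probabilities.put(e.getKey(), Math.exp(e.getValue() - max) / sum);
        }
        return probabilities;
    }
}
```

Summing logarithms instead of multiplying raw likelihoods avoids floating-point underflow when many input fields contribute to a prediction, which is the usual motivation for working in log-likelihood.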
-
Constructor Summary
Constructors
Constructor, Description
- NaiveBayesPredictor(): Predicts an output based on a model and a set of training data.
-
Method Summary
Modifier and Type, Method, Description
- protected void computeMetadata(ctx): Default implementation of computeMetadata.
- protected void execute(PMMLModel pmml, RecordValued input, ScalarSettable[] predictedFields): Called to perform prediction.
- final double getLaplaceCorrector(): Returns the Laplace corrector to be used.
- getOutput(): Returns a record port consisting of the input plus predicted values appended.
- getProbabilityPrefix(): Gets the field name prefix to use for probabilities.
- getWinnerField(): Gets the name of the winner field to output.
- boolean isAppendProbabilities(): Returns whether to include probabilities in the prediction.
- boolean isIgnoreMissingValues(): Returns whether to ignore missing values.
- protected RecordTokenType predictedType(PMMLModelSpec spec): Given the model spec, returns the predicted type.
- void setAppendProbabilities(boolean appendProbabilities): Sets whether to include probabilities in the prediction.
- void setIgnoreMissingValues(boolean ignoreMissingValues): Sets whether to ignore missing values.
- final void setLaplaceCorrector(double laplaceCorrector): Sets the Laplace corrector to be used.
- void setProbabilityPrefix(String probabilityPrefix): Sets the field name prefix to use for probabilities.
- void setWinnerField(String winnerField): Sets the name of the winner field to output.
Methods inherited from class com.pervasive.datarush.analytics.util.AbstractPredictor
execute, getInput, getModel, pushPrediction, stepNext
Methods inherited from class com.pervasive.datarush.operators.ExecutableOperator
cloneForExecution, getNumInputCopies, getPortSettings, handleInactiveOutput
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Constructor Details
-
NaiveBayesPredictor
public NaiveBayesPredictor()
Predicts an output based on a model and a set of training data. Following construction via the default constructor, no additional configuration is required.
-
-
Method Details
-
getOutput
Returns a record port consisting of the input plus predicted values appended. The record has the following fields:
- winner: The target value of highest probability. The name "winner" is the default; this is configurable via the property winnerField.
- probability_targetValue (optional): The probability of the named targetValue. The prefix "probability_" is the default; this is configurable via the property probabilityPrefix.
- Overrides:
getOutput in class AbstractPredictor
- Returns:
- a record flow of predicted values and their probabilities.
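As a rough illustration of how the appended field names follow from winnerField, probabilityPrefix, and appendProbabilities, the sketch below builds the list of names for a given set of target values. The class OutputFieldNamesSketch and its method are hypothetical, and the ordering of the probability fields is an assumption; only the "prefix plus target value" naming comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not part of the DataRush API. */
final class OutputFieldNamesSketch {

    /**
     * Names of the fields appended to the input record: the winner field,
     * followed (optionally) by one probability field per target value.
     * The field order is an assumption made for this sketch.
     */
    static List<String> appendedFieldNames(String winnerField,
                                           String probabilityPrefix,
                                           List<String> targetValues,
                                           boolean appendProbabilities) {
        List<String> names = new ArrayList<>();
        names.add(winnerField);
        if (appendProbabilities) {
            for (String target : targetValues) {
                names.add(probabilityPrefix + target);
            }
        }
        return names;
    }
}
```

With the defaults ("winner", "probability_", probabilities appended) and target values "yes" and "no", this yields the names winner, probability_yes, and probability_no.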
-
getWinnerField
Gets the name of the winner field to output. This is "winner" by default.
- Returns:
- the name of the winner field to output.
-
setWinnerField
Sets the name of the winner field to output. This is "winner" by default.
- Parameters:
winnerField - the name of the winner field to output.
-
getProbabilityPrefix
Gets the field name prefix to use for probabilities. This is "probability_" by default.
- Returns:
- the field name prefix to use for probabilities.
-
setProbabilityPrefix
Sets the field name prefix to use for probabilities. This is "probability_" by default.
- Parameters:
probabilityPrefix - the field name prefix to use for probabilities.
-
setIgnoreMissingValues
public void setIgnoreMissingValues(boolean ignoreMissingValues)
Sets whether to ignore missing values. If set to true, missing values are ignored for the purposes of prediction; otherwise missing values are considered when calculating the probability distribution. Defaults to true.
- Parameters:
ignoreMissingValues - whether to ignore missing values
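The effect of this flag on value counts can be sketched as follows. This is a simplified illustration under the assumption that a missing value is represented by null and, when not ignored, is counted under a single placeholder category; the class MissingValueCountSketch and the "<missing>" label are hypothetical, and the operator's internal bookkeeping may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not part of the DataRush API. */
final class MissingValueCountSketch {

    /** Hypothetical label standing in for a missing (null) value. */
    static final String MISSING = "<missing>";

    /**
     * Counts occurrences of each distinct value. When ignoreMissingValues is
     * false, nulls are counted as one additional distinct value.
     */
    static Map<String, Integer> countValues(String[] values, boolean ignoreMissingValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : values) {
            String key = value;
            if (key == null) {
                if (ignoreMissingValues) {
                    continue; // skip missing values entirely
                }
                key = MISSING; // treat missing like any other distinct value
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```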
-
isIgnoreMissingValues
public boolean isIgnoreMissingValues()
Returns whether to ignore missing values. If set to true, missing values are ignored for the purposes of prediction; otherwise missing values are considered when calculating the probability distribution. Defaults to true.
- Returns:
- whether to ignore missing values
-
getLaplaceCorrector
public final double getLaplaceCorrector()
Returns the Laplace corrector to be used. The Laplace corrector handles "zero" counts in the training data; without it, a value that was never observed in the training data results in zero probability. The default of 0.0 means no correction.
NOTE: The "threshold" value specified in the PMML model will always be ignored in favor of the Laplace corrector specified on NaiveBayesPredictor.
- Returns:
- the Laplace corrector to be used.
-
setLaplaceCorrector
public final void setLaplaceCorrector(double laplaceCorrector)
Sets the Laplace corrector to be used. The Laplace corrector handles "zero" counts in the training data; without it, a value that was never observed in the training data results in zero probability. The default of 0.0 means no correction.
NOTE: The "threshold" value specified in the PMML model will always be ignored in favor of the Laplace corrector specified on NaiveBayesPredictor.
- Parameters:
laplaceCorrector - the Laplace corrector to be used.
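To show why a non-zero corrector removes zero probabilities, here is a minimal sketch of the common additive form of Laplace smoothing. The exact formula applied by NaiveBayesPredictor is not stated in this documentation, so the formula, the class LaplaceCorrectorSketch, and its parameter names are assumptions for illustration only.

```java
/** Illustrative sketch only; not part of the DataRush API. */
final class LaplaceCorrectorSketch {

    /**
     * Additive smoothing (assumed form):
     * (count + corrector) / (total + corrector * distinctValues).
     * A corrector of 0.0 reduces to the uncorrected ratio count / total.
     */
    static double smoothedProbability(long count, long total, int distinctValues, double corrector) {
        return (count + corrector) / (total + corrector * distinctValues);
    }
}
```

For a value never seen in 10 training rows of a field with 3 distinct values, a corrector of 0.0 gives a probability of 0, while a corrector of 1.0 gives 1/13.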
-
isAppendProbabilities
public boolean isAppendProbabilities()
Returns whether to include probabilities in the prediction. This is true by default.
- Returns:
- whether to include probabilities in the prediction
-
setAppendProbabilities
public void setAppendProbabilities(boolean appendProbabilities)
Sets whether to include probabilities in the prediction. This is true by default.
- Parameters:
appendProbabilities - whether to include probabilities in the prediction
-
computeMetadata
Description copied from class: AbstractPredictor
Default implementation of computeMetadata.
- Output type is set to input type plus predictedType
- Input data ordering (if ordered) is preserved
- Input data partitioning (if partitioned) is preserved
- Overrides:
computeMetadata in class AbstractPredictor
- Parameters:
ctx - the context
-
predictedType
Description copied from class: AbstractPredictor
Given the model spec, returns the predicted type. This should not include the input type (the input is automatically prepended to the type that is returned).
- Specified by:
predictedType in class AbstractPredictor
- Parameters:
spec - the model metadata
- Returns:
- the predicted type
-
execute
Description copied from class: AbstractPredictor
Called to perform prediction. Subclasses are expected to loop over the input by calling AbstractPredictor.stepNext(). For each row of input, subclasses should first set the predicted values in the predictedFields array and then invoke AbstractPredictor.pushPrediction(). Subclasses should not invoke pushEndOfData since that is automatically handled by the base class.
- Specified by:
execute in class AbstractPredictor
- Parameters:
pmml - The input PMML model
input - The input data
predictedFields - An array of fields that reference the predicted field locations. The array positionally corresponds to the type returned by AbstractPredictor.predictedType(PMMLModelSpec).
-