Module datarush.analytics
Class NaiveBayesPredictor
- java.lang.Object
  - com.pervasive.datarush.operators.AbstractLogicalOperator
    - com.pervasive.datarush.operators.StreamingOperator
      - com.pervasive.datarush.operators.ExecutableOperator
        - com.pervasive.datarush.analytics.util.AbstractPredictor
          - com.pervasive.datarush.analytics.naivebayes.predictor.NaiveBayesPredictor

All Implemented Interfaces: LogicalOperator
public final class NaiveBayesPredictor extends AbstractPredictor
Operator responsible for predicting outcomes based on a Naive Bayes PMML model. The base algorithm is the one specified for PMML Naive Bayes models, with the following differences:
- Provides the ability to predict based on numerical data. For numerical data, we compute probability under the assumption of a Gaussian distribution.
- We use Laplace smoothing in place of the "threshold" parameter.
- We provide an option to count missing values. If selected, missing values are treated like any other single distinct value, and their probability is calculated in terms of the ratio of missing to non-missing values.
- Calculation is performed in terms of log-likelihood rather than likelihood.
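The numeric-data and log-likelihood points above can be illustrated with a small, self-contained sketch. It assumes each class stores a prior plus a per-attribute mean and variance; the class and method names (`GaussianNaiveBayesSketch`, `gaussianLogDensity`, `logScore`, `winner`) are hypothetical and are not part of the DataRush API. It shows how a Gaussian density enters the score and why summing logs replaces a product of small likelihoods.

```java
import java.util.Map;

/**
 * Hypothetical sketch of Gaussian Naive Bayes scoring in log space.
 * Not the DataRush implementation; names and parameters are illustrative.
 */
final class GaussianNaiveBayesSketch {

    private GaussianNaiveBayesSketch() {}

    /** Natural log of the normal density N(x; mean, variance); variance must be > 0. */
    static double gaussianLogDensity(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * Math.log(2.0 * Math.PI * variance) - (diff * diff) / (2.0 * variance);
    }

    /**
     * Log-likelihood score for one class: log(prior) plus the sum of the per-attribute
     * log densities. Summing logs replaces multiplying many small probabilities,
     * which would otherwise underflow toward 0.0.
     */
    static double logScore(double prior, double[] x, double[] means, double[] variances) {
        double score = Math.log(prior);
        for (int i = 0; i < x.length; i++) {
            score += gaussianLogDensity(x[i], means[i], variances[i]);
        }
        return score;
    }

    /** Returns the class label with the highest log score, i.e. the predicted "winner". */
    static String winner(Map<String, Double> logScores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            if (best == null || e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }
}
```

Because the logarithm is monotonic, picking the largest log score selects the same class as picking the largest likelihood.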
Constructor Summary
- NaiveBayesPredictor(): Predicts an output based on a model and a set of input data.
-
Method Summary
- protected void computeMetadata(StreamingMetadataContext ctx): Default implementation of computeMetadata.
- protected void execute(PMMLModel pmml, RecordValued input, ScalarSettable[] predictedFields): Called to perform prediction.
- double getLaplaceCorrector(): Returns the Laplace corrector to be used.
- RecordPort getOutput(): Returns a record port consisting of the input plus predicted values appended.
- String getProbabilityPrefix(): Gets the field name prefix to use for probabilities.
- String getWinnerField(): Gets the name of the winner field to output.
- boolean isAppendProbabilities(): Returns whether to include probabilities in the prediction.
- boolean isIgnoreMissingValues(): Returns whether to ignore missing values.
- protected RecordTokenType predictedType(PMMLModelSpec spec): Given the model spec, returns the predicted type.
- void setAppendProbabilities(boolean appendProbabilities): Sets whether to include probabilities in the prediction.
- void setIgnoreMissingValues(boolean ignoreMissingValues): Sets whether to ignore missing values.
- void setLaplaceCorrector(double laplaceCorrector): Sets the Laplace corrector to be used.
- void setProbabilityPrefix(String probabilityPrefix): Sets the field name prefix to use for probabilities.
- void setWinnerField(String winnerField): Sets the name of the winner field to output.
-
Methods inherited from class com.pervasive.datarush.analytics.util.AbstractPredictor
execute, getInput, getModel, pushPrediction, stepNext
-
Methods inherited from class com.pervasive.datarush.operators.ExecutableOperator
cloneForExecution, getNumInputCopies, getPortSettings, handleInactiveOutput
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
Method Detail
-
getOutput
public RecordPort getOutput()
Returns a record port consisting of the input with the predicted values appended. The record has the following fields:
- winner: The target value with the highest probability. The name "winner" is the default; it is configurable via the winnerField property.
- probability_targetValue (optional): The probability of the named targetValue. The prefix "probability_" is the default; it is configurable via the probabilityPrefix property.
- Overrides: getOutput in class AbstractPredictor
- Returns: a record port carrying the input plus the predicted values and their probabilities.
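This page does not say how the appended probabilities are derived from the log-likelihood calculation. One common approach, assumed here for illustration and not confirmed as the DataRush implementation, is to normalize the per-class log scores with the log-sum-exp trick, which avoids underflow when exponentiating large negative numbers. The class and method names (`ProbabilityOutputSketch`, `normalize`, `outputFields`) are hypothetical; only the default names "winner" and "probability_" come from this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: turns per-class log scores into output fields; not the DataRush API. */
final class ProbabilityOutputSketch {

    private ProbabilityOutputSketch() {}

    /** Normalizes log scores into probabilities summing to 1, using log-sum-exp for stability. */
    static Map<String, Double> normalize(Map<String, Double> logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores.values()) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (double s : logScores.values()) {
            sum += Math.exp(s - max); // shifting by max keeps exp() away from underflow
        }
        double logNormalizer = max + Math.log(sum);
        Map<String, Double> probabilities = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            probabilities.put(e.getKey(), Math.exp(e.getValue() - logNormalizer));
        }
        return probabilities;
    }

    /** Builds one output row: the winner field plus one prefixed probability field per target value. */
    static Map<String, Object> outputFields(Map<String, Double> logScores,
                                            String winnerField, String probabilityPrefix) {
        Map<String, Double> probabilities = normalize(logScores);
        String winner = null;
        double best = -1.0;
        for (Map.Entry<String, Double> e : probabilities.entrySet()) {
            if (e.getValue() > best) {
                winner = e.getKey();
                best = e.getValue();
            }
        }
        Map<String, Object> row = new LinkedHashMap<>();
        row.put(winnerField, winner);
        for (Map.Entry<String, Double> e : probabilities.entrySet()) {
            row.put(probabilityPrefix + e.getKey(), e.getValue());
        }
        return row;
    }
}
```

With log scores of log(3) and log(1) for "yes" and "no", the row holds winner = "yes", probability_yes = 0.75 and probability_no = 0.25.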
-
getWinnerField
public String getWinnerField()
Gets the name of the winner field to output. This is "winner" by default.
- Returns: the name of the winner field to output.
-
setWinnerField
public void setWinnerField(String winnerField)
Sets the name of the winner field to output. This is "winner" by default.
- Parameters: winnerField - the name of the winner field to output.
-
getProbabilityPrefix
public String getProbabilityPrefix()
Gets the field name prefix to use for probabilities. This is "probability_" by default.
- Returns: the field name prefix to use for probabilities.
-
setProbabilityPrefix
public void setProbabilityPrefix(String probabilityPrefix)
Sets the field name prefix to use for probabilities. This is "probability_" by default.
- Parameters: probabilityPrefix - the field name prefix to use for probabilities.
-
setIgnoreMissingValues
public void setIgnoreMissingValues(boolean ignoreMissingValues)
Sets whether to ignore missing values. If set to true, missing values are ignored for the purposes of prediction; otherwise, missing values are counted when calculating the probability distribution. Defaults to true.
- Parameters: ignoreMissingValues - whether to ignore missing values.
-
isIgnoreMissingValues
public boolean isIgnoreMissingValues()
Returns whether to ignore missing values. If true, missing values are ignored for the purposes of prediction; otherwise, missing values are counted when calculating the probability distribution. Defaults to true.
- Returns: whether to ignore missing values.
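A minimal sketch of the two missing-value modes for one categorical attribute, assuming the model keeps per-class value counts with a `null` key holding the count of missing values. When missing values are ignored, a missing attribute contributes nothing to the class score (log 1 = 0); when they are counted, `null` is treated as one more distinct value. Here the probability of a missing value is its share of all rows for the class; the page only describes this as a ratio of missing to non-missing values, so the exact formula DataRush uses may differ. `MissingValueSketch` and `logLikelihood` are hypothetical names.

```java
import java.util.Map;

/** Hypothetical sketch of ignoring vs. counting missing values; not the DataRush implementation. */
final class MissingValueSketch {

    private MissingValueSketch() {}

    /**
     * Log-likelihood contribution of one categorical attribute for one class.
     *
     * @param counts              per-value counts for this class; the null key counts missing values
     * @param value               the observed value, or null if missing
     * @param ignoreMissingValues if true, missing values are skipped entirely
     */
    static double logLikelihood(Map<String, Integer> counts, String value, boolean ignoreMissingValues) {
        if (value == null && ignoreMissingValues) {
            return 0.0; // log(1): the attribute is left out of the product
        }
        int total = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey() == null && ignoreMissingValues) {
                continue; // missing rows do not count toward the denominator
            }
            total += e.getValue();
        }
        Integer c = counts.get(value); // null lookup is fine for HashMap
        int count = (c == null) ? 0 : c;
        // An unseen value yields log(0) = -Infinity; see the Laplace corrector.
        return Math.log((double) count / total);
    }
}
```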
-
getLaplaceCorrector
public final double getLaplaceCorrector()
Returns the Laplace corrector to be used. The Laplace corrector handles zero counts in the training data; without it, a value that was never observed in the training data results in a probability of zero. The default of 0.0 means no correction.
NOTE: The "threshold" value specified in the PMML model is always ignored in favor of the Laplace corrector specified on NaiveBayesPredictor.
- Returns: the Laplace corrector to be used.
-
setLaplaceCorrector
public final void setLaplaceCorrector(double laplaceCorrector)
Sets the Laplace corrector to be used. The Laplace corrector handles zero counts in the training data; without it, a value that was never observed in the training data results in a probability of zero. The default of 0.0 means no correction.
NOTE: The "threshold" value specified in the PMML model is always ignored in favor of the Laplace corrector specified on NaiveBayesPredictor.
- Parameters: laplaceCorrector - the Laplace corrector to be used.
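The page does not give the exact smoothing formula. A common form of Laplace (additive) smoothing, assumed here for illustration, adds the corrector α to every value count: P(v | c) = (count(v, c) + α) / (count(c) + α·k), where k is the number of distinct values of the attribute. With the default α = 0.0, an unseen value still gets probability zero. `LaplaceSketch` and its parameters are hypothetical.

```java
/** Hypothetical sketch of additive (Laplace) smoothing; not the DataRush implementation. */
final class LaplaceSketch {

    private LaplaceSketch() {}

    /**
     * Smoothed conditional probability of one categorical value given a class.
     *
     * @param valueCount     times the value was seen with this class in training
     * @param classCount     total training rows for this class
     * @param distinctValues number of distinct values the attribute can take
     * @param corrector      the Laplace corrector; 0.0 disables smoothing
     */
    static double smoothedProbability(long valueCount, long classCount, int distinctValues, double corrector) {
        // Adding corrector to each of the k value counts adds corrector * k to the total,
        // so the smoothed probabilities still sum to 1 across the attribute's values.
        return (valueCount + corrector) / (classCount + corrector * distinctValues);
    }
}
```

For example, with counts 7, 3 and 0 out of 10 rows and a corrector of 1.0, the unseen value gets 1/13 instead of 0, and the three probabilities (8/13, 4/13, 1/13) still sum to 1.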
-
isAppendProbabilities
public boolean isAppendProbabilities()
Returns whether to include probabilities in the prediction. This is true by default.
- Returns: whether to include probabilities in the prediction.
-
setAppendProbabilities
public void setAppendProbabilities(boolean appendProbabilities)
Sets whether to include probabilities in the prediction. This is true by default.
- Parameters: appendProbabilities - whether to include probabilities in the prediction.
-
computeMetadata
protected void computeMetadata(StreamingMetadataContext ctx)
Description copied from class: AbstractPredictor
Default implementation of computeMetadata:
- The output type is set to the input type plus predictedType.
- Input data ordering (if ordered) is preserved.
- Input data partitioning (if partitioned) is preserved.
- Overrides: computeMetadata in class AbstractPredictor
- Parameters: ctx - the context
-
predictedType
protected RecordTokenType predictedType(PMMLModelSpec spec)
Description copied from class: AbstractPredictor
Given the model spec, returns the predicted type. This should not include the input type (the input is automatically prepended to the type that is returned).
- Specified by: predictedType in class AbstractPredictor
- Parameters: spec - the model metadata
- Returns: the predicted type
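As a rough illustration of how computeMetadata and predictedType fit together, the output schema is the input fields followed by the predicted fields. The sketch models a schema as an ordered list of field names only (types omitted) and assumes, based on the getOutput description, that the predicted fields are the winner field followed by one probability field per target value when probabilities are appended. `OutputSchemaSketch` and its methods are hypothetical, not DataRush types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of output-schema composition; field names only, no types. */
final class OutputSchemaSketch {

    private OutputSchemaSketch() {}

    /** Predicted fields only, mirroring predictedType: the input fields are NOT included. */
    static List<String> predictedFields(List<String> targetValues, String winnerField,
                                        String probabilityPrefix, boolean appendProbabilities) {
        List<String> fields = new ArrayList<>();
        fields.add(winnerField);
        if (appendProbabilities) {
            for (String target : targetValues) {
                fields.add(probabilityPrefix + target);
            }
        }
        return fields;
    }

    /** Output schema: the input fields are prepended to the predicted fields. */
    static List<String> outputFields(List<String> inputFields, List<String> predicted) {
        List<String> out = new ArrayList<>(inputFields);
        out.addAll(predicted);
        return out;
    }
}
```

With input fields age and income and target values yes and no, the defaults yield age, income, winner, probability_yes, probability_no.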
-
execute
protected void execute(PMMLModel pmml, RecordValued input, ScalarSettable[] predictedFields)
Description copied from class: AbstractPredictor
Called to perform prediction. Subclasses are expected to loop over the input by calling AbstractPredictor.stepNext(). For each row of input, subclasses should first set the predicted values in the predictedFields array and then invoke AbstractPredictor.pushPrediction(). Subclasses should not invoke pushEndOfData, since that is handled automatically by the base class.
- Specified by: execute in class AbstractPredictor
- Parameters:
pmml - the input PMML model
input - the input data
predictedFields - an array of fields that reference the predicted field locations. The array positionally corresponds to the type returned by AbstractPredictor.predictedType(PMMLModelSpec).
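The execute contract described above is a template-method loop: advance the input, fill the predicted fields, push the row, and leave end-of-data handling to the base class. The sketch reproduces that shape with a tiny, self-contained stand-in base class. `PredictorSkeleton`, `CutoffPredictor`, the in-memory `double[]` rows and the `String[]` predicted fields are hypothetical simplifications, not the DataRush `AbstractPredictor`, `RecordValued` or `ScalarSettable` types.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified stand-in for a predictor base class; not DataRush's AbstractPredictor. */
abstract class PredictorSkeleton {
    private final Iterator<double[]> rows;
    private final String[] predictedFields = new String[1];
    private double[] current;
    final List<String> pushed = new ArrayList<>();
    boolean endOfDataPushed;

    PredictorSkeleton(List<double[]> input) {
        this.rows = input.iterator();
    }

    /** Advances to the next input row; returns false once the input is exhausted. */
    protected final boolean stepNext() {
        if (!rows.hasNext()) {
            return false;
        }
        current = rows.next();
        return true;
    }

    /** The row most recently reached by stepNext(). */
    protected final double[] currentRow() {
        return current;
    }

    /** Emits whatever the subclass stored in predictedFields for the current row. */
    protected final void pushPrediction() {
        pushed.add(predictedFields[0]);
    }

    /** Subclasses implement the per-row loop here. */
    protected abstract void execute(String[] predictedFields);

    /** The base class drives execute() and signals end of data itself afterwards. */
    final void run() {
        execute(predictedFields);
        endOfDataPushed = true;
    }
}

/** Example subclass with a hypothetical cutoff rule on the first column. */
final class CutoffPredictor extends PredictorSkeleton {
    CutoffPredictor(List<double[]> input) {
        super(input);
    }

    @Override
    protected void execute(String[] predictedFields) {
        while (stepNext()) {
            // First set the predicted value(s) for this row...
            predictedFields[0] = currentRow()[0] > 10.0 ? "high" : "low";
            // ...then push them. End of data is left to the base class.
            pushPrediction();
        }
    }
}
```

Keeping the end-of-data signal in the base class means a subclass cannot forget it or emit it twice; the subclass only owns the per-row logic.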