public class LinearRegressionLearner extends IterativeOperator
A dependent variable must be specified. This is a field in the input that is the target of the linear regression model. One or more independent variables are required from the input data.
This operator supports numeric as well as categorical data as input. The linear regression is performed using an Ordinary Least Squares (OLS) fit. Dummy Coding is used to handle categorical variables.
This approach requires that, for each categorical variable, one value from its domain be chosen as the reference for all other values in that domain when the model is computed. Specifying reference values through the operator's API is optional. If no reference value is specified for a categorical variable, one is chosen at random.
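The dummy coding described above can be illustrated with a small, self-contained sketch. It is not the operator's implementation: the class `DummyCodingSketch`, the method `encode`, and the sample domain are hypothetical names chosen for illustration. The sketch assumes a domain without duplicate values and produces one indicator column per non-reference value, so the reference value is represented by all zeros.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical helper, not part of the operator's API.
final class DummyCodingSketch {

    /**
     * Encodes one categorical value as indicator columns, one column per
     * non-reference value of the domain, in domain order.
     */
    public static double[] encode(List<String> domain, String reference, String value) {
        if (!domain.contains(reference) || !domain.contains(value)) {
            throw new IllegalArgumentException("reference and value must be in the domain");
        }
        double[] indicators = new double[domain.size() - 1];
        int column = 0;
        for (String candidate : domain) {
            if (candidate.equals(reference)) {
                continue; // the reference value gets no column of its own
            }
            indicators[column++] = candidate.equals(value) ? 1.0 : 0.0;
        }
        return indicators;
    }

    public static void main(String[] args) {
        List<String> domain = Arrays.asList("red", "green", "blue");
        // "red" is the reference value in this example.
        System.out.println(Arrays.toString(encode(domain, "red", "green"))); // [1.0, 0.0]
        System.out.println(Arrays.toString(encode(domain, "red", "red")));   // [0.0, 0.0]
    }
}
```

Because the reference value maps to all zeros, its contribution to the fitted model is absorbed into the intercept, and each remaining coefficient is read as an offset relative to the reference.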
The output is an estimate of coefficients for the model:
Y = a + (b1*x1 + ... + bn*xn) + (0*w1ref + c1,1*w1,1 + ... + c1,k1*w1,k1 + ... + 0*wmref + cm,1*wm,1 + ... + cm,km*wm,km)
where
The following assumptions are made about the nature of input data:
Modifier and Type | Field and Description |
---|---|
protected static int | MAX_DOMAIN_SIZE |
protected static int | MIN_DOMAIN_SIZE |
Constructor and Description |
---|
LinearRegressionLearner() Default constructor. |
LinearRegressionLearner(String dependentVariable, String... independentVariables) Constructor specifying the dependent variable and independent variables. |
Modifier and Type | Method and Description |
---|---|
protected void | computeMetadata(IterativeMetadataContext context) Implementations must adhere to the following contracts |
protected CompositionIterator | createIterator(MetadataContext context) Invoked at the start of execution. |
String | getDependentVariable() Get the field name of the dependent variable. |
String[] | getIndependentVariables() Get the field names of the independent variables. |
RecordPort | getInput() Get the input port of this operator. |
PMMLPort | getOutput() Get the output port of this operator. |
Map<String,String> | getReferenceValues() Get the reference values for the independent categorical variables as they were set using the corresponding setter method. |
Double | getSingularityThreshold() Get singularityThreshold value |
void | setDependentVariable(String dependentVariable) Set the field name of the dependent variable. |
void | setIndependentVariables(String... independentVariables) Set the field names of the independent variables. |
void | setReferenceValues(Map<String,String> referenceValues) Set reference values for the independent categorical variables. |
void | setSingularityThreshold(Double singularityThresholdValue) Set singularityThreshold value against which a matrix is considered singular or non singular. |
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
protected static final int MAX_DOMAIN_SIZE
protected static final int MIN_DOMAIN_SIZE
public LinearRegressionLearner()
Default constructor. Use setDependentVariable(String) and setIndependentVariables(String...) to set the dependent and independent variables.
public LinearRegressionLearner(String dependentVariable, String... independentVariables)
dependentVariable - name of the dependent variable field
independentVariables - names of the independent variable fields
public String getDependentVariable()
public void setDependentVariable(String dependentVariable)
dependentVariable - dependent variable field name
public String[] getIndependentVariables()
public void setIndependentVariables(String... independentVariables)
independentVariables - independent variable field names
public void setReferenceValues(Map<String,String> referenceValues)
referenceValues - mapping from independent categorical variable names to their reference values
public Map<String,String> getReferenceValues()
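As a hedged illustration of the `Map<String,String>` shape that setReferenceValues accepts, the sketch below builds such a map and checks it against known domains before it would be handed to the operator. The class `ReferenceValuesSketch`, the method `invalidReferences`, and the sample variables are hypothetical. Whether the operator itself performs a comparable check is not stated here, so this is shown as a caller-side precaution only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: hypothetical caller-side check, not part of the operator's API.
final class ReferenceValuesSketch {

    /** Returns the variables whose chosen reference value is not in that variable's domain. */
    public static Set<String> invalidReferences(Map<String, String> referenceValues,
                                                Map<String, List<String>> domains) {
        Set<String> invalid = new TreeSet<>();
        for (Map.Entry<String, String> entry : referenceValues.entrySet()) {
            List<String> domain = domains.get(entry.getKey());
            if (domain == null || !domain.contains(entry.getValue())) {
                invalid.add(entry.getKey());
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        Map<String, List<String>> domains = new LinkedHashMap<>();
        domains.put("color", Arrays.asList("red", "green", "blue"));
        domains.put("size", Arrays.asList("S", "M", "L"));

        // Keys are categorical variable names, values are the chosen reference values.
        Map<String, String> referenceValues = new LinkedHashMap<>();
        referenceValues.put("color", "red");
        referenceValues.put("size", "XL"); // deliberately not in the domain

        System.out.println(invalidReferences(referenceValues, domains)); // prints [size]
    }
}
```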
public void setSingularityThreshold(Double singularityThresholdValue)
singularityThresholdValue - Default bound to determine effective singularity in LU decomposition
public Double getSingularityThreshold()
singularityThreshold
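To make the role of a singularity threshold concrete, here is a minimal sketch under stated assumptions. It solves the OLS normal equations (X^T X) b = X^T y by Gaussian elimination with partial pivoting, which is the elimination underlying an LU decomposition, and treats the matrix as singular when a pivot's absolute value falls below the threshold. The class `SingularityThresholdSketch`, the method `solve`, and the threshold value 1e-11 are hypothetical. The operator's actual algorithm and default are not documented here.

```java
import java.util.Arrays;

// Illustrative only: hypothetical solver, not the operator's implementation.
final class SingularityThresholdSketch {

    /** Solves a*x = b, throwing if any pivot magnitude is below the threshold. */
    public static double[] solve(double[][] a, double[] b, double threshold) {
        int n = b.length;
        double[][] m = new double[n][];
        for (int i = 0; i < n; i++) {
            m[i] = a[i].clone(); // work on copies so the inputs stay untouched
        }
        double[] rhs = b.clone();

        for (int col = 0; col < n; col++) {
            // Partial pivoting: pick the row with the largest entry in this column.
            int pivot = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(m[row][col]) > Math.abs(m[pivot][col])) {
                    pivot = row;
                }
            }
            if (Math.abs(m[pivot][col]) < threshold) {
                throw new ArithmeticException("matrix is effectively singular");
            }
            double[] tmpRow = m[col]; m[col] = m[pivot]; m[pivot] = tmpRow;
            double tmp = rhs[col]; rhs[col] = rhs[pivot]; rhs[pivot] = tmp;

            // Eliminate the entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double factor = m[row][col] / m[col][col];
                rhs[row] -= factor * rhs[col];
                for (int k = col; k < n; k++) {
                    m[row][k] -= factor * m[col][k];
                }
            }
        }

        // Back substitution on the upper-triangular system.
        double[] x = new double[n];
        for (int row = n - 1; row >= 0; row--) {
            double sum = rhs[row];
            for (int k = row + 1; k < n; k++) {
                sum -= m[row][k] * x[k];
            }
            x[row] = sum / m[row][row];
        }
        return x;
    }

    public static void main(String[] args) {
        // Normal equations for the points (0,1), (1,3), (2,5), which lie on y = 1 + 2x.
        double[][] xtx = {{3, 3}, {3, 5}};
        double[] xty = {9, 13};
        System.out.println(Arrays.toString(solve(xtx, xty, 1e-11))); // [1.0, 2.0]
    }
}
```

Perfectly collinear independent variables produce a zero pivot, so a call such as `solve(new double[][] {{1, 2}, {2, 4}}, new double[] {1, 2}, 1e-11)` throws instead of returning meaningless coefficients. A larger threshold rejects nearly collinear inputs earlier.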
public RecordPort getInput()
public PMMLPort getOutput()
protected void computeMetadata(IterativeMetadataContext context)
IterativeOperator
IterativeMetadataContext.parallelize(ParallelismStrategy).
IterativeMetadataContext.setOutputParallelizable(com.pervasive.datarush.ports.LogicalPort, boolean)
IterativeMetadataContext.setIterationParallelizable(com.pervasive.datarush.ports.LogicalPort, boolean).
MetadataUtil#negotiateParallelismBasedOnSourceAssumingParallelizableRecords
RecordPort#setRequiredDataOrdering, otherwise iteration will proceed on an input dataset whose order is undefined.
RecordPort#setRequiredDataDistribution, otherwise iteration will proceed on an input dataset whose distribution is the unspecified partial distribution.
RecordPort#setType.
RecordPort#setOutputDataOrdering
RecordPort#setOutputDataDistribution
SimpleModelPorts have no associated metadata and therefore there is never any output metadata to declare. PMMLPorts, on the other hand, do have associated metadata. For all PMMLPorts, implementations must declare the following:
PMMLPort.setPMMLModelSpec.
IterativeMetadataContext.setOutputMetadataDynamic(com.pervasive.datarush.ports.LogicalPort, boolean). In the case that metadata is dynamic, calls to RecordPort#setType, RecordPort#setOutputDataOrdering, etc. are not allowed and thus the sections above entitled "Output record ports (static metadata)" and "Output model ports (static metadata)" must be skipped. Note that, if possible, dynamic metadata should be avoided (see IterativeMetadataContext.setOutputMetadataDynamic(com.pervasive.datarush.ports.LogicalPort, boolean)).
computeMetadata in class IterativeOperator
context - the context
protected CompositionIterator createIterator(MetadataContext context)
IterativeOperator
createIterator in class IterativeOperator
context - a context in which the iterative operator can find input port metadata, etc. This information was available in the previous call to IterativeOperator.computeMetadata(IterativeMetadataContext), but is available here as well so that the iterative operator need not cache any metadata in its instance variables.
Copyright © 2020 Actian Corporation. All rights reserved.