java.lang.Object
- com.pervasive.datarush.operators.AbstractLogicalOperator
  - com.pervasive.datarush.operators.IterativeOperator
    - com.pervasive.datarush.analytics.regression.LinearRegressionLearner

All Implemented Interfaces:

LogicalOperator
```
public class LinearRegressionLearner
extends IterativeOperator
```
Performs a multivariate linear regression on the given training data. The output is a PMML model describing the resultant regression model. The model consists of the y-intercept and the coefficients for each of the given independent variables.
A dependent variable must be specified. This is a field in the input that is the target of the linear regression model. One or more independent variables are required from the input data.
This operator supports numeric as well as categorical data as input. The linear regression is performed using an Ordinary Least Squares (OLS) fit. Dummy Coding is used to handle categorical variables.
This approach requires choosing, for each categorical variable, one value from its domain to serve as the reference for all other values in that domain when the model is computed. Specifying reference values through the operator's API is optional. If the user does not specify a reference value for a categorical variable, one is chosen at random.
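To make dummy coding concrete, here is a minimal sketch of how a single categorical value could be turned into indicator columns given a chosen reference value. The class `DummyCoding` and its `encode` method are hypothetical illustrations, not part of the DataRush API, and the operator's internal encoding may differ; the sketch assumes the domain lists each value once.

```java
import java.util.List;

/** Hypothetical helper (not part of DataRush) that dummy codes one categorical value. */
final class DummyCoding {
    private DummyCoding() {}

    /**
     * Returns a 0/1 indicator vector with one slot per non-reference value of the domain,
     * in domain order. The reference value is encoded as all zeros.
     */
    static double[] encode(List<String> domain, String reference, String value) {
        if (!domain.contains(reference) || !domain.contains(value)) {
            throw new IllegalArgumentException("reference and value must belong to the domain");
        }
        double[] indicators = new double[domain.size() - 1];
        int slot = 0;
        for (String candidate : domain) {
            if (candidate.equals(reference)) {
                continue; // the reference value has no column of its own
            }
            if (candidate.equals(value)) {
                indicators[slot] = 1.0;
            }
            slot++;
        }
        return indicators;
    }
}
```

With domain `["red", "green", "blue"]` and reference `"red"`, the value `"green"` encodes as `[1, 0]`, `"blue"` as `[0, 1]`, and `"red"` as `[0, 0]`. This is why the reference value never receives a coefficient of its own.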
The output contains estimates of the coefficients of the following model:
$$Y = a + \left(b_1 x_1 + \dots + b_n x_n\right) + \left(0 \cdot w_{1,\mathrm{ref}} + c_{1,1} w_{1,1} + \dots + c_{1,k_1} w_{1,k_1} + \dots + 0 \cdot w_{m,\mathrm{ref}} + c_{m,1} w_{m,1} + \dots + c_{m,k_m} w_{m,k_m}\right)$$
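where
- $a$ is the constant term (also known as the intercept)
- $n$ is the number of numeric input variables
- $b_i$, $0 < i \le n$, is the coefficient for the numeric input variable $x_i$
- $m$ is the number of categorical input variables
- $w_{i,\mathrm{ref}}$, $0 < i \le m$, is the reference value of the categorical variable $w_i$
- $k_i$, $0 < i \le m$, is the number of non-reference values of the categorical variable $w_i$ (its domain size minus one)
- $c_{i,j}$, $0 < i \le m$, $0 < j \le k_i$, is the coefficient for the $j$th non-reference value $w_{i,j}$ of the $i$th categorical input variable $w_i$; under dummy coding, each $w_{i,j}$ term acts as an indicator that is 1 when $w_i$ takes that value and 0 otherwise, so the reference value always contributes 0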
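Evaluating the model for one record follows directly from the equation: start from $a$, add $b_i x_i$ for each numeric input, and for each categorical input add the coefficient of the observed value, or 0 if the observed value is the reference. The sketch below assumes the coefficients have already been extracted into maps; `LinearModel` and its fields are hypothetical names chosen for illustration, not the operator's output format.

```java
import java.util.Map;

/** Hypothetical holder for fitted coefficients; names are illustrative, not DataRush API. */
final class LinearModel {
    private final double intercept;                                          // a
    private final Map<String, Double> numericCoefficients;                   // x_i -> b_i
    private final Map<String, Map<String, Double>> categoricalCoefficients;  // w_i -> (w_{i,j} -> c_{i,j})

    LinearModel(double intercept,
                Map<String, Double> numericCoefficients,
                Map<String, Map<String, Double>> categoricalCoefficients) {
        this.intercept = intercept;
        this.numericCoefficients = numericCoefficients;
        this.categoricalCoefficients = categoricalCoefficients;
    }

    /** Evaluates Y for one record; every numeric and categorical input must be present. */
    double predict(Map<String, Double> numericInputs, Map<String, String> categoricalInputs) {
        double y = intercept;
        for (Map.Entry<String, Double> e : numericCoefficients.entrySet()) {
            y += e.getValue() * numericInputs.get(e.getKey()); // b_i * x_i
        }
        for (Map.Entry<String, Map<String, Double>> e : categoricalCoefficients.entrySet()) {
            String observed = categoricalInputs.get(e.getKey());
            // Exactly one indicator w_{i,j} is 1; the reference value has no entry and adds 0.
            y += e.getValue().getOrDefault(observed, 0.0);
        }
        return y;
    }
}
```

For example, with $a = 1$, $b_x = 2$, and coefficients `green = 0.5`, `blue = -1.0` for a variable `color` whose reference is `red`, the record `x = 3, color = green` evaluates to $1 + 2 \cdot 3 + 0.5 = 7.5$, while `color = red` evaluates to $7$.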
The following assumptions are made about the input data:
- The independent variables must be linearly independent of each other.
- The dependent variable must be numeric (that is, continuous rather than discrete).
- All variables approximately follow a normal distribution.
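The first assumption matters because an OLS fit solves the normal equations $(X^\top X)\,\beta = X^\top y$, which have a unique solution only when the columns of $X$ are linearly independent. The following is a minimal textbook sketch of that computation using Gaussian elimination with partial pivoting; `OlsSketch` is a hypothetical name, the fixed pivot tolerance `1e-12` is an arbitrary assumption, and the operator's actual numerical method may differ. Inputs are assumed to be numeric and already dummy coded.

```java
/** Hypothetical OLS sketch via the normal equations; not the operator's implementation. */
final class OlsSketch {
    private OlsSketch() {}

    /** rows[r] holds the independent values of record r; returns [a, b_1, ..., b_p]. */
    static double[] fit(double[][] rows, double[] y) {
        int p = rows[0].length + 1; // +1 for the intercept column of ones
        double[][] xtx = new double[p][p];
        double[] xty = new double[p];
        for (int r = 0; r < rows.length; r++) {
            double[] x = new double[p];
            x[0] = 1.0;
            System.arraycopy(rows[r], 0, x, 1, p - 1);
            for (int i = 0; i < p; i++) {
                xty[i] += x[i] * y[r];
                for (int j = 0; j < p; j++) {
                    xtx[i][j] += x[i] * x[j]; // accumulate X^T X
                }
            }
        }
        return solve(xtx, xty);
    }

    /** Gaussian elimination with partial pivoting; modifies a and b in place. */
    static double[] solve(double[][] a, double[] b) {
        int p = b.length;
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new ArithmeticException("X^T X is singular: independent variables are collinear");
            }
            double[] rowTmp = a[col]; a[col] = a[pivot]; a[pivot] = rowTmp;
            double bTmp = b[col]; b[col] = b[pivot]; b[pivot] = bTmp;
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c < p; c++) a[r][c] -= f * a[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) { // back substitution
            double s = b[i];
            for (int j = i + 1; j < p; j++) s -= a[i][j] * beta[j];
            beta[i] = s / a[i][i];
        }
        return beta;
    }
}
```

Fitting records generated exactly by $y = 1 + 2x_1 + 3x_2$ recovers $[1, 2, 3]$, while records in which $x_2 = 2x_1$ make $X^\top X$ singular and the sketch throws instead of returning coefficients.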

Field Summary

Fields
Modifier and Type	Field	Description
`protected static int`	`MAX_DOMAIN_SIZE`
`protected static int`	`MIN_DOMAIN_SIZE`

Constructor Summary

Constructors
Constructor	Description
`LinearRegressionLearner()`	Default constructor.
`LinearRegressionLearner(String dependentVariable, String... independentVariables)`	Constructor specifying the dependent variable and independent variables.

Method Summary

Modifier and Type	Method	Description
`protected void`	`computeMetadata(IterativeMetadataContext context)`	Implementations must adhere to the following contracts
`protected CompositionIterator`	`createIterator(MetadataContext context)`	Invoked at the start of execution.
`String`	`getDependentVariable()`	Get the field name of the dependent variable.
`String[]`	`getIndependentVariables()`	Get the field names of the independent variables.
`RecordPort`	`getInput()`	Get the input port of this operator.
`PMMLPort`	`getOutput()`	Get the output port of this operator.
`Map<String,String>`	`getReferenceValues()`	Get the reference values for the independent categorical variables as they were set using the corresponding setter method.
`Double`	`getSingularityThreshold()`	Get the singularity threshold value.
`void`	`setDependentVariable(String dependentVariable)`	Set the field name of the dependent variable.
`void`	`setIndependentVariables(String... independentVariables)`	Set the field names of the independent variables.
`void`	`setReferenceValues(Map<String,String> referenceValues)`	Set reference values for the independent categorical variables.
`void`	`setSingularityThreshold(Double singularityThresholdValue)`	Set the singularity threshold value against which a matrix is considered singular or nonsingular.

Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - MAX_DOMAIN_SIZE
```
protected static final int MAX_DOMAIN_SIZE
```
    See Also:
    
    Constant Field Values
  - MIN_DOMAIN_SIZE
```
protected static final int MIN_DOMAIN_SIZE
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - LinearRegressionLearner
```
public LinearRegressionLearner()
```
    Default constructor. Use setDependentVariable(String) and setIndependentVariables(String...) to set the dependent and independent variables.
  - LinearRegressionLearner
```
public LinearRegressionLearner(String dependentVariable,
                               String... independentVariables)
```
    Constructor specifying the dependent variable and independent variables.
    
    Parameters:
    
    dependentVariable - name of the dependent variable field
    
    independentVariables - names of the independent variable fields
- Method Detail
  - getDependentVariable
```
public String getDependentVariable()
```
    Get the field name of the dependent variable.
    
    Returns:
    
    dependent variable field name
  - setDependentVariable
```
public void setDependentVariable(String dependentVariable)
```
    Set the field name of the dependent variable.
    
    Parameters:
    
    dependentVariable - dependent variable field name
  - getIndependentVariables
```
public String[] getIndependentVariables()
```
    Get the field names of the independent variables.
    
    Returns:
    
    independent variable field names
  - setIndependentVariables
```
public void setIndependentVariables(String... independentVariables)
```
    Set the field names of the independent variables.
    
    Parameters:
    
    independentVariables - independent variable field names
  - setReferenceValues
```
public void setReferenceValues(Map<String,String> referenceValues)
```
    Set reference values for the independent categorical variables. If no reference value is provided for a variable, a value chosen at random from its domain is used as the reference.
    
    Parameters:
    
    referenceValues - mapping from independent categorical variable names to their reference values
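    The fallback described above can be sketched as follows: use the mapped value when the variable has an entry, otherwise draw one value from the variable's domain. `ReferenceValues.resolve` is a hypothetical helper that only illustrates the documented behavior; it is not DataRush code, and it assumes the domain of each categorical variable is already known.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of the documented fallback; not DataRush's implementation. */
final class ReferenceValues {
    private ReferenceValues() {}

    /** Returns the user-supplied reference for the variable, or a random domain value. */
    static String resolve(String variable, List<String> domain,
                          Map<String, String> userReferences, Random random) {
        String specified = userReferences.get(variable);
        if (specified != null) {
            return specified; // user-supplied reference wins
        }
        // No entry for this variable: pick one domain value at random.
        return domain.get(random.nextInt(domain.size()));
    }
}
```

    With the operator itself, the mapping is passed through the documented setter, for example `learner.setReferenceValues(Map.of("color", "red"))`, where `learner` is a configured `LinearRegressionLearner`. Because unmapped variables get a random reference, fixing the reference explicitly makes the reported coefficients reproducible across runs.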
  - getReferenceValues
```
public Map<String,String> getReferenceValues()
```
    Get the reference values for the independent categorical variables as they were set using the corresponding setter method.
    
    Returns:
    
    mapping from independent categorical variable names to their reference values
  - setSingularityThreshold
```
public void setSingularityThreshold(Double singularityThresholdValue)
```
    Set the singularity threshold value against which a matrix is considered singular or nonsingular.
    
    Parameters:
    
    singularityThresholdValue - bound used to determine effective singularity in the LU decomposition
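    One common way such a bound is applied is sketched below: during an LU decomposition with partial pivoting, the matrix is declared effectively singular as soon as the largest available pivot falls below the threshold. `LuSingularity.isSingular` is a hypothetical helper; the operator's default threshold and exact decomposition are not documented here, so the thresholds in the usage note are arbitrary.

```java
/** Hypothetical LU-based singularity test; not the operator's implementation. */
final class LuSingularity {
    private LuSingularity() {}

    /** Returns true if some pivot of the LU decomposition is smaller than threshold. */
    static boolean isSingular(double[][] matrix, double threshold) {
        int n = matrix.length;
        double[][] lu = new double[n][];
        for (int i = 0; i < n; i++) {
            lu[i] = matrix[i].clone(); // work on a copy so the caller's matrix is unchanged
        }
        for (int k = 0; k < n; k++) {
            int pivot = k;
            for (int r = k + 1; r < n; r++) {
                if (Math.abs(lu[r][k]) > Math.abs(lu[pivot][k])) pivot = r;
            }
            if (Math.abs(lu[pivot][k]) < threshold) {
                return true; // effectively singular under this threshold
            }
            double[] tmp = lu[k]; lu[k] = lu[pivot]; lu[pivot] = tmp;
            for (int r = k + 1; r < n; r++) {
                lu[r][k] /= lu[k][k];                // multiplier stored in the L part
                for (int c = k + 1; c < n; c++) {
                    lu[r][c] -= lu[r][k] * lu[k][c]; // update the U part
                }
            }
        }
        return false;
    }
}
```

    For the nearly collinear matrix `{{1, 2}, {2, 4.000001}}`, the second pivot is about `5e-7`, so the matrix counts as singular with a threshold of `1e-3` but not with `1e-12`. A larger threshold therefore rejects near-collinear independent variables more aggressively.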
  - getSingularityThreshold
```
public Double getSingularityThreshold()
```
    Get the singularity threshold value.
    
    Returns:
    
    the threshold against which a matrix is considered singular or nonsingular
  - getInput
```
public RecordPort getInput()
```
    Get the input port of this operator.
    
    Returns:
    
    input port
  - getOutput
```
public PMMLPort getOutput()
```
    Get the output port of this operator. The port provides the linear regression PMML model generated for the input data.
    
    Returns:
    
    output PMML port
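    Downstream code that receives the model as PMML text can read the coefficients with the JDK's XML parser. The sketch below assumes the model is written as a standard PMML `RegressionModel`, whose `RegressionTable` carries an `intercept` attribute and whose `NumericPredictor` elements carry `name` and `coefficient`; this page does not confirm the exact element layout the operator emits. `PmmlRegressionReader` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for a PMML RegressionModel; not part of DataRush. */
final class PmmlRegressionReader {
    private PmmlRegressionReader() {}

    /** Returns the intercept attribute of the first RegressionTable element. */
    static double intercept(String pmmlXml) {
        NodeList tables = parse(pmmlXml).getElementsByTagName("RegressionTable");
        if (tables.getLength() == 0) {
            throw new IllegalArgumentException("no RegressionTable in PMML");
        }
        return Double.parseDouble(((Element) tables.item(0)).getAttribute("intercept"));
    }

    /** Returns the coefficient of the NumericPredictor for the given field, or NaN if absent. */
    static double numericCoefficient(String pmmlXml, String field) {
        NodeList predictors = parse(pmmlXml).getElementsByTagName("NumericPredictor");
        for (int i = 0; i < predictors.getLength(); i++) {
            Element predictor = (Element) predictors.item(i);
            if (field.equals(predictor.getAttribute("name"))) {
                return Double.parseDouble(predictor.getAttribute("coefficient"));
            }
        }
        return Double.NaN;
    }

    private static Document parse(String xml) {
        try {
            // The default factory is not namespace-aware, so plain tag names match.
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse PMML", e);
        }
    }
}
```

    Categorical terms would appear as `CategoricalPredictor` elements with `name`, `value`, and `coefficient` attributes in a standard `RegressionTable`; they can be read the same way.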
  - computeMetadata
```
protected void computeMetadata(IterativeMetadataContext context)
```
    Description copied from class: IterativeOperator
    Implementations must adhere to the following contracts
    General
    Regardless of input ports/output port types, all implementations must do the following:
    
    1. Validation. Validation of configuration should always be performed first.
    2. Declare operator parallelizability. Implementations must declare it by calling IterativeMetadataContext.parallelize(ParallelismStrategy).
    3. Declare output port parallelizability. Implementations must declare it by calling IterativeMetadataContext.setOutputParallelizable(com.pervasive.datarush.ports.LogicalPort, boolean).
    4. Declare input port parallelizability. Implementations must declare it by calling IterativeMetadataContext.setIterationParallelizable(com.pervasive.datarush.ports.LogicalPort, boolean).
    
    NOTE: There is a convenience method for performing steps 2-4 for the case where all record ports are parallelizable and parallelism is to be determined based on the source:
    
    MetadataUtil#negotiateParallelismBasedOnSourceAssumingParallelizableRecords
    
    Input record ports
    Implementations with input record ports must declare the following:
    
    Required data ordering:
    Implementations that have data ordering requirements must declare them by calling RecordPort#setRequiredDataOrdering, otherwise iteration will proceed on an input dataset whose order is undefined.
    Required data distribution (only applies to parallelizable input ports):
    Implementations that have data distribution requirements must declare them by calling RecordPort#setRequiredDataDistribution, otherwise iteration will proceed on an input dataset whose distribution is the unspecified partial distribution.
    Note that if the upstream operator's output distribution or ordering is compatible with the required one, the framework avoids a re-sort or redistribution, which is generally a large performance savings.
    Output record ports (static metadata)
    Implementations with output record ports must declare the following:
    
    Type: Implementations must declare their output type by calling RecordPort#setType.
    
    Implementations with output record ports may declare the following:
    
    Output data ordering: Implementations that can make guarantees as to their output ordering may do so by calling RecordPort#setOutputDataOrdering
    
    Output data distribution (only applies to parallelizable output ports): Implementations that can make guarantees as to their output distribution may do so by calling RecordPort#setOutputDataDistribution
    
    Note that both of these properties are optional; if they are unspecified, performance may suffer because the framework may unnecessarily re-sort or redistribute the data.
    Input model ports
    In general, iterative operators will tend not to have model input ports, but if so, there is nothing special to declare for input model ports. Models are implicitly duplicated to all partitions when going from non-parallel to parallel ports.
    Output model ports (static metadata)
    SimpleModelPorts have no associated metadata, so there is never any output metadata to declare for them. PMMLPorts, on the other hand, do have associated metadata. For all PMMLPorts, implementations must declare the following:
    
    pmmlModelSpec: Implementations must declare the PMML model spec by calling PMMLPort.setPMMLModelSpec.
    
    Output ports with dynamic metadata
    If an output port has dynamic metadata, implementations can declare it by calling IterativeMetadataContext.setOutputMetadataDynamic(com.pervasive.datarush.ports.LogicalPort, boolean). When metadata is dynamic, calls to RecordPort#setType, RecordPort#setOutputDataOrdering, etc., are not allowed, so the sections above entitled "Output record ports (static metadata)" and "Output model ports (static metadata)" must be skipped. Note that, if possible, dynamic metadata should be avoided (see IterativeMetadataContext.setOutputMetadataDynamic(com.pervasive.datarush.ports.LogicalPort, boolean)).
    Specified by:
    
    computeMetadata in class IterativeOperator
    
    Parameters:
    
    context - the context
  - createIterator
```
protected CompositionIterator createIterator(MetadataContext context)
```
    Description copied from class: IterativeOperator
    
    Invoked at the start of execution. Implementations are expected to return a handle that is then used for execution.
    
    Specified by:
    
    createIterator in class IterativeOperator
    
    Parameters:
    
    context - a context in which the iterative operator can find input port metadata, etc. This information was available in the previous call to IterativeOperator.computeMetadata(IterativeMetadataContext), but is available here as well so that the iterative operator need not cache any metadata in its instance variables.
    
    Returns:
    
    a handle that is used for iteration
