public final class KNNClassifier extends CompositeOperator
The field containing the classification value (also referred to as the target feature) must be specified. It is not necessary to specify the fields used to calculate nearness (also referred to as the selected features). If omitted, they are derived from the example and query schemas, using all eligible fields. The example and query records need not have the same schema. All that is required is that the selected features be present in both the example and query records and be of a type that can be widened to TokenTypeConstant.DOUBLE.
The implementation is designed to minimize memory usage. It is possible to specify an approximate limit on the amount of memory used by the operator. It is not necessary to have sufficient memory to hold both the example and query data, although performance is best when both fit in memory.
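
As a rough illustration of how selected features could be derived when none are specified, the sketch below intersects two schemas and keeps the fields whose types are numeric in both. Everything in it is hypothetical: the class `FeatureDerivationSketch`, the map-based schema representation, the type-name strings, and the choice to skip the target feature are assumptions made for the example, not the operator's actual logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only; not part of the library. */
final class FeatureDerivationSketch {

    /** Hypothetical type names treated here as widenable to a double. */
    static final Set<String> NUMERIC =
            new HashSet<>(Arrays.asList("int", "long", "float", "double"));

    /**
     * Returns the fields present in both schemas with a numeric type in each,
     * in the iteration order of the example schema.
     * Skipping the target feature is an assumption of this sketch.
     */
    static List<String> deriveSelectedFeatures(Map<String, String> exampleSchema,
                                               Map<String, String> querySchema,
                                               String targetFeature) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, String> field : exampleSchema.entrySet()) {
            String name = field.getKey();
            if (name.equals(targetFeature)) {
                continue; // the class label is not used to measure nearness here
            }
            String queryType = querySchema.get(name);
            if (queryType != null
                    && NUMERIC.contains(field.getValue())
                    && NUMERIC.contains(queryType)) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

Note that the two schemas may differ: a field only needs to be numeric on both sides, not of the identical type, and fields that appear on one side only are ignored.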
Modifier and Type | Field and Description |
---|---|
static long | TRAINING_BUFFER_SIZE_MAX: The largest allowable training buffer, in bytes, 16G. |
Constructor and Description |
---|
KNNClassifier(): Defines a classifier initially configured with default settings: a neighborhood set size of 1; the target feature is in the field "class"; selected features are derived from the fields in common between the query and training data; nearness is determined using Euclidean distance; record classification is by voting; a training buffer of 128M is used. |
KNNClassifier(int k, String targetFeature): Defines a classifier initially configured with the specified neighborhood set size and target feature field. |
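
To make the default configuration concrete (a neighborhood of size k, Euclidean distance, classification by voting), here is a minimal in-memory sketch of k-nearest-neighbor classification in plain Java. It is not the operator's implementation and does not use the library: the class `KnnSketch`, its `Example` type, and the tie-breaking rule (the first label to reach the highest vote count wins) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a tiny in-memory k-NN, unrelated to the operator's internals. */
final class KnnSketch {

    /** One labeled example: selected feature values plus its class (target feature). */
    static final class Example {
        final double[] features;
        final String label;

        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Euclidean distance between two feature vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Classifies a query record by majority vote among its k nearest examples. */
    static String classify(List<Example> examples, double[] query, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        // Sort a copy of the examples by distance to the query, nearest first.
        List<Example> sorted = new ArrayList<>(examples);
        sorted.sort(Comparator.comparingDouble((Example e) -> euclidean(e.features, query)));

        // Count votes among the k nearest; ties go to the label that got there first.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Example e : sorted.subList(0, Math.min(k, sorted.size()))) {
            int count = votes.merge(e.label, 1, Integer::sum);
            if (count > bestVotes) {
                bestVotes = count;
                best = e.label;
            }
        }
        return best;
    }
}
```

With k = 1 the query simply takes the class of its single nearest example, which matches the default neighborhood size listed above; larger values of k let the majority among several neighbors decide.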
Modifier and Type | Method and Description |
---|---|
protected void | compose(CompositionContext ctx): Compose the body of this operator. |
ClassificationScheme | getClassificationScheme(): Gets how the classification of a record in the query data is determined from the classifications of its nearest neighbors in the example data. |
int | getK(): Gets the size of the nearest neighbor set. |
NearnessMeasure | getNearnessMeasure(): Gets how the nearest neighbors of a record in the query data are determined. |
RecordPort | getOutput(): Gets the record port providing the output from the operation. |
RecordPort | getQuery(): Gets the record port providing the query data to the operation. |
List<String> | getSelectedFeatures(): Gets the fields which will be used when determining the nearest neighbors. |
String | getTargetFeature(): Gets the field in the example data which is used to provide classification data. |
RecordPort | getTraining(): Gets the record port providing the training data to the operation. |
long | getTrainingBuffer(): Gets the size of the memory buffer used to hold the example data. |
void | setClassificationScheme(ClassificationScheme scheme): Specifies how to determine the classification of a record in the query data from the classifications of its nearest neighbors in the example data. |
void | setK(int k): Sets the size of the nearest neighbor set. |
void | setNearnessMeasure(NearnessMeasure measure): Specifies how to determine the nearest neighbors of a record in the query data. |
void | setSelectedFeatures(List<String> features): Specifies the fields to use when determining the nearest neighbors. |
void | setSelectedFeatures(String... features): Specifies the fields to use when determining the nearest neighbors. |
void | setTargetFeature(String feature): Specifies the field in the example data which contains classification data. |
void | setTrainingBuffer(long size): Specifies the amount of memory, in bytes, to use for buffering the example data. |
void | setTrainingBuffer(String sizeSpecifier): Specifies the amount of memory to use for buffering the example data. |
Inherited methods: disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
public static final long TRAINING_BUFFER_SIZE_MAX
public KNNClassifier()
public KNNClassifier(int k, String targetFeature)
See KNNClassifier().
k - the size of the nearest neighbor set
targetFeature - the field in the example data which contains classification data
public RecordPort getTraining()
public RecordPort getQuery()
public RecordPort getOutput()
public void setK(int k)
k - the size of the nearest neighbor set
com.pervasive.datarush.graphs.physical.InvalidPropertyValueException - if the size is not positive
public int getK()
public void setSelectedFeatures(List<String> features)
These fields must be present in both the example and query records. They must also be numerically typed; in this context, that means any type which can be widened to TokenTypeConstant.DOUBLE.
features - the names of fields to use in computing nearness
public void setSelectedFeatures(String... features)
These fields must be present in both the example and query records. They must also be numerically typed; in this context, that means any type which can be widened to TokenTypeConstant.DOUBLE.
features - the names of fields to use in computing nearness
public List<String> getSelectedFeatures()
public void setTargetFeature(String feature)
feature - the name of the field to use as an example record's class
public String getTargetFeature()
public void setNearnessMeasure(NearnessMeasure measure)
measure - the measure used to determine "nearness"
public NearnessMeasure getNearnessMeasure()
public void setClassificationScheme(ClassificationScheme scheme)
scheme - the scheme used to classify a record
public ClassificationScheme getClassificationScheme()
public void setTrainingBuffer(long size)
If this buffer is too small, temporary files will be used to store intermediate neighborhood data.
size - the size of the buffer to use, in bytes
public void setTrainingBuffer(String sizeSpecifier)
sizeSpecifier - the size of the buffer to use
public long getTrainingBuffer()
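
This page writes buffer sizes as 128M and 16G but does not state the syntax accepted for sizeSpecifier. The sketch below shows one plausible way such a specifier could map to a byte count, assuming binary multiples (K = 2^10, M = 2^20, G = 2^30) and assuming that values above the 16G maximum are rejected. The class `BufferSizeSketch`, its `parseSize` method, and both assumptions are hypothetical and are not taken from the library.

```java
import java.util.Locale;

/** Illustrative only; the real specifier syntax is not documented on this page. */
final class BufferSizeSketch {

    /** 16G in bytes, assuming G means 2^30. */
    static final long MAX_BYTES = 16L * 1024 * 1024 * 1024;

    /** Converts a string such as "128M" to bytes under the assumptions above. */
    static long parseSize(String spec) {
        String s = spec.trim().toUpperCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size specifier");
        }
        long multiplier;
        switch (s.charAt(s.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            default:  multiplier = 1L;       break; // no suffix: plain bytes
        }
        String digits = (multiplier == 1L) ? s : s.substring(0, s.length() - 1).trim();
        long bytes = Math.multiplyExact(Long.parseLong(digits), multiplier);
        if (bytes <= 0 || bytes > MAX_BYTES) {
            throw new IllegalArgumentException("size out of range: " + spec);
        }
        return bytes;
    }
}
```

Under these assumptions the default buffer of 128M corresponds to 134,217,728 bytes and the 16G maximum to 17,179,869,184 bytes.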
protected void compose(CompositionContext ctx)
Description copied from CompositeOperator: Compose the body of this operator. Sub-operators are added to the provided context via OperatorComposable.add(O), and the necessary connections are created via OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
compose in class CompositeOperator
ctx - the context
Copyright © 2020 Actian Corporation. All rights reserved.