public final class KMeans extends IterativeOperator implements Serializable
The operator selects k random data points as the
initial cluster centers. Within each iteration, it assigns input data points
to current clusters based on the distance to the cluster centers and recomputes
the centers based on the new assignments. Computation stops when one of the
following conditions is true:
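- maxIterations is exceeded
- the cluster centers are no longer significantly different from those of the previous iteration (see #isSignificantlyDifferent())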
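
The iteration described above can be sketched in plain Java as follows. This is a minimal illustration, not the operator's implementation: the class KMeansSketch, its method names, the seed parameter, the use of squared Euclidean distance, and the choice to treat a center as changed when any single coordinate exceeds the relative tolerance are all assumptions made for the example.

```java
import java.util.Random;

/** Plain-Java illustration only; hypothetical class, not part of the library. */
final class KMeansSketch {

    /** Squared Euclidean distance (assumed; the operator's DistanceMeasure is configurable). */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center closest to the given point. */
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(point, centers[c]) < squaredDistance(point, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Relative-tolerance test; assumes one differing coordinate is enough. */
    static boolean differs(double[] first, double[] second) {
        for (int i = 0; i < first.length; i++) {
            if (Math.abs(first[i] - second[i]) > 1e-10 * (first[i] + second[i])) {
                return true;
            }
        }
        return false;
    }

    /** Requires a non-empty points array and 1 <= k <= points.length. */
    static double[][] cluster(double[][] points, int k, int maxIterations, long seed) {
        // Pick k distinct random data points as the initial centers
        // (partial Fisher-Yates shuffle of the point indices).
        Random random = new Random(seed);
        int[] order = new int[points.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + random.nextInt(order.length - c);
            int tmp = order[c];
            order[c] = order[j];
            order[j] = tmp;
            centers[c] = points[order[c]].clone();
        }

        int dims = points[0].length;
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Assignment step: add each point to the running sum of its nearest center.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centers);
                counts[c]++;
                for (int d = 0; d < dims; d++) {
                    sums[c][d] += p[d];
                }
            }
            // Update step: each center becomes the mean of its assigned points.
            // A center with no assigned points is left where it is.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue;
                }
                double[] updated = new double[dims];
                for (int d = 0; d < dims; d++) {
                    updated[d] = sums[c][d] / counts[c];
                }
                changed |= differs(centers[c], updated);
                centers[c] = updated;
            }
            if (!changed) {
                break; // no center moved significantly
            }
        }
        return centers;
    }
}
```

For example, calling KMeansSketch.cluster(points, 2, 50, 42L) on the four points (0, 0), (0, 2), (10, 10), (10, 12) settles on the centers (0, 1) and (10, 11), in some order, whichever two points are drawn as the initial centers.
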

Constructor and Description |
---|
KMeans() Create a new KMeans operator, initializing settings to their default values. |

Modifier and Type | Method and Description |
---|---|
protected void | computeMetadata(IterativeMetadataContext ctx) Implementations must adhere to the following contracts |
protected CompositionIterator | createIterator(MetadataContext ctx) Invoked at the start of execution. |
DistanceMeasure | getDistanceMeasure() Returns the distance measure used to measure the distance between two points when building the model. |
List<String> | getIncludedColumns() Returns the list of columns to include for k-Means. |
RecordPort | getInput() Returns the input port. |
int | getK() Returns the "k" value, where k is the number of centroids to compute. |
int | getMaxIterations() Returns the maximum number of iterations. |
PMMLPort | getModel() Returns the model port. |
static boolean | isSignificantlyDifferent(Cluster first, Cluster second) Returns true if the centers of both clusters are significantly different compared to each other. |
void | setDistanceMeasure(DistanceMeasure distanceMeasure) Sets the distance measure used to measure the distance between two points when building the model. |
void | setIncludedColumns(List<String> includedColumns) Sets the list of columns to include for k-Means. |
void | setK(int k) Sets the "k" value, where k is the number of centroids to compute. |
void | setMaxIterations(int max) Sets the maximum number of iterations. |

disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError

public KMeans()

public RecordPort getInput()

public PMMLPort getModel()

public int getMaxIterations()

public void setMaxIterations(int max)
max - The maximum number of iterations.

public int getK()

public void setK(int k)
k - the "k" value.

public List<String> getIncludedColumns()

public void setIncludedColumns(List<String> includedColumns)
includedColumns - The list of columns to include.

public DistanceMeasure getDistanceMeasure()

public void setDistanceMeasure(DistanceMeasure distanceMeasure)
distanceMeasure - the distance measure

protected void computeMetadata(IterativeMetadataContext ctx)
Description copied from class: IterativeOperator
Implementations must adhere to the following contracts:
- IterativeMetadataContext.parallelize(ParallelismStrategy).
- IterativeMetadataContext.setOutputParallelizable(com.pervasive.datarush.ports.LogicalPort, boolean)
- IterativeMetadataContext.setIterationParallelizable(com.pervasive.datarush.ports.LogicalPort, boolean).
- MetadataUtil#negotiateParallelismBasedOnSourceAssumingParallelizableRecords
- RecordPort#setRequiredDataOrdering, otherwise iteration will proceed on an input dataset whose order is undefined.
- RecordPort#setRequiredDataDistribution, otherwise iteration will proceed on an input dataset whose distribution is the unspecified partial distribution.

Output record ports (static metadata)
- RecordPort#setType.
- RecordPort#setOutputDataOrdering
- RecordPort#setOutputDataDistribution

Output model ports (static metadata)
SimpleModelPorts have no associated metadata and therefore there is
never any output metadata to declare. PMMLPorts, on the other hand,
do have associated metadata. For all PMMLPorts, implementations must declare
the following:
- PMMLPort.setPMMLModelSpec.

IterativeMetadataContext.setOutputMetadataDynamic(com.pervasive.datarush.ports.LogicalPort, boolean). In the case that metadata
is dynamic, calls to RecordPort#setType, RecordPort#setOutputDataOrdering,
etc. are not allowed and thus the sections above entitled "Output record ports (static metadata)"
and "Output model ports (static metadata)" must be skipped. Note that, if possible,
dynamic metadata should be avoided (see IterativeMetadataContext.setOutputMetadataDynamic(com.pervasive.datarush.ports.LogicalPort, boolean)).

computeMetadata in class IterativeOperator
ctx - the context

protected CompositionIterator createIterator(MetadataContext ctx)
Description copied from class: IterativeOperator
Invoked at the start of execution.
createIterator in class IterativeOperator
ctx - a context in which the iterative operator can find input port metadata, etc.
This information was available in the previous call to IterativeOperator.computeMetadata(IterativeMetadataContext),
but is available here as well so that the iterative operator need not cache any
metadata in its instance variables.

public static boolean isSignificantlyDifferent(Cluster first, Cluster second)
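Returns true if the centers of both clusters are significantly different compared to each other.
The comparison is made per index i of the centers' numeric arrays, using the following condition:
Math.abs(first.getNumArray()[i] - second.getNumArray()[i]) >
1e-10*(first.getNumArray()[i] + second.getNumArray()[i])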
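
To make the condition concrete, the following plain-Java sketch applies it to two centers held as double arrays. The class CenterComparison, the array-based signature, and the reading that a single index satisfying the condition is enough are assumptions for illustration; the Cluster type and getNumArray() are not reproduced here.

```java
/** Hypothetical stand-in for the documented comparison; not the library class. */
final class CenterComparison {

    /** Relative tolerance taken from the documented condition. */
    static final double RELATIVE_TOLERANCE = 1e-10;

    /**
     * Applies the documented condition index by index.
     * Assumption: the centers count as different as soon as one index satisfies it.
     * Both arrays are expected to have the same length.
     */
    static boolean isSignificantlyDifferent(double[] first, double[] second) {
        for (int i = 0; i < first.length; i++) {
            double difference = Math.abs(first[i] - second[i]);
            double threshold = RELATIVE_TOLERANCE * (first[i] + second[i]);
            if (difference > threshold) {
                return true;
            }
        }
        return false;
    }
}
```

With the centers {1.0, 2.0} and {1.0, 2.001}, the second index differs by about 0.001 against a threshold of about 4e-10, so the sketch reports a significant difference. Two identical centers with non-negative values do not. Taken literally, the right-hand side is not an absolute value, so when the two values at an index sum to a negative number the threshold is negative and even identical values satisfy the condition. The text here does not say how the operator treats that case.
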
first - the reference cluster
second - the cluster to be compared to the reference

Copyright © 2019 Actian Corporation. All rights reserved.