public abstract class DataDistribution extends Object
RecordMetadata that describes how the data is distributed. Distributions are usually partial, meaning data is partitioned in some way throughout the cluster or among different threads in the case of pseudo-distributed operation. In rare cases a full distribution is required, but that should only be used when data is "small" since it must be replicated throughout all nodes in the cluster.
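The difference between a partial and a full distribution can be pictured with a small, self-contained sketch. It does not use the DataRush API: the class `DistributionKindsSketch` and its methods are hypothetical, rows are plain strings, and hashing the whole row is just one assumed way of partitioning.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the DataRush library.
final class DistributionKindsSketch {

    // Partial: each row lands in exactly one partition, chosen here by hashing the row.
    static List<List<String>> partialByHash(List<String> rows, int partitions) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            out.add(new ArrayList<>());
        }
        for (String row : rows) {
            // floorMod keeps the index non-negative even for negative hash codes.
            int p = Math.floorMod(row.hashCode(), partitions);
            out.get(p).add(row);
        }
        return out;
    }

    // Full: every partition receives its own copy of every row.
    static List<List<String>> full(List<String> rows, int partitions) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            out.add(new ArrayList<>(rows));
        }
        return out;
    }
}
```

With `n` rows and `p` partitions, the partial variant stores `n` rows in total while the full variant stores `n * p`, which is why a full distribution is only reasonable for "small" data.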
Operators may declare a required distribution by calling RecordPort.setRequiredDataDistribution(com.pervasive.datarush.operators.MetadataCalculationContext, com.pervasive.datarush.ports.record.DataDistribution).
It is the responsibility of the framework to ensure that requirement is met.
Operators may also declare their output distribution by calling RecordPort.setOutputDataDistribution(com.pervasive.datarush.operators.MetadataCalculationContext, com.pervasive.datarush.ports.record.DataDistribution).
This lets the framework know how data is distributed on the operator's output. If there is
a mismatch between the required and provided distributions, the framework will automatically redistribute
as needed.
Source Distribution | Required Distribution | Framework Action |
---|---|---|
FullDataDistribution | FullDataDistribution | None required |
FullDataDistribution | Any Partial | Error (unsupported) |
Any Partial | FullDataDistribution | Redistribute |
Any Partial | UnspecifiedPartialDistribution | None required |
BalancedDistribution | BalancedDistribution | None required |
Any Partial (other than balanced) | BalancedDistribution | Redistribute evenly |
KeyDrivenDataDistribution hashed on keys [a,b] | KeyDrivenDataDistribution hashed on keys [a,b] | None required |
KeyDrivenDataDistribution hashed on keys [a,b] | KeyDrivenDataDistribution hashed on keys [a,b,c] | Redistribute on keys [a,b,c] |
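
The table above can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the framework's actual logic: `RedistributionSketch`, `Kind`, `Dist`, and `action` are hypothetical names, and the handling of combinations the table does not list (for example a balanced source with a key-driven requirement) is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the table; not the DataRush implementation.
final class RedistributionSketch {
    enum Kind { FULL, UNSPECIFIED_PARTIAL, BALANCED, KEY_DRIVEN }

    // A distribution kind plus its hash keys (empty unless KEY_DRIVEN).
    static final class Dist {
        final Kind kind;
        final List<String> keys;

        Dist(Kind kind, String... keys) {
            this.kind = kind;
            this.keys = Arrays.asList(keys);
        }
    }

    // Returns the "Framework Action" column for a source/required pair.
    static String action(Dist source, Dist required) {
        boolean sourceFull = source.kind == Kind.FULL;
        boolean requiredFull = required.kind == Kind.FULL;
        if (sourceFull && requiredFull) return "None required";
        if (sourceFull) return "Error (unsupported)";
        if (requiredFull) return "Redistribute";
        switch (required.kind) {
            case UNSPECIFIED_PARTIAL:
                return "None required";
            case BALANCED:
                return source.kind == Kind.BALANCED ? "None required" : "Redistribute evenly";
            case KEY_DRIVEN: {
                // Assumption: any source that is not hashed on exactly the same keys is redistributed.
                boolean sameKeys = source.kind == Kind.KEY_DRIVEN && source.keys.equals(required.keys);
                return sameKeys
                        ? "None required"
                        : "Redistribute on keys [" + String.join(",", required.keys) + "]";
            }
            default:
                throw new IllegalStateException("unreachable: FULL is handled above");
        }
    }
}
```

For instance, a source hashed on keys [a,b] with a requirement hashed on keys [a,b,c] yields "Redistribute on keys [a,b,c]", matching the last row of the table.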
See Also: StreamingOperator#computeMetadata, IterativeOperator#computeMetadata
Constructor and Description |
---|
DataDistribution() |
Modifier and Type | Method and Description |
---|---|
abstract AliasSet[] | getAliases() Returns the fields that are referenced by this distribution. |
abstract DataDistribution | remap(FieldRemapping mapping) Applies the given field remapping to this distribution, changing names as required. |
abstract String | toString() |
public abstract DataDistribution remap(FieldRemapping mapping)
Parameters: mapping - the field remapping.
public abstract AliasSet[] getAliases()
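To make the intent of remap and getAliases concrete, here is a minimal stand-in for a key-driven distribution. Everything in it is hypothetical: `KeyedDistributionSketch` is not a DataRush class, a plain `Map<String, String>` of old name to new name stands in for FieldRemapping, and a list of field names stands in for AliasSet[], whose structure is not described on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a key-driven distribution; not the DataRush class.
final class KeyedDistributionSketch {
    private final List<String> keys;

    KeyedDistributionSketch(List<String> keys) {
        this.keys = Collections.unmodifiableList(new ArrayList<>(keys));
    }

    // Analogue of getAliases(): the fields this distribution refers to.
    List<String> referencedFields() {
        return keys;
    }

    // Analogue of remap(): returns a new distribution with renamed keys.
    // Fields absent from the renames map keep their names (an assumption).
    KeyedDistributionSketch remap(Map<String, String> renames) {
        List<String> renamed = new ArrayList<>();
        for (String key : keys) {
            renamed.add(renames.getOrDefault(key, key));
        }
        return new KeyedDistributionSketch(renamed);
    }

    @Override
    public String toString() {
        return "hashed on keys [" + String.join(",", keys) + "]";
    }
}
```

The sketch returns a new object from `remap` rather than mutating the receiver, so a distribution declared on one port can be renamed for another without affecting the original.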
Copyright © 2024 Actian Corporation. All rights reserved.