java.lang.Object
com.pervasive.datarush.ports.record.DataDistribution
Direct Known Subclasses:
FullDataDistribution, PartialDataDistribution
DataDistribution is the component of
RecordMetadata that describes how
data is distributed. Distributions are usually partial,
meaning the data is partitioned in some way across the cluster or, in the case of
pseudo-distributed operation, among threads. In rare cases a full
distribution is required, but it should only be used when the data is small, since a full
distribution must be replicated to every node in the cluster.
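To make the cost difference concrete, here is a minimal, self-contained sketch in plain Java. It does not use DataRush: the class `DistributionCost` and its methods are hypothetical, and simple hash partitioning stands in for whatever scheme a partial distribution actually uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not the DataRush API): counts how many row copies each
// distribution style produces when data is spread across `nodes` partitions.
final class DistributionCost {

    private DistributionCost() {}

    // Partial (hash-partitioned): each row goes to exactly one partition.
    static List<List<String>> partitionByHash(List<String> rows, int nodes) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            parts.add(new ArrayList<>());
        }
        for (String row : rows) {
            // floorMod keeps the index non-negative even for negative hash codes
            parts.get(Math.floorMod(row.hashCode(), nodes)).add(row);
        }
        return parts;
    }

    // Full: every partition receives a complete copy of the data.
    static List<List<String>> replicate(List<String> rows, int nodes) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            parts.add(new ArrayList<>(rows));
        }
        return parts;
    }

    // Total number of stored rows across all partitions.
    static int totalRows(List<List<String>> parts) {
        int n = 0;
        for (List<String> p : parts) {
            n += p.size();
        }
        return n;
    }
}
```

With five rows spread over four nodes, the partitioned layout stores five rows in total while the replicated layout stores twenty, which is why a full distribution is only reasonable for small data.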
Operators may declare a required distribution by calling RecordPort.setRequiredDataDistribution(com.pervasive.datarush.operators.MetadataCalculationContext, com.pervasive.datarush.ports.record.DataDistribution).
The framework is responsible for ensuring that the requirement is met.
Operators may also declare their output distribution by calling RecordPort.setOutputDataDistribution(com.pervasive.datarush.operators.MetadataCalculationContext, com.pervasive.datarush.ports.record.DataDistribution).
This tells the framework how data is distributed on the operator's output. If there is
a mismatch between the required and the provided distribution, the framework automatically redistributes
the data as needed, as summarized below.
| Source Distribution | Required Distribution | Framework Action |
|---|---|---|
| FullDataDistribution | FullDataDistribution | None required |
| FullDataDistribution | Any partial | Error (unsupported) |
| Any partial | FullDataDistribution | Redistribute |
| Any partial | UnspecifiedPartialDistribution | None required |
| BalancedDistribution | BalancedDistribution | None required |
| Any partial (other than balanced) | BalancedDistribution | Redistribute evenly |
| KeyDrivenDataDistribution hashed on keys [a,b] | KeyDrivenDataDistribution hashed on keys [a,b] | None required |
| KeyDrivenDataDistribution hashed on keys [a,b] | KeyDrivenDataDistribution hashed on keys [a,b,c] | Redistribute on keys [a,b,c] |
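The table can be read as a decision function. The sketch below encodes it under stated assumptions: `Dist`, `Action`, and `Redistributor` are hypothetical types, not DataRush classes; a key-driven distribution is modelled only by its ordered key list; and combinations the table does not list (for example balanced to key-driven) are assumed to redistribute on the required keys.

```java
import java.util.List;

// Hypothetical model of the redistribution table; these types are illustrative
// only and are not the DataRush classes of similar names.
enum Action { NONE, REDISTRIBUTE, REDISTRIBUTE_EVENLY, REDISTRIBUTE_ON_KEYS, ERROR }

final class Dist {
    enum Kind { FULL, UNSPECIFIED_PARTIAL, BALANCED, KEY_DRIVEN }

    final Kind kind;
    final List<String> keys; // only meaningful for KEY_DRIVEN

    Dist(Kind kind, List<String> keys) {
        this.kind = kind;
        this.keys = keys;
    }
}

final class Redistributor {
    private Redistributor() {}

    static Action decide(Dist source, Dist required) {
        if (source.kind == Dist.Kind.FULL) {
            // Full -> Full needs nothing; Full -> any partial is unsupported
            return required.kind == Dist.Kind.FULL ? Action.NONE : Action.ERROR;
        }
        // From here on the source is some partial distribution
        switch (required.kind) {
            case FULL:
                return Action.REDISTRIBUTE;
            case UNSPECIFIED_PARTIAL:
                return Action.NONE;
            case BALANCED:
                return source.kind == Dist.Kind.BALANCED ? Action.NONE : Action.REDISTRIBUTE_EVENLY;
            case KEY_DRIVEN:
                // Assumption: only an identical, identically ordered key list counts as a match
                boolean sameKeys = source.kind == Dist.Kind.KEY_DRIVEN
                        && source.keys.equals(required.keys);
                return sameKeys ? Action.NONE : Action.REDISTRIBUTE_ON_KEYS;
            default:
                throw new IllegalArgumentException("unknown kind: " + required.kind);
        }
    }
}
```

Treating only an identical key list as a match mirrors the last two rows of the table: rows hashed on `[a,b]` are not, in general, on the partition a hash on `[a,b,c]` would assign them to.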
See Also:
StreamingOperator#computeMetadata, IterativeOperator#computeMetadata
Constructor Summary
| Constructor | Description |
|---|---|
| DataDistribution() | |

Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| abstract AliasSet[] | getAliases() | Returns the fields that are referenced by this distribution. |
| abstract DataDistribution | remap(FieldRemapping mapping) | Applies the given field remapping to this distribution, changing names as required. |
| abstract String | toString() | |
Constructor Details
DataDistribution
public DataDistribution()

Method Details
toString
public abstract String toString()
remap
public abstract DataDistribution remap(FieldRemapping mapping)
Applies the given field remapping to this distribution, changing names as required. Distributions that reference keys must have their key names remapped.
Parameters:
mapping - the field remapping.
Returns:
this distribution, remapped to the new names.
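The remapping contract for a distribution that references keys can be sketched as follows, assuming a plain `Map<String, String>` (old name to new name) in place of `FieldRemapping`, whose API is not described here. `KeyedDistribution` is hypothetical, and keeping names that are absent from the mapping unchanged is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a key-driven distribution; a Map<String, String>
// (old name -> new name) plays the role of FieldRemapping.
final class KeyedDistribution {
    private final List<String> keys;

    KeyedDistribution(List<String> keys) {
        this.keys = Collections.unmodifiableList(new ArrayList<>(keys));
    }

    List<String> keys() {
        return keys;
    }

    // Returns a new distribution with every key renamed; assumption: names
    // missing from the mapping keep their original name.
    KeyedDistribution remap(Map<String, String> mapping) {
        List<String> renamed = new ArrayList<>(keys.size());
        for (String key : keys) {
            renamed.add(mapping.getOrDefault(key, key));
        }
        return new KeyedDistribution(renamed);
    }

    @Override
    public String toString() {
        return "keyed" + keys;
    }
}
```

Returning a new instance rather than mutating the receiver keeps a distribution safe to share between ports; whether the real implementations do the same is not stated in this reference.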
getAliases
public abstract AliasSet[] getAliases()
Returns the fields that are referenced by this distribution. A distribution may validly reference no fields, in which case it should return an empty array. The framework uses this method to validate that the distribution is consistent with the type of the record.
Returns:
the fields that are referenced by this distribution
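The validation step described for `getAliases` might look roughly like this sketch. `DistributionValidator` is hypothetical, each referenced field is modelled as a plain name instead of an `AliasSet`, and the record type is reduced to a set of field names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical validation pass: referenced fields are plain names here,
// not AliasSet instances, and the record type is a set of field names.
final class DistributionValidator {
    private DistributionValidator() {}

    // Returns the referenced fields missing from the record type. An empty
    // result means the distribution is consistent; a distribution that
    // references no fields (empty array) is therefore always consistent.
    static List<String> missingFields(String[] referenced, Set<String> recordFields) {
        List<String> missing = new ArrayList<>();
        for (String field : referenced) {
            if (!recordFields.contains(field)) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```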