- java.lang.Object
-
- com.pervasive.datarush.ports.record.DataDistribution
-
- Direct Known Subclasses:
FullDataDistribution, PartialDataDistribution
public abstract class DataDistribution extends Object
DataDistribution is the component of RecordMetadata that describes how the data is distributed. Distributions are usually partial, meaning data is partitioned in some way throughout the cluster, or among different threads in the case of pseudo-distributed operation. In rare cases a full distribution is required, but that should only be used when data is "small", since it must be replicated to all nodes in the cluster.
Operators may declare a required distribution by calling RecordPort.setRequiredDataDistribution(com.pervasive.datarush.operators.MetadataCalculationContext, com.pervasive.datarush.ports.record.DataDistribution). It is the responsibility of the framework to ensure that the requirement is met. Operators may also declare their output distribution by calling RecordPort.setOutputDataDistribution(com.pervasive.datarush.operators.MetadataCalculationContext, com.pervasive.datarush.ports.record.DataDistribution). This lets the framework know how data is distributed on the operator's output. If there is a mismatch between required and provided metadata, the framework will automatically redistribute as needed.
The following table lists several combinations of source and target distributions along with the actions taken by the framework for each combination.
Source Distribution                             Required Distribution                             Framework Action
FullDataDistribution                            FullDataDistribution                              None required
FullDataDistribution                            Any Partial                                       Error (unsupported)
Any Partial                                     FullDataDistribution                              Redistribute
Any Partial                                     UnspecifiedPartialDistribution                    None required
BalancedDistribution                            BalancedDistribution                              None required
Any Partial (other than balanced)               BalancedDistribution                              Redistribute evenly
KeyDrivenDataDistribution hashed on keys [a,b]  KeyDrivenDataDistribution hashed on keys [a,b]    None required
KeyDrivenDataDistribution hashed on keys [a,b]  KeyDrivenDataDistribution hashed on keys [a,b,c]  Redistribute on keys [a,b,c]
- See Also:
StreamingOperator#computeMetadata, IterativeOperator#computeMetadata
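The source/required combinations described for this class can be read as a small decision function. The following is only a sketch of that reading: `DistributionActionSketch`, `Kind`, and `Dist` are hypothetical stand-ins invented for illustration and are not part of the DataRush API, and the framework's real logic may differ. Two cases are assumptions beyond what is listed: key lists are compared in order, and any partial source that is not already hashed on the required keys is redistributed on them.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT DataRush classes.
final class DistributionActionSketch {

    enum Kind { FULL, UNSPECIFIED_PARTIAL, BALANCED, KEY_DRIVEN }

    /** Minimal model of a distribution: a kind plus hash keys (empty unless KEY_DRIVEN). */
    static final class Dist {
        final Kind kind;
        final List<String> keys;

        private Dist(Kind kind, List<String> keys) {
            this.kind = kind;
            this.keys = keys;
        }

        static Dist of(Kind kind, String... keys) {
            return new Dist(kind, Arrays.asList(keys));
        }
    }

    /** Returns the framework action for a source/required pair, following the documented combinations. */
    static String action(Dist source, Dist required) {
        if (source.kind == Kind.FULL) {
            // A full source can only satisfy a full requirement.
            return required.kind == Kind.FULL ? "None required" : "Error (unsupported)";
        }
        // From here on the source is some partial distribution.
        switch (required.kind) {
            case FULL:
                return "Redistribute";
            case UNSPECIFIED_PARTIAL:
                return "None required";
            case BALANCED:
                return source.kind == Kind.BALANCED ? "None required" : "Redistribute evenly";
            case KEY_DRIVEN:
                // Assumption: keys must match exactly, in order, to avoid a redistribution.
                boolean sameKeys = source.kind == Kind.KEY_DRIVEN && source.keys.equals(required.keys);
                return sameKeys ? "None required" : "Redistribute on keys " + required.keys;
            default:
                throw new IllegalArgumentException("unknown kind: " + required.kind);
        }
    }
}
```

In a real operator the equivalent declarations would go through RecordPort.setRequiredDataDistribution and RecordPort.setOutputDataDistribution inside computeMetadata; the sketch only models the comparison that follows from those declarations.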
-
-
Constructor Summary
Constructor         Description
DataDistribution()
-
Method Summary
Modifier and Type          Method                         Description
abstract AliasSet[]        getAliases()                   Returns the fields that are referenced by this distribution.
abstract DataDistribution  remap(FieldRemapping mapping)  Applies the given field remapping to this distribution, changing names as required.
abstract String            toString()
Method Detail
-
remap
public abstract DataDistribution remap(FieldRemapping mapping)
Applies the given field remapping to this distribution, changing names as required. Distributions that reference keys must have their key names remapped.
- Parameters:
mapping - the field remapping.
- Returns:
this distribution, remapped to the new names.
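To illustrate what remapping key names means for a key-based distribution, here is a minimal sketch. `KeyedDistributionSketch` is a hypothetical stand-in, and a plain `Map<String, String>` of old name to new name is used in place of FieldRemapping, whose actual interface is not shown on this page. Returning a new object and leaving unmapped keys unchanged are both assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; NOT the DataRush KeyDrivenDataDistribution or FieldRemapping.
final class KeyedDistributionSketch {
    private final List<String> keys;

    KeyedDistributionSketch(List<String> keys) {
        // Defensive copy so the distribution cannot be changed through the caller's list.
        this.keys = Collections.unmodifiableList(new ArrayList<>(keys));
    }

    List<String> keys() {
        return keys;
    }

    /** Returns a new distribution whose key names follow the given old-name to new-name map. */
    KeyedDistributionSketch remap(Map<String, String> renames) {
        List<String> remapped = new ArrayList<>();
        for (String key : keys) {
            // Assumption: keys without an entry keep their current name.
            remapped.add(renames.getOrDefault(key, key));
        }
        return new KeyedDistributionSketch(remapped);
    }
}
```

For example, remapping a distribution keyed on a and b with a rename of a to x would yield one keyed on x and b, while the original object is left untouched.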
-
getAliases
public abstract AliasSet[] getAliases()
Returns the fields that are referenced by this distribution. It is valid for a distribution to reference no fields, in which case this method should return an empty array. The framework uses this method to validate that the distribution is consistent with the record type.
- Returns:
the fields that are referenced by this distribution
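The kind of consistency check this enables can be sketched as follows. This is an assumption about what such validation might look like, not the framework's implementation: `AliasValidationSketch` is hypothetical, and referenced fields are modeled as plain field names rather than AliasSet, whose structure is not described on this page.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; field references are modeled as plain names, not AliasSet.
final class AliasValidationSketch {

    /** Returns the referenced field names that do not exist in the record type. */
    static Set<String> missingFields(String[] referenced, Set<String> recordFields) {
        Set<String> missing = new LinkedHashSet<>(Arrays.asList(referenced));
        missing.removeAll(recordFields);
        return missing;
    }
}
```

An empty result means the distribution is consistent with the record type. A distribution that references no fields passes trivially, which matches the empty-array case described for this method.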