com.pervasive.datarush.ports.record.DataDistribution

Direct Known Subclasses: FullDataDistribution, PartialDataDistribution

public abstract class DataDistribution extends Object

DataDistribution is the component of RecordMetadata that describes how the data is distributed. Distributions are usually partial, meaning data is partitioned in some way throughout the cluster or among different threads in the case of pseudo-distributed operation. In rare cases a full distribution is required, but that should only be used when data is "small" since it must be replicated throughout all nodes in the cluster.

Operators may declare a required distribution by calling RecordPort.setRequiredDataDistribution(com.pervasive.datarush.operators.MetadataCalculationContext, com.pervasive.datarush.ports.record.DataDistribution). It is the responsibility of the framework to ensure that the requirement is met. Operators may also declare their output distribution by calling RecordPort.setOutputDataDistribution(com.pervasive.datarush.operators.MetadataCalculationContext, com.pervasive.datarush.ports.record.DataDistribution). This lets the framework know how data is distributed on the operator's output. If there is a mismatch between the required distribution and the one provided, the framework will automatically redistribute as needed.

The following table lists several combinations of source and required distributions along with the action taken by the framework for each combination.

Source Distribution	Required Distribution	Framework Action
`FullDataDistribution`	`FullDataDistribution`	None required
`FullDataDistribution`	`Any Partial`	Error (unsupported)
`Any Partial`	`FullDataDistribution`	Redistribute
`Any Partial`	`UnspecifiedPartialDistribution`	None required
`BalancedDistribution`	`BalancedDistribution`	None required
`Any Partial (other than balanced)`	`BalancedDistribution`	Redistribute evenly
`KeyDrivenDataDistribution hashed on keys [a,b]`	`KeyDrivenDataDistribution hashed on keys [a,b]`	None required
`KeyDrivenDataDistribution hashed on keys [a,b]`	`KeyDrivenDataDistribution hashed on keys [a,b,c]`	Redistribute on keys [a,b,c]
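
The rows of the table can be read as a small decision function. The sketch below is only an illustration of that reading: `RedistributionSketch`, `Kind`, `Dist` and `action` are hypothetical stand-ins written in plain Java, not DataRush classes, and the framework's actual logic may differ. The table does not cover a partial source that is not key-driven meeting a key-driven requirement; the sketch assumes it is redistributed on the required keys.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-ins only; none of these types belong to the DataRush API.
final class RedistributionSketch {

    /** Hypothetical distribution kinds, named after the rows of the table. */
    enum Kind { FULL, UNSPECIFIED_PARTIAL, BALANCED, KEY_DRIVEN }

    /** Hypothetical descriptor: a kind plus hash keys (empty unless KEY_DRIVEN). */
    static final class Dist {
        final Kind kind;
        final List<String> keys;

        Dist(Kind kind, String... keys) {
            this.kind = kind;
            this.keys = Arrays.asList(keys);
        }
    }

    /** Returns the action the table lists for a source/required pair. */
    static String action(Dist source, Dist required) {
        if (source.kind == Kind.FULL) {
            // A full source can only satisfy a full requirement.
            return required.kind == Kind.FULL ? "None required" : "Error (unsupported)";
        }
        // From here on the source is some partial distribution.
        switch (required.kind) {
            case FULL:
                return "Redistribute";
            case UNSPECIFIED_PARTIAL:
                return "None required";
            case BALANCED:
                return source.kind == Kind.BALANCED ? "None required" : "Redistribute evenly";
            default: // KEY_DRIVEN
                boolean sameKeys = source.kind == Kind.KEY_DRIVEN
                        && source.keys.equals(required.keys);
                // Assumption: any key mismatch, including a non-keyed source, forces a rehash.
                return sameKeys ? "None required"
                        : "Redistribute on keys [" + String.join(",", required.keys) + "]";
        }
    }

    public static void main(String[] args) {
        Dist ab = new Dist(Kind.KEY_DRIVEN, "a", "b");
        Dist abc = new Dist(Kind.KEY_DRIVEN, "a", "b", "c");
        System.out.println(action(ab, abc)); // Redistribute on keys [a,b,c]
        System.out.println(action(new Dist(Kind.FULL), new Dist(Kind.BALANCED))); // Error (unsupported)
    }
}
```

Note that a key-driven source satisfies a key-driven requirement only when the key lists are equal; hashing on [a,b] is not treated as satisfying a requirement on [a,b,c].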

See Also:

StreamingOperator#computeMetadata
IterativeOperator#computeMetadata

Constructor Summary

Constructor	Description
DataDistribution()

Method Summary

Modifier and Type	Method	Description
abstract AliasSet[]	getAliases()	Returns the fields that are referenced by this distribution.
abstract DataDistribution	remap(FieldRemapping mapping)	Applies the given field remapping to this distribution, changing names as required.
abstract String	toString()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Details
- DataDistribution
  
  public DataDistribution()
Method Details
- toString
  
  public abstract String toString()
  
  Overrides:
  
  toString in class Object
- remap
  
  public abstract DataDistribution remap(FieldRemapping mapping)
  
  Applies the given field remapping to this distribution, changing names as required. Distributions that reference keys must have their key names remapped.
  
  Parameters:
  
  mapping - the field remapping.
  
  Returns:
  
  this distribution, remapped to the new names.
- getAliases
  
  public abstract AliasSet[] getAliases()
  
  Returns the fields that are referenced by this distribution. A distribution may validly reference no fields, in which case this method should return an empty array. The framework uses this method to validate that the distribution is consistent with the type of the record.
  
  Returns:
  
  the fields that are referenced by this distribution
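
The contract of remap and getAliases described above can be illustrated with a simplified model. This is a sketch under stated assumptions, not the DataRush API: `DistributionContractSketch` and its nested classes are hypothetical, `FieldRemapping` is replaced by a `Map<String, String>` of old name to new name, and `AliasSet[]` is replaced by a plain `String[]` of field names. The `isConsistent` helper only mimics the kind of validation the framework is described as performing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative stand-ins only; the real API uses FieldRemapping and AliasSet[],
// which are replaced here by Map<String, String> and String[].
final class DistributionContractSketch {

    /** Hypothetical simplification of the abstract contract. */
    abstract static class Distribution {
        abstract Distribution remap(Map<String, String> renames);
        abstract String[] referencedFields();
    }

    /** References no fields: remap changes nothing and the field array is empty. */
    static final class Unkeyed extends Distribution {
        @Override Distribution remap(Map<String, String> renames) { return this; }
        @Override String[] referencedFields() { return new String[0]; }
        @Override public String toString() { return "unkeyed"; }
    }

    /** Hashed on named keys, so a rename must be applied to the key names. */
    static final class Keyed extends Distribution {
        private final List<String> keys;

        Keyed(List<String> keys) { this.keys = new ArrayList<>(keys); }

        @Override Distribution remap(Map<String, String> renames) {
            List<String> renamed = new ArrayList<>();
            for (String key : keys) {
                // Keys absent from the mapping keep their current name.
                renamed.add(renames.getOrDefault(key, key));
            }
            return new Keyed(renamed);
        }

        @Override String[] referencedFields() { return keys.toArray(new String[0]); }
        @Override public String toString() { return "hashed on " + keys; }
    }

    /** Checks that every referenced field exists in the given record field names. */
    static boolean isConsistent(Distribution d, Set<String> recordFields) {
        return recordFields.containsAll(Arrays.asList(d.referencedFields()));
    }

    public static void main(String[] args) {
        Distribution original = new Keyed(Arrays.asList("a", "b"));
        Distribution renamed = original.remap(Map.of("a", "id"));
        System.out.println(renamed); // hashed on [id, b]
        System.out.println(isConsistent(renamed, Set.of("id", "b", "c")));  // true
        System.out.println(isConsistent(original, Set.of("id", "b", "c"))); // false
    }
}
```

In this model remap returns a new object rather than mutating the receiver; the source only says that the result is "this distribution, remapped to the new names", so that choice is an assumption of the sketch.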
