java.lang.Object
com.pervasive.datarush.ports.record.DataDistribution
com.pervasive.datarush.ports.record.PartialDataDistribution
com.pervasive.datarush.ports.record.PartialStaticDataDistribution
com.pervasive.datarush.ports.record.KeyDrivenDataDistribution
A DataDistribution based on a set of keys from the input data. A
KeyDrivenDataDistribution selects a set of key fields and applies the specified
base partitioning function to them. The function must be a function of input.
This guarantees that rows with equivalent keys are located in the same partition.
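The guarantee follows from computing the partition from the key fields alone. The sketch below illustrates that idea without using the library: the class `KeyHashSketch`, the method `partitionOf`, and the map-based row are hypothetical stand-ins rather than DataRush types, and the hash shown is an assumption, not necessarily the one the framework applies.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative only: none of these names belong to the DataRush API.
class KeyHashSketch {

    // Projects the row onto the key fields (in key order) and hashes the projection.
    // Non-key fields never influence the result.
    static int partitionOf(Map<String, Object> row, List<String> keys, int partitionCount) {
        Object[] keyValues = new Object[keys.size()];
        for (int i = 0; i < keys.size(); i++) {
            keyValues[i] = row.get(keys.get(i));
        }
        // floorMod keeps the result in [0, partitionCount) even for negative hash codes.
        return Math.floorMod(Objects.hash(keyValues), partitionCount);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("customerId");
        Map<String, Object> a = Map.of("customerId", 42, "amount", 10.0);
        Map<String, Object> b = Map.of("customerId", 42, "amount", 99.5);
        // Same key value, different non-key values: both rows share a partition.
        System.out.println(partitionOf(a, keys, 4) == partitionOf(b, keys, 4)); // prints true
    }
}
```

Because only the projected key values reach the hash, two rows that agree on every key field cannot be assigned to different partitions, whatever their other fields contain.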
Method Summary
AliasSet[] getAliases()
    Returns the fields that are referenced by this distribution.
getBasePartitionFunction()
    Returns the base partitioning function.
String[] getKeys()
    Returns the keys by which we are partitioned.
protected PartitioningFunction getPartitioningFunction()
    Subclasses must override this method to provide the partitioning function to be used.
static KeyDrivenDataDistribution hashed
    Returns a distribution that distributes based on a hashcode of the named key fields.
static KeyDrivenDataDistribution hashed
    Returns a distribution that distributes based on a hashcode of the named key fields.
boolean isGroupedBy(String[] keys)
    Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.
boolean isGroupingCompatible(String[] keys)
    Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.
boolean isLocal()
static KeyDrivenDataDistribution keyed(PartitioningFunction basePartitionFunction, String... keys)
    Returns a distribution that distributes based on the given function of the named key fields.
static KeyDrivenDataDistribution keyed(PartitioningFunction basePartitionFunction, List<String> keys)
    Returns a distribution that distributes based on the given function of the named key fields.
static KeyDrivenDataDistribution localKeyed(PartitioningFunction basePartitionFunction, List<String> keys)
    For advanced use only; this redistributes per-JVM but not across the cluster.
static KeyDrivenDataDistribution ranges(List<RecordToken> boundaries)
    Returns a distribution that distributes based on specified range boundaries.
remap(FieldRemapping mapping)
    Applies the given field remapping to this distribution, changing names as required.
protected boolean requiresRepartitionFrom
    Subclasses must override this method to declare whether a repartition is required given the source distribution.
protected boolean supportsLocalRepartition()
    Subclasses may override to indicate that they support a local repartition.
toString()
Methods inherited from class com.pervasive.datarush.ports.record.PartialDataDistribution
    getGatherScheme
-
Method Details
-
hashed
Returns a distribution that distributes based on a hashcode of the named key fields.
Parameters:
keys - the names of the key fields
Returns:
a hash distribution
-
hashed
Returns a distribution that distributes based on a hashcode of the named key fields.
Parameters:
keys - the names of the key fields
Returns:
a hash distribution
-
ranges
public static KeyDrivenDataDistribution ranges(List<RecordToken> boundaries)
Returns a distribution that distributes based on the specified range boundaries.
Parameters:
boundaries - the range boundaries
Returns:
a range distribution
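How boundaries map to partitions is not spelled out here. One common reading, offered as an assumption only, is that n sorted, distinct boundaries split the key space into n + 1 ranges. The sketch below uses plain comparable values in place of RecordToken; `RangeSketch` and `partitionOf` are hypothetical names, and placing a key equal to a boundary in the upper range is an arbitrary choice for illustration.

```java
import java.util.Collections;
import java.util.List;

// Illustrative only: not the DataRush implementation of range distribution.
class RangeSketch {

    // Returns the index of the range containing the key, in [0, sortedBoundaries.size()].
    // The boundaries must be sorted ascending and contain no duplicates.
    static <T extends Comparable<? super T>> int partitionOf(T key, List<T> sortedBoundaries) {
        int pos = Collections.binarySearch(sortedBoundaries, key);
        // A non-negative result means the key equals boundary[pos]: use the range above it.
        // Otherwise binarySearch returns -(insertionPoint) - 1, so recover the insertion point.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        List<Integer> boundaries = List.of(10, 20);
        for (int key : new int[] {5, 10, 15, 25}) {
            System.out.println(key + " -> range " + partitionOf(key, boundaries));
        }
    }
}
```

Unlike a hash distribution, this keeps neighbouring key values together, which is why range distributions are typically paired with sorted processing.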
-
keyed
public static KeyDrivenDataDistribution keyed(PartitioningFunction basePartitionFunction, String... keys)
Returns a distribution that distributes based on the given function of the named key fields.
Parameters:
basePartitionFunction - the partition function
keys - the names of the key fields
Returns:
a key driven distribution
Throws:
IllegalArgumentException - if the provided function is not a function of input
-
keyed
public static KeyDrivenDataDistribution keyed(PartitioningFunction basePartitionFunction, List<String> keys) throws IllegalArgumentException
Returns a distribution that distributes based on the given function of the named key fields.
Parameters:
basePartitionFunction - the partition function
keys - the names of the key fields
Returns:
a key driven distribution
Throws:
IllegalArgumentException - if the provided function is not a function of input
-
localKeyed
public static KeyDrivenDataDistribution localKeyed(PartitioningFunction basePartitionFunction, List<String> keys) throws IllegalArgumentException
For advanced use only; this redistributes per-JVM but not across the cluster. This is not a general-purpose partitioning strategy but can be used in certain cases where there is a performance benefit to locally repartitioning data. Returns a distribution that locally distributes based on the given function of the named key fields.
Parameters:
basePartitionFunction - the partition function
keys - the names of the key fields
Returns:
a key driven distribution
Throws:
IllegalArgumentException - if the provided function is not a function of input
-
getBasePartitionFunction
Returns the base partitioning function. This function is applied to a record consisting of the selected key fields.
Returns:
the base partitioning function
-
isLocal
public boolean isLocal()
isGroupedBy
public boolean isGroupedBy(String[] keys)
Returns true if this distribution guarantees that groupings on the specified list of keys will not be split across partitions.
Parameters:
keys - the group keys
Returns:
whether this distribution preserves groupings across partitions
-
isGroupingCompatible
public boolean isGroupingCompatible(String[] keys)
Returns true if this distribution guarantees that groupings on the specified list of keys will not be split across partitions.
Parameters:
keys - the group keys
Returns:
whether this distribution preserves groupings across partitions
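The text does not say how the grouping check is decided, nor how isGroupedBy and isGroupingCompatible differ. As a sketch of the general reasoning only, a distribution keyed on fields K cannot split a group defined on fields G when every field in K also appears in G, since rows that agree on all of G then agree on all of K. `GroupingSketch` and `preservesGrouping` are hypothetical names, and the subset rule is an assumption about the kind of test involved, not the library's documented behaviour.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a subset test, assumed rather than taken from the DataRush source.
class GroupingSketch {

    // True when every partition key is also a grouping key. In that case rows in the
    // same group share all partition-key values and therefore share a partition.
    static boolean preservesGrouping(String[] partitionKeys, String[] groupKeys) {
        Set<String> group = new HashSet<>(Arrays.asList(groupKeys));
        return group.containsAll(Arrays.asList(partitionKeys));
    }

    public static void main(String[] args) {
        String[] partitionedBy = {"country"};
        // Grouping on a superset of the partition keys is safe.
        System.out.println(preservesGrouping(partitionedBy, new String[] {"country", "city"}));
        // Grouping on an unrelated key may be split across partitions.
        System.out.println(preservesGrouping(partitionedBy, new String[] {"city"}));
    }
}
```

Note the direction of the test: partitioning on more fields than the grouping uses gives no guarantee, because two rows of the same group may differ on the extra partition key.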
-
requiresRepartitionFrom
Description copied from class: PartialStaticDataDistribution
Subclasses must override this method to declare whether a repartition is required given the source distribution. Implementations may err on the side of caution by always returning true, but this may have an impact on performance.
Specified by:
requiresRepartitionFrom in class PartialStaticDataDistribution
Parameters:
source - the source distribution
Returns:
true if a repartition is required, false if this data distribution matches what was specified.
-
toString
Specified by:
toString in class DataDistribution
-
getKeys
public String[] getKeys()
Returns the keys by which we are partitioned.
Returns:
the keys by which we are partitioned.
-
remap
Applies the given field remapping to this distribution, changing names as required. If any hash keys refer to columns that are dropped as part of the rename, the result is an UnspecifiedPartialDistribution.
Specified by:
remap in class DataDistribution
Parameters:
mapping - the field remapping.
Returns:
this distribution, remapped to the new names.
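The remapping rule can be pictured as renaming each key and giving up if any key no longer exists. The sketch below models only that rule: `RemapSketch`, `remapKeys`, and the plain old-name-to-new-name map are hypothetical stand-ins for FieldRemapping, and an empty Optional stands in for the unspecified distribution the description mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative only: FieldRemapping is modeled here as a simple name-to-name map.
class RemapSketch {

    // oldToNew holds an entry for every surviving field; a missing entry means the
    // field was dropped. Returns the renamed keys, or empty if any key was dropped.
    static Optional<List<String>> remapKeys(List<String> keys, Map<String, String> oldToNew) {
        List<String> renamed = new ArrayList<>();
        for (String key : keys) {
            String newName = oldToNew.get(key);
            if (newName == null) {
                // A key field is gone, so the keyed distribution can no longer be described.
                return Optional.empty();
            }
            renamed.add(newName);
        }
        return Optional.of(renamed);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("id", "region");
        System.out.println(remapKeys(keys, Map.of("id", "customerId", "region", "area")));
        System.out.println(remapKeys(keys, Map.of("id", "customerId")));
    }
}
```

The all-or-nothing result reflects that a distribution keyed on a partly missing set of fields no longer guarantees anything about where rows live.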
-
getAliases
Description copied from class: DataDistribution
Returns the fields that are referenced by this distribution. Note that it is valid for a distribution to reference no fields, in which case it should return an empty array. This method is used by the framework to validate that the distribution is consistent with the type of the record.
Specified by:
getAliases in class DataDistribution
Returns:
the fields that are referenced by this distribution
-
getPartitioningFunction
Description copied from class: PartialStaticDataDistribution
Subclasses must override this method to provide the partitioning function to be used.
Specified by:
getPartitioningFunction in class PartialStaticDataDistribution
Returns:
a partition function
-
supportsLocalRepartition
protected boolean supportsLocalRepartition()
Description copied from class: PartialStaticDataDistribution
Subclasses may override to indicate that they support a local repartition.
Overrides:
supportsLocalRepartition in class PartialStaticDataDistribution