- java.lang.Object
-
- com.pervasive.datarush.ports.record.DataDistribution
-
- com.pervasive.datarush.ports.record.PartialDataDistribution
-
- com.pervasive.datarush.ports.record.PartialStaticDataDistribution
-
- com.pervasive.datarush.ports.record.KeyDrivenDataDistribution
-
public final class KeyDrivenDataDistribution extends PartialStaticDataDistribution
DataDistribution based on a set of keys from the input data. A KeyDrivenDataDistribution selects a set of keys and applies the specified base partitioning function to that set of keys. The function must be a function of input. This guarantees that rows with equivalent keys are located in the same partition.
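The guarantee can be illustrated with a minimal, self-contained sketch. It does not use the DataRush classes: `KeyHashSketch` and `partitionOf` are hypothetical names, rows are modelled as plain maps, and the hash of the projected key values stands in for a base partitioning function that depends only on its input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the DataRush implementation.
final class KeyHashSketch {

    // Projects the row onto the key fields, then maps the hash of that
    // projection to a partition index. Only the key values influence the result.
    static int partitionOf(Map<String, Object> row, List<String> keys, int partitionCount) {
        List<Object> projected = new ArrayList<>();
        for (String key : keys) {
            projected.add(row.get(key));
        }
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(projected.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("customerId");
        Map<String, Object> first = Map.of("customerId", 42, "amount", 10.0);
        Map<String, Object> second = Map.of("customerId", 42, "amount", 99.5);
        // Same key value, different non-key fields: same partition.
        System.out.println(partitionOf(first, keys, 4) == partitionOf(second, keys, 4)); // prints "true"
    }
}
```

Because the index depends only on the projected key values, any two rows that agree on the keys are assigned to the same partition, regardless of their other fields.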
-
-
Method Summary
Modifier and Type | Method | Description
AliasSet[] | getAliases() | Returns the fields that are referenced by this distribution.
PartitioningFunction | getBasePartitionFunction() | Returns the base partitioning function.
String[] | getKeys() | Returns the keys by which we are partitioned.
protected PartitioningFunction | getPartitioningFunction() | Subclasses must override this method to provide the partitioning function to be used.
static KeyDrivenDataDistribution | hashed(String... keys) | Returns a distribution that distributes based on a hashcode of the named key fields.
static KeyDrivenDataDistribution | hashed(List<String> keys) | Returns a distribution that distributes based on a hashcode of the named key fields.
boolean | isGroupedBy(String[] keys) | Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.
boolean | isGroupingCompatible(String[] keys) | Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.
boolean | isLocal() |
static KeyDrivenDataDistribution | keyed(PartitioningFunction basePartitionFunction, String... keys) | Returns a distribution that distributes based on the given function of the named key fields.
static KeyDrivenDataDistribution | keyed(PartitioningFunction basePartitionFunction, List<String> keys) | Returns a distribution that distributes based on the given function of the named key fields.
static KeyDrivenDataDistribution | localKeyed(PartitioningFunction basePartitionFunction, List<String> keys) | For advanced use only; this redistributes per-JVM but not across the cluster.
static KeyDrivenDataDistribution | ranges(List<RecordToken> boundaries) | Returns a distribution that distributes based on specified range boundaries.
PartialDataDistribution | remap(FieldRemapping mapping) | Applies the given field remapping to this distribution, changing names as required.
protected boolean | requiresRepartitionFrom(PartialDataDistribution source) | Subclasses must override this method to declare whether a repartition is required given the source distribution.
protected boolean | supportsLocalRepartition() | Subclasses may override to indicate that they support a local repartition.
String | toString() |
Methods inherited from class com.pervasive.datarush.ports.record.PartialDataDistribution
getGatherScheme
-
-
-
-
Method Detail
-
hashed
public static KeyDrivenDataDistribution hashed(String... keys)
Returns a distribution that distributes based on a hashcode of the named key fields.
- Parameters:
keys - the names of the key fields
- Returns:
a hash distribution
-
hashed
public static KeyDrivenDataDistribution hashed(List<String> keys)
Returns a distribution that distributes based on a hashcode of the named key fields.
- Parameters:
keys - the names of the key fields
- Returns:
a hash distribution
-
ranges
public static KeyDrivenDataDistribution ranges(List<RecordToken> boundaries)
Returns a distribution that distributes based on specified range boundaries.
- Parameters:
boundaries - the range boundaries
- Returns:
a range distribution
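As a rough illustration of range-based distribution, the sketch below maps a value to a partition by locating it among sorted boundaries. It is not the DataRush implementation: `RangePartitionSketch` is a hypothetical name, plain integers stand in for `RecordToken` boundaries, and the treatment of a value exactly equal to a boundary (it goes to the upper partition) is an assumption made only for this example.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration; not the DataRush implementation.
final class RangePartitionSketch {

    // boundaries must be sorted ascending and distinct; n boundaries define n + 1 partitions.
    // Assumption: a value equal to a boundary belongs to the partition above that boundary.
    static int partitionOf(int value, List<Integer> boundaries) {
        int pos = Collections.binarySearch(boundaries, value);
        // Found at index i: partition i + 1. Not found: the insertion point is the partition.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        List<Integer> boundaries = List.of(10, 20);
        System.out.println(partitionOf(5, boundaries));  // prints "0"
        System.out.println(partitionOf(10, boundaries)); // prints "1"
        System.out.println(partitionOf(15, boundaries)); // prints "1"
        System.out.println(partitionOf(25, boundaries)); // prints "2"
    }
}
```

Since the partition index is a function of the key value alone, equal keys always fall in the same range, which is the same guarantee the hashed distributions provide.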
-
keyed
public static KeyDrivenDataDistribution keyed(PartitioningFunction basePartitionFunction, String... keys)
Returns a distribution that distributes based on the given function of the named key fields.
- Parameters:
basePartitionFunction - the partition function
keys - the names of the key fields
- Returns:
a key driven distribution
- Throws:
IllegalArgumentException - if the provided function is not a function of input
-
keyed
public static KeyDrivenDataDistribution keyed(PartitioningFunction basePartitionFunction, List<String> keys) throws IllegalArgumentException
Returns a distribution that distributes based on the given function of the named key fields.
- Parameters:
basePartitionFunction - the partition function
keys - the names of the key fields
- Returns:
a key driven distribution
- Throws:
IllegalArgumentException - if the provided function is not a function of input
-
localKeyed
public static KeyDrivenDataDistribution localKeyed(PartitioningFunction basePartitionFunction, List<String> keys) throws IllegalArgumentException
For advanced use only; this redistributes per-JVM but not across the cluster. This is not a general-purpose partitioning strategy but can be used in certain cases where there is a performance benefit to locally repartitioning data. Returns a distribution that locally distributes based on the given function of the named key fields.
- Parameters:
basePartitionFunction - the partition function
keys - the names of the key fields
- Returns:
a key driven distribution
- Throws:
IllegalArgumentException - if the provided function is not a function of input
-
getBasePartitionFunction
public PartitioningFunction getBasePartitionFunction()
Returns the base partitioning function. This function is applied to a record consisting of the selected key fields.
- Returns:
the base partitioning function
-
isLocal
public boolean isLocal()
-
isGroupedBy
public boolean isGroupedBy(String[] keys)
Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.
- Parameters:
keys - the group keys
- Returns:
whether this distribution preserves groupings across partitions
-
isGroupingCompatible
public boolean isGroupingCompatible(String[] keys)
Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.
- Parameters:
keys - the group keys
- Returns:
whether this distribution preserves groupings across partitions
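One plausible way to reason about the grouping guarantee is a subset test: if every partitioning key is also a group key, rows in the same group agree on all partitioning keys and therefore share a partition. The sketch below shows that rule only. `GroupingCheckSketch` and `keepsGroupsTogether` are hypothetical names, and whether isGroupedBy or isGroupingCompatible apply exactly this test (for example, for local distributions) is not stated in this documentation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for illustration; the library's actual check may differ.
final class GroupingCheckSketch {

    // Returns true when every partitioning key is also a group key, so rows that are
    // equal on the group keys are necessarily equal on the partitioning keys.
    static boolean keepsGroupsTogether(String[] partitionKeys, String[] groupKeys) {
        Set<String> group = new HashSet<>(Arrays.asList(groupKeys));
        return group.containsAll(Arrays.asList(partitionKeys));
    }

    public static void main(String[] args) {
        // Partitioned by "a", grouped by ("a", "b"): groups stay within one partition.
        System.out.println(keepsGroupsTogether(new String[] {"a"}, new String[] {"a", "b"})); // prints "true"
        // Partitioned by ("a", "b"), grouped by "a": one group may span partitions.
        System.out.println(keepsGroupsTogether(new String[] {"a", "b"}, new String[] {"a"})); // prints "false"
    }
}
```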
-
requiresRepartitionFrom
protected boolean requiresRepartitionFrom(PartialDataDistribution source)
Description copied from class: PartialStaticDataDistribution
Subclasses must override this method to declare whether a repartition is required given the source distribution. Implementations may err on the side of caution by always returning true, but this may have an impact on performance.
- Specified by:
requiresRepartitionFrom in class PartialStaticDataDistribution
- Parameters:
source - the source distribution
- Returns:
true if a repartition is required, false if this data distribution matches what was specified.
-
toString
public String toString()
- Specified by:
toString in class DataDistribution
-
getKeys
public String[] getKeys()
Returns the keys by which we are partitioned.
- Returns:
the keys by which we are partitioned.
-
remap
public PartialDataDistribution remap(FieldRemapping mapping)
Applies the given field remapping to this distribution, changing names as required. If any hash keys refer to columns that are dropped as part of the rename, the result is an UnspecifiedPartialDistribution.
- Specified by:
remap in class DataDistribution
- Parameters:
mapping - the field remapping.
- Returns:
this distribution, remapped to the new names.
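The remapping behaviour can be sketched without the DataRush types. In the example below, `RemapKeysSketch` is a hypothetical name, a plain `Map` from old to new field names stands in for `FieldRemapping`, a missing entry is assumed to mean the column was dropped, and `Optional.empty()` stands in for the UnspecifiedPartialDistribution result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for illustration; not the DataRush implementation.
final class RemapKeysSketch {

    // Renames each key according to the mapping. Assumption: a key with no entry
    // in the mapping was dropped, in which case no key-driven result can be kept.
    static Optional<List<String>> remapKeys(List<String> keys, Map<String, String> renames) {
        List<String> remapped = new ArrayList<>();
        for (String key : keys) {
            String newName = renames.get(key);
            if (newName == null) {
                return Optional.empty(); // stands in for an unspecified distribution
            }
            remapped.add(newName);
        }
        return Optional.of(remapped);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("id", "region");
        // Both keys survive the rename.
        System.out.println(remapKeys(keys, Map.of("id", "customer_id", "region", "area"))); // prints "Optional[[customer_id, area]]"
        // "region" is dropped, so the keyed guarantee is lost.
        System.out.println(remapKeys(keys, Map.of("id", "customer_id"))); // prints "Optional.empty"
    }
}
```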
-
getAliases
public AliasSet[] getAliases()
Description copied from class: DataDistribution
Returns the fields that are referenced by this distribution. Note that it is valid for a distribution to reference no fields, in which case it should return an empty array. This method is used by the framework to validate that the distribution is consistent with the type of the record.
- Specified by:
getAliases in class DataDistribution
- Returns:
the fields that are referenced by this distribution
-
getPartitioningFunction
protected PartitioningFunction getPartitioningFunction()
Description copied from class: PartialStaticDataDistribution
Subclasses must override this method to provide the partitioning function to be used.
- Specified by:
getPartitioningFunction in class PartialStaticDataDistribution
- Returns:
a partition function
-
supportsLocalRepartition
protected boolean supportsLocalRepartition()
Description copied from class: PartialStaticDataDistribution
Subclasses may override to indicate that they support a local repartition.
- Overrides:
supportsLocalRepartition in class PartialStaticDataDistribution
- Returns:
-
-