- java.lang.Object
-
- com.pervasive.datarush.ports.record.DataDistribution
-
- com.pervasive.datarush.ports.record.PartialDataDistribution
-
- com.pervasive.datarush.ports.record.PartialStaticDataDistribution
-
- com.pervasive.datarush.ports.record.KeyDrivenDataDistribution
-
public final class KeyDrivenDataDistribution extends PartialStaticDataDistribution
DataDistribution based on a set of keys from the input data. A KeyDrivenDataDistribution selects a set of keys, applying the specified base partitioning function to that set of keys. The function must be afunction of input
. This guarantees that data with rows with equivalent keys are located in the same partition.
-
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description AliasSet[]
getAliases()
Returns the fields that are referenced by this distribution.PartitioningFunction
getBasePartitionFunction()
Returns the base partitioning function.String[]
getKeys()
Returns the keys by which we are partitioned.protected PartitioningFunction
getPartitioningFunction()
Subclasses must override this method to provide the partitioning function to be usedstatic KeyDrivenDataDistribution
hashed(String... keys)
Returns a distribution that distributes based on a hashcode of the named key fields.static KeyDrivenDataDistribution
hashed(List<String> keys)
Returns a distribution that distributes based on a hashcode of the named key fields.boolean
isGroupedBy(String[] keys)
Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.boolean
isGroupingCompatible(String[] keys)
Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.boolean
isLocal()
static KeyDrivenDataDistribution
keyed(PartitioningFunction basePartitionFunction, String... keys)
Returns a distribution that distributes based on the given function of the named key fields.static KeyDrivenDataDistribution
keyed(PartitioningFunction basePartitionFunction, List<String> keys)
Returns a distribution that distributes based on the given function of the named key fields.static KeyDrivenDataDistribution
localKeyed(PartitioningFunction basePartitionFunction, List<String> keys)
For advanced use only; this redistributes per-JVM but not across the cluster.static KeyDrivenDataDistribution
ranges(List<RecordToken> boundaries)
Returns a distribution that distributes based on specified range boundariesPartialDataDistribution
remap(FieldRemapping mapping)
Applies the given field remapping to this distribution, changing names as required.protected boolean
requiresRepartitionFrom(PartialDataDistribution source)
Subclasses must override this method to declare whether a repartition is required given the source distribution.protected boolean
supportsLocalRepartition()
Subclasses may override to indicate that they support a local repartitionString
toString()
-
Methods inherited from class com.pervasive.datarush.ports.record.PartialDataDistribution
getGatherScheme
-
-
-
-
Method Detail
-
hashed
public static KeyDrivenDataDistribution hashed(String... keys)
Returns a distribution that distributes based on a hashcode of the named key fields.- Parameters:
keys
- the names of the key fields- Returns:
- a hash distribution
-
hashed
public static KeyDrivenDataDistribution hashed(List<String> keys)
Returns a distribution that distributes based on a hashcode of the named key fields.- Parameters:
keys
- the names of the key fields- Returns:
- a hash distribution
-
ranges
public static KeyDrivenDataDistribution ranges(List<RecordToken> boundaries)
Returns a distribution that distributes based on specified range boundaries- Parameters:
boundaries
- the range boundaries- Returns:
- a range distribution
-
keyed
public static KeyDrivenDataDistribution keyed(PartitioningFunction basePartitionFunction, String... keys)
Returns a distribution that distributes based on the given function of the named key fields.- Parameters:
basePartitionFunction
- the partition functionkeys
- the names of the key fields- Returns:
- a key driven distribution
- Throws:
IllegalArgumentException
- if the provided function is not afunction of input
-
keyed
public static KeyDrivenDataDistribution keyed(PartitioningFunction basePartitionFunction, List<String> keys) throws IllegalArgumentException
Returns a distribution that distributes based on the given function of the named key fields.- Parameters:
basePartitionFunction
- the partition functionkeys
- the names of the key fields- Returns:
- a key driven distribution
- Throws:
IllegalArgumentException
- if the provided function is not afunction of input
-
localKeyed
public static KeyDrivenDataDistribution localKeyed(PartitioningFunction basePartitionFunction, List<String> keys) throws IllegalArgumentException
For advanced use only; this redistributes per-JVM but not across the cluster. This is not a general-purpose partitioning strategy but can be used in certain cases where there is a performance benefit to locally repartitioning data. Returns a distribution that locally distributes based on the given function of the named key fields.- Parameters:
basePartitionFunction
- the partition functionkeys
- the names of the key fields- Returns:
- a key driven distribution
- Throws:
IllegalArgumentException
- if the provided function is not afunction of input
-
getBasePartitionFunction
public PartitioningFunction getBasePartitionFunction()
Returns the base partitioning function. This function is applied to a record consisting of the selected key fields.- Returns:
- the base partitioning function
-
isLocal
public boolean isLocal()
-
isGroupedBy
public boolean isGroupedBy(String[] keys)
Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.- Parameters:
keys
- the group keys- Returns:
- whether this distribution preserves groupings across partitions
-
isGroupingCompatible
public boolean isGroupingCompatible(String[] keys)
Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.- Parameters:
keys
- the group keys- Returns:
- whether this distribution preserves groupings across partitions
-
requiresRepartitionFrom
protected boolean requiresRepartitionFrom(PartialDataDistribution source)
Description copied from class:PartialStaticDataDistribution
Subclasses must override this method to declare whether a repartition is required given the source distribution. Implementations may err on the side of caution by always returning true but this may have an impact on performance.- Specified by:
requiresRepartitionFrom
in classPartialStaticDataDistribution
- Parameters:
source
- the source distribution- Returns:
- true if a repartition is required, false if this data distribution matches what was specified.
-
toString
public String toString()
- Specified by:
toString
in classDataDistribution
-
getKeys
public String[] getKeys()
Returns the keys by which we are partitioned.- Returns:
- the keys by which we are partitioned.
-
remap
public PartialDataDistribution remap(FieldRemapping mapping)
Applies the given field remapping to this distribution, changing names as required. If any hash keys refer to columns that are dropped as part of the rename, the result is anUnspecifiedPartialDistribution
.- Specified by:
remap
in classDataDistribution
- Parameters:
mapping
- the field remapping.- Returns:
- this distribution, remapped to the new names.
-
getAliases
public AliasSet[] getAliases()
Description copied from class:DataDistribution
Returns the fields that are referenced by this distribution. Note that it is valid for a distribution to reference no fields, in which case it should return an empty array. This method is used by the framework to validate the distribution is consistent with the type of the record.- Specified by:
getAliases
in classDataDistribution
- Returns:
- the fields that are referenced by this distribution
-
getPartitioningFunction
protected PartitioningFunction getPartitioningFunction()
Description copied from class:PartialStaticDataDistribution
Subclasses must override this method to provide the partitioning function to be used- Specified by:
getPartitioningFunction
in classPartialStaticDataDistribution
- Returns:
- a partition function
-
supportsLocalRepartition
protected boolean supportsLocalRepartition()
Description copied from class:PartialStaticDataDistribution
Subclasses may override to indicate that they support a local repartition- Overrides:
supportsLocalRepartition
in classPartialStaticDataDistribution
- Returns:
-
-