Class KeyDrivenDataDistribution


public final class KeyDrivenDataDistribution extends PartialStaticDataDistribution
DataDistribution based on a set of keys from the input data. A KeyDrivenDataDistribution selects a set of keys, applying the specified base partitioning function to that set of keys. The function must be a function of input. This guarantees that data with rows with equivalent keys are located in the same partition.
  • Method Details

    • hashed

      public static KeyDrivenDataDistribution hashed(String... keys)
      Returns a distribution that distributes based on a hashcode of the named key fields.
      Parameters:
      keys - the names of the key fields
      Returns:
      a hash distribution
    • hashed

      public static KeyDrivenDataDistribution hashed(List<String> keys)
      Returns a distribution that distributes based on a hashcode of the named key fields.
      Parameters:
      keys - the names of the key fields
      Returns:
      a hash distribution
    • ranges

      public static KeyDrivenDataDistribution ranges(List<RecordToken> boundaries)
      Returns a distribution that distributes based on specified range boundaries
      Parameters:
      boundaries - the range boundaries
      Returns:
      a range distribution
    • keyed

      public static KeyDrivenDataDistribution keyed(PartitioningFunction basePartitionFunction, String... keys)
      Returns a distribution that distributes based on the given function of the named key fields.
      Parameters:
      basePartitionFunction - the partition function
      keys - the names of the key fields
      Returns:
      a key driven distribution
      Throws:
      IllegalArgumentException - if the provided function is not a function of input
    • keyed

      public static KeyDrivenDataDistribution keyed(PartitioningFunction basePartitionFunction, List<String> keys) throws IllegalArgumentException
      Returns a distribution that distributes based on the given function of the named key fields.
      Parameters:
      basePartitionFunction - the partition function
      keys - the names of the key fields
      Returns:
      a key driven distribution
      Throws:
      IllegalArgumentException - if the provided function is not a function of input
    • localKeyed

      public static KeyDrivenDataDistribution localKeyed(PartitioningFunction basePartitionFunction, List<String> keys) throws IllegalArgumentException
      For advanced use only; this redistributes per-JVM but not across the cluster. This is not a general-purpose partitioning strategy but can be used in certain cases where there is a performance benefit to locally repartitioning data.

      Returns a distribution that locally distributes based on the given function of the named key fields.

      Parameters:
      basePartitionFunction - the partition function
      keys - the names of the key fields
      Returns:
      a key driven distribution
      Throws:
      IllegalArgumentException - if the provided function is not a function of input
    • getBasePartitionFunction

      public PartitioningFunction getBasePartitionFunction()
      Returns the base partitioning function. This function is applied to a record consisting of the selected key fields.
      Returns:
      the base partitioning function
    • isLocal

      public boolean isLocal()
    • isGroupedBy

      public boolean isGroupedBy(String[] keys)
      Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.
      Parameters:
      keys - the group keys
      Returns:
      whether this distribution preserves groupings across partitions
    • isGroupingCompatible

      public boolean isGroupingCompatible(String[] keys)
      Returns true if this distribution guarantees groupings on the specified list of keys will not be split across partitions.
      Parameters:
      keys - the group keys
      Returns:
      whether this distribution preserves groupings across partitions
    • requiresRepartitionFrom

      protected boolean requiresRepartitionFrom(PartialDataDistribution source)
      Description copied from class: PartialStaticDataDistribution
      Subclasses must override this method to declare whether a repartition is required given the source distribution. Implementations may err on the side of caution by always returning true but this may have an impact on performance.
      Specified by:
      requiresRepartitionFrom in class PartialStaticDataDistribution
      Parameters:
      source - the source distribution
      Returns:
      true if a repartition is required, false if this data distribution matches what was specified.
    • toString

      public String toString()
      Specified by:
      toString in class DataDistribution
    • getKeys

      public String[] getKeys()
      Returns the keys by which we are partitioned.
      Returns:
      the keys by which we are partitioned.
    • remap

      public PartialDataDistribution remap(FieldRemapping mapping)
      Applies the given field remapping to this distribution, changing names as required. If any hash keys refer to columns that are dropped as part of the rename, the result is an UnspecifiedPartialDistribution.
      Specified by:
      remap in class DataDistribution
      Parameters:
      mapping - the field remapping.
      Returns:
      this distribution, remapped to the new names.
    • getAliases

      public AliasSet[] getAliases()
      Description copied from class: DataDistribution
      Returns the fields that are referenced by this distribution. Note that it is valid for a distribution to reference no fields, in which case it should return an empty array. This method is used by the framework to validate the distribution is consistent with the type of the record.
      Specified by:
      getAliases in class DataDistribution
      Returns:
      the fields that are referenced by this distribution
    • getPartitioningFunction

      protected PartitioningFunction getPartitioningFunction()
      Description copied from class: PartialStaticDataDistribution
      Subclasses must override this method to provide the partitioning function to be used
      Specified by:
      getPartitioningFunction in class PartialStaticDataDistribution
      Returns:
      a partition function
    • supportsLocalRepartition

      protected boolean supportsLocalRepartition()
      Description copied from class: PartialStaticDataDistribution
      Subclasses may override to indicate that they support a local repartition
      Overrides:
      supportsLocalRepartition in class PartialStaticDataDistribution
      Returns: