Class DynamicRangeDataDistribution


public final class DynamicRangeDataDistribution extends PartialDynamicDataDistribution
A distribution in which data is range-partitioned by a selected array of keys. Ranges are computed dynamically: the data is sampled, and the split points are set to evenly spaced quantiles within the sample, so that the data is roughly evenly distributed across the ranges.
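
The quantile-based choice of split points described above can be illustrated with a minimal sketch. It is not this class's implementation: the class name QuantileSplitSketch and the helpers splitPoints and partitionOf are hypothetical, and the sketch assumes a single integer key and an in-memory sample, whereas the real distribution works on an array of keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names belong to the documented API.
public class QuantileSplitSketch {

    // Returns up to (partitions - 1) split points taken at evenly spaced
    // quantiles of the sample. An empty sample yields no split points.
    public static List<Integer> splitPoints(List<Integer> sample, int partitions) {
        List<Integer> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<Integer> splits = new ArrayList<>();
        if (sorted.isEmpty()) {
            return splits;
        }
        for (int i = 1; i < partitions; i++) {
            // Position of the i-th of `partitions` evenly spaced quantiles.
            int index = (int) ((long) i * sorted.size() / partitions);
            splits.add(sorted.get(index));
        }
        return splits;
    }

    // Assigns a key to a range: the number of split points that are <= key.
    public static int partitionOf(int key, List<Integer> splits) {
        int p = 0;
        while (p < splits.size() && key >= splits.get(p)) {
            p++;
        }
        return p;
    }

    public static void main(String[] args) {
        List<Integer> sample = List.of(7, 1, 9, 3, 5, 11, 2, 8);
        List<Integer> splits = splitPoints(sample, 4);
        System.out.println("split points: " + splits);
        System.out.println("key 4 goes to range " + partitionOf(4, splits));
    }
}
```

Because the split points come from a sample rather than the full data set, the resulting ranges are only approximately balanced, which is consistent with the "roughly evenly distributed" wording above.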
  • Constructor Details

    • DynamicRangeDataDistribution

      public DynamicRangeDataDistribution(List<String> keys)
      Creates a range distribution for a list of range keys.
      Parameters:
      keys - the range keys
    • DynamicRangeDataDistribution

      public DynamicRangeDataDistribution(String... keys)
      Creates a range distribution for the given range keys.
      Parameters:
      keys - the range keys
  • Method Details

    • isGroupedBy

      public boolean isGroupedBy(String[] keys)
      Returns true if this range distribution exactly matches the specified list of keys.
      Parameters:
      keys - the range keys
      Returns:
      whether this distribution exactly matches the specified list of keys
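
      The exact-match check can be pictured with the following sketch. It is a stand-in, not this class's code: ExactKeyMatchSketch is a hypothetical type, and the sketch assumes that "exactly matches" means the same keys in the same order, which the description above does not state explicitly.

```java
import java.util.Arrays;

// Hypothetical stand-in for a range distribution; not the documented class.
public class ExactKeyMatchSketch {
    private final String[] keys;

    public ExactKeyMatchSketch(String... keys) {
        // Defensive copy so later changes to the caller's array have no effect.
        this.keys = keys.clone();
    }

    // Assumption: an exact match requires equal length, equal keys, and equal order.
    public boolean isGroupedBy(String[] other) {
        return Arrays.equals(keys, other);
    }

    public static void main(String[] args) {
        ExactKeyMatchSketch dist = new ExactKeyMatchSketch("region", "date");
        System.out.println(dist.isGroupedBy(new String[] {"region", "date"}));
        System.out.println(dist.isGroupedBy(new String[] {"region"}));
    }
}
```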
    • toString

      public String toString()
      Specified by:
      toString in class DataDistribution
    • getKeys

      public String[] getKeys()
      Returns the keys by which the data is range-partitioned.
      Returns:
      the keys by which the data is range-partitioned
    • remap

      public PartialDataDistribution remap(FieldRemapping mapping)
      Applies the given field remapping to this distribution, changing key names as required. If any range key refers to a column that is dropped by the remapping, the result is an UnspecifiedPartialDistribution.
      Specified by:
      remap in class DataDistribution
      Parameters:
      mapping - the field remapping.
      Returns:
      this distribution, remapped to the new names.
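
      The remapping rule can be sketched as follows, under stated assumptions. The sketch does not use FieldRemapping: a plain Map from old to new names stands in for it, a key missing from the map is treated as a dropped column, and an empty Optional stands in for the UnspecifiedPartialDistribution result. RemapSketch and remapKeys are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of the remapping rule; not the documented API.
public class RemapSketch {

    // Renames every range key. Returns an empty Optional as soon as one key
    // has no entry in the map, which this sketch treats as a dropped column.
    public static Optional<String[]> remapKeys(String[] keys, Map<String, String> renames) {
        String[] result = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            String renamed = renames.get(keys[i]);
            if (renamed == null) {
                return Optional.empty();
            }
            result[i] = renamed;
        }
        return Optional.of(result);
    }

    public static void main(String[] args) {
        String[] keys = {"region", "date"};
        Map<String, String> all = Map.of("region", "area", "date", "day");
        Map<String, String> partial = Map.of("region", "area");
        System.out.println(remapKeys(keys, all).map(Arrays::toString).orElse("unspecified"));
        System.out.println(remapKeys(keys, partial).map(Arrays::toString).orElse("unspecified"));
    }
}
```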
    • getAliases

      public AliasSet[] getAliases()
      Description copied from class: DataDistribution
      Returns the fields that are referenced by this distribution. A distribution may reference no fields, in which case this method should return an empty array. The framework uses this method to validate that the distribution is consistent with the type of the record.
      Specified by:
      getAliases in class DataDistribution
      Returns:
      the fields that are referenced by this distribution