Class MetadataUtil


  • public class MetadataUtil
    extends Object
    Provides utility methods for working with port metadata. Some common operator requirements such as data ordering or data grouping (explained below) have slightly less strict requirements than is expressible with DataOrdering and DataDistribution alone. More specifically, it is always possible to express required metadata for these situations, but this may cause sorting or redistribution of data which is not strictly necessary. The utility methods of this encapsulate the logic of applying these "loose" requirements, permitting operators to be more flexible with respect to the incoming data.

    Data grouping is a data distribution requirement that all records with a specific set of values [k1, k2, ..., kn] for group key fields must be on the same data partition. Typically it also convenient for all records to be adjacent in the dataflow, though this is not required in all cases. The difference between this and data distribution is that in grouping, how the data is spread across the partitions is unimportant. It can be hash partitioned or range partitioned as long as groups are never split across two or more partitions.

    Since:
    6.1
    • Constructor Detail

      • MetadataUtil

        public MetadataUtil()
    • Method Detail

      • negotiateOrdering

        public static DataOrdering negotiateOrdering​(MetadataCalculationContext ctx,
                                                     RecordPort port,
                                                     String... keys)
        Negotiates the required data ordering on the specified port using the given key fields. No explicit sort ordering is assumed for key fields; any sorting already on the requested keys is acceptable. The returned ordering can be queried to obtain the negotiated sort ordering of key fields.

        The negotiated ordering will be set as the required ordering for the port as a result of this method.

        Parameters:
        ctx - the context in which port metadata is resolved
        port - the port on which to negotiate data ordering
        keys - the key fields on which data must be sorted
        Returns:
        the negotiated data ordering
      • negotiate

        public static DataOrdering negotiate​(DataOrdering ordering,
                                             String... keys)
        Negotiates the required sort ordering with the specified ordering using the given key fields. No explicit sort ordering is assumed for key fields; any sorting already on the requested keys is acceptable. The returned ordering can be queried to obtain the negotiated sort ordering of key fields.
        Parameters:
        ordering - the source ordering with which to negotiate
        keys - the key fields on which data must be sorted
        Returns:
        the negotiated data ordering
      • negotiateGrouping

        public static KeyDrivenDataDistribution negotiateGrouping​(MetadataCalculationContext ctx,
                                                                  RecordPort port,
                                                                  String... keys)
        Negotiates the required data ordering and distribution on the specified port using the given grouping fields. The resulting settings are guaranteed to produce no groups which are split across partitions. Furthermore, all records in the the data flow will be ordered in some way on the group keys, ensuring all records in the group appear sequentially. No explicit sort ordering is assumed for key fields; any sorting already on the requested keys is accepted.

        The negotiated ordering and distribution will be set as the required ordering and distribution for the port as a result of this method. Query the port for its required ordering using RecordPort.getRequiredDataOrdering(MetadataContext) to obtain the negotiated data ordering.

        Parameters:
        ctx - the context in which port metadata is resolved
        port - the port on which to negotiate data ordering
        keys - the key fields on which data must be grouped
        Returns:
        the negotiated data distribution
      • negotiateGrouping

        public static KeyDrivenDataDistribution negotiateGrouping​(MetadataCalculationContext ctx,
                                                                  RecordPort port,
                                                                  SortKey... keys)
        Negotiates the required data distribution on the specified port using the given grouping fields with sort ordering. The resulting settings are guaranteed to produce no groups which are split across partitions. Furthermore, all records in the the data flow will be ordered as specified on the group keys, ensuring all records in the group appear sequentially.

        The negotiated ordering and distribution will be set as the required ordering and distribution for the port as a result of this method. The required ordering will be as requested by the provided keys.

        Parameters:
        ctx - the context in which port metadata is resolved
        port - the port on which to negotiate data ordering
        keys - the key fields on which data must be grouped, along with the required ordering
        Returns:
        the negotiated data distribution
      • negotiate

        public static KeyDrivenDataDistribution negotiate​(DataDistribution distribution,
                                                          String... keys)
        Negotiates the required data distribution with the specified distribution using the given grouping fields. The resulting distribution is guaranteed to produce no groups which are split across partitions. The returned distribution can be queried to obtain the negotiated partition key fields.
        Parameters:
        distribution - the source distribution with which to negotiate.
        keys - the key fields on which data must be group
        Returns:
        the negotiated data distribution
      • negotiateParallelismBasedOnSourceAssumingParallelizableRecords

        public static void negotiateParallelismBasedOnSourceAssumingParallelizableRecords​(IterativeMetadataContext ctx)
        Uses the maximum of the input ports' parallelism as the max parallelism of the operator If there are no parallel inputs, this will use Integer.MAX_VALUE as its parallelism, forcing a "scatter" from the inputs. If this operator has an explicit OperatorSettings#maxParallelism(int) specified on it or inherited from its parent, this will favor the explicit setting.

        In addition, record input ports will have their iterationParallelizable flags set to true.

        Finally, record output ports will have their outputParallelizable flags set to true.

        Parameters:
        ctx - the metadata context