public class MetadataUtil extends Object
DataOrdering
and
DataDistribution
alone. More specifically, it is
always possible to express required metadata for these situations,
but this may cause sorting or redistribution of data which is
not strictly necessary. The utility methods of this encapsulate
the logic of applying these "loose" requirements, permitting
operators to be more flexible with respect to the incoming data.
Data grouping is a data distribution requirement that all records with a specific set of values [k1, k2, ..., kn] for group key fields must be on the same data partition. Typically it also convenient for all records to be adjacent in the dataflow, though this is not required in all cases. The difference between this and data distribution is that in grouping, how the data is spread across the partitions is unimportant. It can be hash partitioned or range partitioned as long as groups are never split across two or more partitions.
Constructor and Description |
---|
MetadataUtil() |
Modifier and Type | Method and Description |
---|---|
static KeyDrivenDataDistribution |
negotiate(DataDistribution distribution,
String... keys)
Negotiates the required data distribution with the specified distribution
using the given grouping fields.
|
static DataOrdering |
negotiate(DataOrdering ordering,
String... keys)
Negotiates the required sort ordering with the specified ordering using
the given key fields.
|
static KeyDrivenDataDistribution |
negotiateGrouping(MetadataCalculationContext ctx,
RecordPort port,
SortKey... keys)
Negotiates the required data distribution on the specified port
using the given grouping fields with sort ordering.
|
static KeyDrivenDataDistribution |
negotiateGrouping(MetadataCalculationContext ctx,
RecordPort port,
String... keys)
Negotiates the required data ordering and distribution on the specified port
using the given grouping fields.
|
static DataOrdering |
negotiateOrdering(MetadataCalculationContext ctx,
RecordPort port,
String... keys)
Negotiates the required data ordering on the specified port using
the given key fields.
|
static void |
negotiateParallelismBasedOnSourceAssumingParallelizableRecords(IterativeMetadataContext ctx)
Uses the maximum of the input ports' parallelism as the max parallelism of the operator
If there are no parallel inputs, this will use
Integer.MAX_VALUE as its parallelism, forcing
a "scatter" from the inputs. |
public static DataOrdering negotiateOrdering(MetadataCalculationContext ctx, RecordPort port, String... keys)
The negotiated ordering will be set as the required ordering for the port as a result of this method.
ctx
- the context in which port metadata is resolvedport
- the port on which to negotiate data orderingkeys
- the key fields on which data must be sortedpublic static DataOrdering negotiate(DataOrdering ordering, String... keys)
ordering
- the source ordering with which to negotiatekeys
- the key fields on which data must be sortedpublic static KeyDrivenDataDistribution negotiateGrouping(MetadataCalculationContext ctx, RecordPort port, String... keys)
The negotiated ordering and distribution will be set as the required
ordering and distribution for the port as a result of this method.
Query the port for its required ordering using
RecordPort.getRequiredDataOrdering(MetadataContext)
to obtain the negotiated data ordering.
ctx
- the context in which port metadata is resolvedport
- the port on which to negotiate data orderingkeys
- the key fields on which data must be groupedpublic static KeyDrivenDataDistribution negotiateGrouping(MetadataCalculationContext ctx, RecordPort port, SortKey... keys)
The negotiated ordering and distribution will be set as the required ordering and distribution for the port as a result of this method. The required ordering will be as requested by the provided keys.
ctx
- the context in which port metadata is resolvedport
- the port on which to negotiate data orderingkeys
- the key fields on which data must be grouped, along with
the required orderingpublic static KeyDrivenDataDistribution negotiate(DataDistribution distribution, String... keys)
distribution
- the source distribution with which to negotiate.keys
- the key fields on which data must be grouppublic static void negotiateParallelismBasedOnSourceAssumingParallelizableRecords(IterativeMetadataContext ctx)
Integer.MAX_VALUE
as its parallelism, forcing
a "scatter" from the inputs. If this operator has an
explicit OperatorSettings#maxParallelism(int)
specified on it or inherited from its parent, this will favor
the explicit setting.
In addition, record input ports will have their iterationParallelizable
flags set to true.
Finally, record output ports will have their outputParallelizable
flags set to true.
ctx
- the metadata contextCopyright © 2021 Actian Corporation. All rights reserved.