Interface IterativeMetadataContext

All Superinterfaces:
MetadataCalculationContext, MetadataContext

public interface IterativeMetadataContext extends MetadataCalculationContext
Context used by IterativeOperator.computeMetadata(IterativeMetadataContext). For iterative operators, parallelism is determined on a port-by-port basis. By default, all ports are assumed to be non-parallel. Operators must declare whether each of their ports is distributed or local.
  • Method Details

    • parallelize

      void parallelize(ParallelismStrategy strategy)
      Controls the parallelism of the ports of this operator. The default is ParallelismStrategy.NON_PARALLELIZABLE. If the operator is non-parallelizable, all data is brought to a single processor on the machine or cluster. In general, most operations must be parallelizable in order to scale. Some operations (composites) may consist of parallelizable operations that operate on large pieces of data, followed by non-parallelizable operations that operate on smaller pieces.
      Parameters:
      strategy - the strategy to use to determine parallelization.
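
The default described for parallelize can be illustrated with a small stand-in. This is only a sketch: StubContext and the enum below are hypothetical stubs written for this example, not the framework's classes, and HYPOTHETICAL_PARALLEL is an invented placeholder constant (only NON_PARALLELIZABLE is named in this documentation).

```java
import java.util.Objects;

final class ParallelizeSketch {
    /** Stand-in enum; only NON_PARALLELIZABLE is named in the documentation. */
    enum ParallelismStrategy { NON_PARALLELIZABLE, HYPOTHETICAL_PARALLEL }

    /** Hypothetical stub that records the strategy and mimics the documented default. */
    static final class StubContext {
        private ParallelismStrategy strategy = ParallelismStrategy.NON_PARALLELIZABLE;

        void parallelize(ParallelismStrategy strategy) {
            this.strategy = Objects.requireNonNull(strategy, "strategy");
        }

        ParallelismStrategy strategy() {
            return strategy;
        }
    }

    /** An operator that never calls parallelize keeps the default. */
    static ParallelismStrategy undeclared() {
        return new StubContext().strategy();
    }

    /** An operator that declares a strategy overrides the default. */
    static ParallelismStrategy declared() {
        StubContext ctx = new StubContext();
        ctx.parallelize(ParallelismStrategy.HYPOTHETICAL_PARALLEL);
        return ctx.strategy();
    }

    public static void main(String[] args) {
        System.out.println("undeclared: " + undeclared());
        System.out.println("declared:   " + declared());
    }
}
```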
    • setOutputParallelizable

      void setOutputParallelizable(LogicalPort port, boolean parallel)
      Sets whether the given output port is parallel. Because the body of an iterative operator is dynamic, iterative operators must call this method to declare to the framework whether the output dataset is parallel. Failure to do so will result in poor performance. This declaration is not needed if you are using the method MetadataUtil#negotiateParallelismBasedOnSourceAssumingParallelizableRecords.
      Parameters:
      port - the output port
      parallel - whether the output is parallel
    • setIterationParallelizable

      void setIterationParallelizable(LogicalPort port, boolean parallel)
      Sets whether the operator will iterate on the given input data in parallel. Iterative operators must call this method to declare to the framework how they intend to process the data of a given port. If set to true, data is staged in a distributed fashion (either throughout the cluster, or in multiple local files when running in pseudo-distributed mode). If set to false, data is staged in a single file on a single machine. This declaration is not needed if you are using the method MetadataUtil#negotiateParallelismBasedOnSourceAssumingParallelizableRecords.
      Parameters:
      port - the port providing a dataset to process
      parallel - whether the data is to be processed in parallel
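
The port-by-port declarations made through setIterationParallelizable and setOutputParallelizable might look roughly like the following. This is a sketch under stated assumptions: Port and RecordingContext are hypothetical stand-ins for LogicalPort and the real context, and the port names are invented for illustration. The stub treats undeclared ports as non-parallel, matching the documented default.

```java
import java.util.HashMap;
import java.util.Map;

final class PortParallelismSketch {
    /** Hypothetical stand-in for the framework's LogicalPort. */
    static final class Port {
        final String name;
        Port(String name) { this.name = name; }
    }

    /** Hypothetical stub that records the two port-level declarations. */
    static final class RecordingContext {
        private final Map<Port, Boolean> iteration = new HashMap<>();
        private final Map<Port, Boolean> output = new HashMap<>();

        void setIterationParallelizable(Port port, boolean parallel) {
            iteration.put(port, parallel);
        }

        void setOutputParallelizable(Port port, boolean parallel) {
            output.put(port, parallel);
        }

        // Undeclared ports fall back to non-parallel, the documented default.
        boolean isIterationParallel(Port port) {
            return iteration.getOrDefault(port, false);
        }

        boolean isOutputParallel(Port port) {
            return output.getOrDefault(port, false);
        }
    }

    /**
     * Illustrative computeMetadata body: a large input iterated in parallel,
     * a small input staged in a single file, and a parallel output.
     */
    static void computeMetadata(RecordingContext ctx, Port records, Port lookup, Port result) {
        ctx.setIterationParallelizable(records, true);
        ctx.setIterationParallelizable(lookup, false);
        ctx.setOutputParallelizable(result, true);
    }

    public static void main(String[] args) {
        RecordingContext ctx = new RecordingContext();
        Port records = new Port("records");
        Port lookup = new Port("lookup");
        Port result = new Port("result");
        computeMetadata(ctx, records, lookup, result);
        System.out.println(records.name + " iterated in parallel: " + ctx.isIterationParallel(records));
        System.out.println(lookup.name + " iterated in parallel: " + ctx.isIterationParallel(lookup));
        System.out.println(result.name + " output parallel: " + ctx.isOutputParallel(result));
    }
}
```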
    • setOutputMetadataDynamic

      void setOutputMetadataDynamic(LogicalPort port, boolean dynamic)
      Indicates that the metadata for the given output port is dynamic. If metadata is dynamic, outputMetadata must not be specified. The graph is compiled and executed up to and including this operator; downstream operators are compiled only after this operator has executed. Use dynamic metadata sparingly, because it postpones graph compilation errors until after a (potentially) long operation.
      Parameters:
      port - the output port
      dynamic - a value of true means metadata is dynamic.
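
The rule that a dynamic port must not also have output metadata specified can be modeled with a small stub. This is a sketch, not the framework's behavior: StubContext, Port, and the setOutputMetadata method are hypothetical stand-ins, and throwing IllegalStateException is an assumption of this example; the documentation does not say how the framework reacts to a conflict.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class DynamicMetadataSketch {
    /** Hypothetical stand-in for the framework's LogicalPort. */
    static final class Port {
        final String name;
        Port(String name) { this.name = name; }
    }

    /** Hypothetical stub enforcing "dynamic" and "static metadata" as mutually exclusive. */
    static final class StubContext {
        private final Set<Port> dynamicPorts = new HashSet<>();
        private final Map<Port, String> staticMetadata = new HashMap<>();

        void setOutputMetadataDynamic(Port port, boolean dynamic) {
            if (dynamic && staticMetadata.containsKey(port)) {
                // Assumed reaction; the real framework may behave differently.
                throw new IllegalStateException("static metadata already set for " + port.name);
            }
            if (dynamic) {
                dynamicPorts.add(port);
            } else {
                dynamicPorts.remove(port);
            }
        }

        /** Invented method standing in for "specifying outputMetadata". */
        void setOutputMetadata(Port port, String description) {
            if (dynamicPorts.contains(port)) {
                throw new IllegalStateException("port " + port.name + " is declared dynamic");
            }
            staticMetadata.put(port, description);
        }

        boolean isDynamic(Port port) {
            return dynamicPorts.contains(port);
        }
    }

    /** Returns true if the stub rejects static metadata on a dynamic port. */
    static boolean rejectsStaticMetadataOnDynamicPort() {
        StubContext ctx = new StubContext();
        Port out = new Port("output");
        ctx.setOutputMetadataDynamic(out, true);
        try {
            ctx.setOutputMetadata(out, "placeholder schema");
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("conflict rejected: " + rejectsStaticMetadataOnDynamicPort());
    }
}
```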