Interface IterativeMetadataContext

    • Method Detail

      • parallelize

        void parallelize(ParallelismStrategy strategy)
        Controls the parallelism of the ports of this operator. The default is ParallelismStrategy.NON_PARALLELIZABLE. If the operator is non-parallelizable, all data is brought to a single processor on the machine or cluster. In general, most operations must be parallelizable to be scalable. Some operations (composites) may consist of parallelizable operations that operate on large pieces of data, followed by non-parallelizable operations that operate on smaller pieces.
        Parameters:
        strategy - the strategy to use to determine parallelization.
        See Also:
        ParallelismStrategy
      • setOutputParallelizable

        void setOutputParallelizable(LogicalPort port,
                                     boolean parallel)
        Sets whether the given output port is parallel. Because the body of an iterative operator is dynamic, iterative operators must call this method to declare to the framework whether the output dataset is parallel. Failure to do so will result in poor performance. Note that if you are using the method MetadataUtil#negotiateParallelismBasedOnSourceAssumingParallelizableRecords, you need not make this declaration.
        Parameters:
        port - the output port
        parallel - whether the output is parallel
      • setIterationParallelizable

        void setIterationParallelizable(LogicalPort port,
                                        boolean parallel)
        Sets whether the operator will iterate on the given input data in parallel. Iterative operators must call this method to declare to the framework how they intend to process the data of a given port. If set to true, data is staged in a distributed fashion (either throughout the cluster or in multiple local files when running in pseudo-distributed mode). If set to false, data is staged in a single file on a single machine. Note that if you are using the method MetadataUtil#negotiateParallelismBasedOnSourceAssumingParallelizableRecords, you need not make this declaration.
        Parameters:
        port - the port providing a dataset to process
        parallel - whether the data is to be processed in parallel
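        The following is a minimal sketch of how an iterative operator might declare both its iteration and output parallelism in one place. It does not use the real framework classes: LogicalPort and IterativeMetadataContext here are small stand-ins that mirror only the two method signatures documented above, and RecordingContext, ParallelismDeclarationSketch, and declare are hypothetical names introduced for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the framework's LogicalPort; only a name is modelled.
final class LogicalPort {
    final String name;
    LogicalPort(String name) { this.name = name; }
}

// Stand-in that mirrors only the two documented signatures (not the real interface).
interface IterativeMetadataContext {
    void setOutputParallelizable(LogicalPort port, boolean parallel);
    void setIterationParallelizable(LogicalPort port, boolean parallel);
}

// Hypothetical context that simply records each declaration so it can be inspected.
final class RecordingContext implements IterativeMetadataContext {
    final Map<String, Boolean> outputParallel = new LinkedHashMap<>();
    final Map<String, Boolean> iterationParallel = new LinkedHashMap<>();

    @Override
    public void setOutputParallelizable(LogicalPort port, boolean parallel) {
        outputParallel.put(port.name, parallel);
    }

    @Override
    public void setIterationParallelizable(LogicalPort port, boolean parallel) {
        iterationParallel.put(port.name, parallel);
    }
}

final class ParallelismDeclarationSketch {
    // Declares how the input is iterated and whether the output dataset is parallel.
    static void declare(IterativeMetadataContext ctx,
                        LogicalPort input,
                        LogicalPort output,
                        boolean parallel) {
        ctx.setIterationParallelizable(input, parallel);
        ctx.setOutputParallelizable(output, parallel);
    }

    public static void main(String[] args) {
        RecordingContext ctx = new RecordingContext();
        declare(ctx, new LogicalPort("input"), new LogicalPort("output"), true);
        System.out.println("iteration: " + ctx.iterationParallel);
        System.out.println("output: " + ctx.outputParallel);
    }
}
```

        In a real operator these calls would presumably be made while computing metadata, and they would be unnecessary when MetadataUtil#negotiateParallelismBasedOnSourceAssumingParallelizableRecords is used, as noted above.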
      • setOutputMetadataDynamic

        void setOutputMetadataDynamic(LogicalPort port,
                                      boolean dynamic)
        Indicates that the metadata for the given output port is dynamic. If metadata is dynamic, outputMetadata must not be specified. The graph is compiled and executed up to and including this operator; downstream operators are compiled only after it has run. Use dynamic metadata sparingly, because it postpones graph compilation errors until after a potentially long operation.
        Parameters:
        port - the output port
        dynamic - a value of true means metadata is dynamic.
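        The rule that a dynamic port must not also carry statically specified output metadata can be illustrated with a small model. This is a sketch under stated assumptions, not the framework's implementation: LogicalPort is a stand-in, setOutputMetadataDynamic mirrors the documented signature, and setOutputMetadata and isConsistent are hypothetical helpers invented for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the framework's LogicalPort; only a name is modelled.
final class LogicalPort {
    final String name;
    LogicalPort(String name) { this.name = name; }
}

// Toy model of a context; it is not the real IterativeMetadataContext.
final class DynamicMetadataSketch {
    private final Set<String> dynamicPorts = new HashSet<>();
    private final Map<String, String> staticMetadata = new HashMap<>();

    // Mirrors the documented signature: marks or unmarks a port as dynamic.
    void setOutputMetadataDynamic(LogicalPort port, boolean dynamic) {
        if (dynamic) {
            dynamicPorts.add(port.name);
        } else {
            dynamicPorts.remove(port.name);
        }
    }

    // Hypothetical helper: records statically declared metadata as a plain string.
    void setOutputMetadata(LogicalPort port, String description) {
        staticMetadata.put(port.name, description);
    }

    // Hypothetical check: false if any port is both dynamic and statically described.
    boolean isConsistent() {
        for (String name : dynamicPorts) {
            if (staticMetadata.containsKey(name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DynamicMetadataSketch ctx = new DynamicMetadataSketch();
        LogicalPort out = new LogicalPort("output");
        ctx.setOutputMetadataDynamic(out, true);
        System.out.println("dynamic only: " + ctx.isConsistent());
        ctx.setOutputMetadata(out, "record(a:int)");
        System.out.println("dynamic and static: " + ctx.isConsistent());
    }
}
```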