All Superinterfaces:
MetadataCalculationContext, MetadataContext
public interface IterativeMetadataContext extends MetadataCalculationContext
Context used by IterativeOperator.computeMetadata(IterativeMetadataContext). With iterative operations, parallel vs. non-parallel is determined on a port-by-port basis. By default, all ports are assumed to be non-parallel. Operations must declare whether their ports are distributed or local.
-
Method Summary
Modifier and Type    Method    Description
void    parallelize(ParallelismStrategy strategy)    Controls the parallelism of the ports of this operator.
void    setIterationParallelizable(LogicalPort port, boolean parallel)    Sets whether the operator will iterate on the given input data in parallel.
void    setOutputMetadataDynamic(LogicalPort port, boolean dynamic)    Indicates that the metadata for the given output port is dynamic.
void    setOutputParallelizable(LogicalPort port, boolean parallel)    Sets whether the given output port is parallel.
-
Methods inherited from interface com.pervasive.datarush.operators.MetadataCalculationContext
getCompilationLevel, getOperator, setOutputMetadata, setOutputMetadataDynamic, setRequiredMetadata, setStagingForced
-
Methods inherited from interface com.pervasive.datarush.operators.MetadataContext
getCombinedMetadata, getEngineConfig, getFileClient, getMaxParallelism, getPath, getRequiredMetadata, getSourceMaxParallelism, getSourceMetadata, isParallel, isSourceConnected, isSourceParallel
-
Method Detail
-
parallelize
void parallelize(ParallelismStrategy strategy)
Controls the parallelism of the ports of this operator. The default is ParallelismStrategy.NON_PARALLELIZABLE. If non-parallelizable, all data is brought to a single processor on the machine or cluster. In general, most operations must be parallelizable to be scalable. Some operations (composites) may consist of parallelizable operations that operate on large pieces of data, followed by non-parallelizable operations that operate on smaller pieces.
Parameters:
strategy - the strategy to use to determine parallelization.
See Also:
ParallelismStrategy
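The composite pattern described above (a parallelizable stage over large data, followed by a non-parallelizable stage over small results) can be illustrated outside the framework. The following is a minimal sketch in plain Java using parallel streams. It does not use the DataRush API, and the class and method names (CompositeParallelismSketch, partialSums, combine) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CompositeParallelismSketch {

    // Parallelizable stage: each partition of the large input is reduced
    // independently, so partitions can be processed on different workers.
    static List<Long> partialSums(List<int[]> partitions) {
        return partitions.parallelStream()
                .map(p -> IntStream.of(p).asLongStream().sum())
                .collect(Collectors.toList());
    }

    // Non-parallelizable stage: the small list of partial results is
    // brought to one place and combined sequentially.
    static long combine(List<Long> partials) {
        long total = 0;
        for (long value : partials) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<int[]> partitions = new ArrayList<>();
        partitions.add(new int[] {1, 2, 3});
        partitions.add(new int[] {4, 5});
        partitions.add(new int[] {6});
        System.out.println(combine(partialSums(partitions))); // prints 21
    }
}
```

Only the first stage scales with the input size. The second stage sees one value per partition, which is why running it on a single processor is usually acceptable.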
-
setOutputParallelizable
void setOutputParallelizable(LogicalPort port, boolean parallel)
Sets whether the given output port is parallel. Because the body of an iterative operator is dynamic, iterative operators must call this method to declare to the framework whether the output dataset is parallel. Failure to do so will result in poor performance. Note that if you are using the method MetadataUtil#negotiateParallelismBasedOnSourceAssumingParallelizableRecords, you need not make this declaration.
Parameters:
port - the output port
parallel - whether the output is parallel
-
setIterationParallelizable
void setIterationParallelizable(LogicalPort port, boolean parallel)
Sets whether the operator will iterate on the given input data in parallel. Iterative operators must call this method to declare to the framework how they intend to process the data of a given port. If this is set to true, data will be staged in a distributed fashion (either throughout the cluster or in multiple local files if running in pseudo-distributed mode). If set to false, data will be staged in a single file on a single machine. Note that if you are using the method MetadataUtil#negotiateParallelismBasedOnSourceAssumingParallelizableRecords, you need not make this declaration.
Parameters:
port - the port providing a dataset to process
parallel - whether the data is to be processed in parallel
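As a rough illustration of the per-port declarations made by setOutputParallelizable and setIterationParallelizable, the sketch below shows the shape of a computeMetadata implementation. It is self-contained and does not depend on the DataRush library: Port and Ctx are hypothetical stand-ins for LogicalPort and IterativeMetadataContext that mirror only the two method signatures documented here, and RecordingCtx simply records what was declared.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IterativePortDeclarationSketch {

    // Hypothetical stand-in for LogicalPort; only a name is modeled.
    static final class Port {
        final String name;
        Port(String name) { this.name = name; }
    }

    // Hypothetical stand-in mirroring two IterativeMetadataContext methods.
    interface Ctx {
        void setIterationParallelizable(Port port, boolean parallel);
        void setOutputParallelizable(Port port, boolean parallel);
    }

    // Records declarations so they can be inspected; not framework behavior.
    static final class RecordingCtx implements Ctx {
        final Map<String, Boolean> iteration = new LinkedHashMap<>();
        final Map<String, Boolean> output = new LinkedHashMap<>();

        @Override
        public void setIterationParallelizable(Port port, boolean parallel) {
            iteration.put(port.name, parallel);
        }

        @Override
        public void setOutputParallelizable(Port port, boolean parallel) {
            output.put(port.name, parallel);
        }
    }

    // Shape of an operator's metadata step: declare every port explicitly,
    // since undeclared ports are assumed to be non-parallel.
    static void computeMetadata(Ctx ctx, Port input, Port result) {
        ctx.setIterationParallelizable(input, true); // stage input distributed
        ctx.setOutputParallelizable(result, true);   // output dataset is parallel
    }

    public static void main(String[] args) {
        RecordingCtx ctx = new RecordingCtx();
        computeMetadata(ctx, new Port("input"), new Port("output"));
        System.out.println(ctx.iteration + " " + ctx.output);
    }
}
```

In a real operator, the context would be the IterativeMetadataContext passed by the framework, and the ports would be the operator's own LogicalPort instances.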
-
setOutputMetadataDynamic
void setOutputMetadataDynamic(LogicalPort port, boolean dynamic)
Indicates that the metadata for the given output port is dynamic. If metadata is dynamic, outputMetadata must not be specified. The graph is compiled and executed up until this operator. Following execution of this operator, downstream operators can then be compiled. Note that dynamic metadata should be used sparingly as it causes graph compilation errors to be postponed until after a (potentially) long operation.
Parameters:
port - the output port
dynamic - a value of true means metadata is dynamic.
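The rule that a port is declared either with static output metadata or as dynamic, but not both, can be modeled as follows. This is a minimal sketch with hypothetical names (DynamicMetadataSketch, OutputDeclarations), with ports and metadata reduced to strings. Throwing IllegalStateException on a conflicting declaration is an assumption of the sketch, not documented framework behavior.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DynamicMetadataSketch {

    // Hypothetical model of per-port output declarations.
    static final class OutputDeclarations {
        private final Map<String, String> staticMetadata = new HashMap<>();
        private final Set<String> dynamicPorts = new HashSet<>();

        void setOutputMetadata(String port, String metadata) {
            if (dynamicPorts.contains(port)) {
                // Assumed enforcement: dynamic ports must not specify metadata.
                throw new IllegalStateException("port is dynamic: " + port);
            }
            staticMetadata.put(port, metadata);
        }

        void setOutputMetadataDynamic(String port, boolean dynamic) {
            if (dynamic && staticMetadata.containsKey(port)) {
                throw new IllegalStateException("metadata already set: " + port);
            }
            if (dynamic) {
                dynamicPorts.add(port);
            } else {
                dynamicPorts.remove(port);
            }
        }

        // Downstream operators can be compiled up front only when the port's
        // metadata is known before this operator executes.
        boolean canCompileDownstreamBeforeExecution(String port) {
            return !dynamicPorts.contains(port);
        }
    }

    public static void main(String[] args) {
        OutputDeclarations d = new OutputDeclarations();
        d.setOutputMetadata("summary", "recordType");
        d.setOutputMetadataDynamic("model", true);
        System.out.println(d.canCompileDownstreamBeforeExecution("summary")); // prints true
        System.out.println(d.canCompileDownstreamBeforeExecution("model"));   // prints false
    }
}
```

The second query returning false reflects the cost noted above: anything downstream of a dynamic port is compiled, and can therefore fail, only after this operator has run.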
-