Class EngineConfig


  • public final class EngineConfig
    extends Object
    A collection of engine configuration settings. These collections are immutable, but provide methods for constructing new configurations from an existing one.

    A number of instance methods are provided to permit a more fluent and declarative style of specifying configurations. For example,

     public static final EngineConfig myConfig =
         engine().monitored(true).licensePath("/path/to/my/license.file");
     
    which defines a configuration with:
    • statistics collection enabled for processes and ports
    • the license file expected at /path/to/my/license.file
    • otherwise default settings
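The immutable, fluent style described above can be illustrated with a small stand-in class. This is only a sketch of the pattern: `SketchConfig`, its string keys, and its internal map are hypothetical and are not the real EngineConfig implementation. The defaults used (monitoring off, zero retries) follow the defaults documented on this page.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT the real EngineConfig.
final class SketchConfig {
    private final Map<String, Object> settings;

    private SketchConfig(Map<String, Object> settings) {
        this.settings = Collections.unmodifiableMap(settings);
    }

    // Analogue of a static factory that returns the default settings.
    static SketchConfig defaults() {
        Map<String, Object> m = new HashMap<>();
        m.put("monitored", Boolean.FALSE);
        m.put("maxRetries", 0);
        return new SketchConfig(m);
    }

    // Each "setter" copies the settings and returns a new instance.
    SketchConfig monitored(boolean enabled) { return with("monitored", enabled); }
    SketchConfig maxRetries(int n) { return with("maxRetries", n); }

    boolean isMonitored() { return (Boolean) settings.get("monitored"); }
    int getMaxRetries() { return (Integer) settings.get("maxRetries"); }

    private SketchConfig with(String key, Object value) {
        Map<String, Object> copy = new HashMap<>(settings);
        copy.put(key, value);
        return new SketchConfig(copy);
    }
}

final class SketchConfigDemo {
    public static void main(String[] args) {
        SketchConfig base = SketchConfig.defaults();
        SketchConfig tuned = base.monitored(true).maxRetries(3);
        System.out.println(base.isMonitored());    // false: the original is unchanged
        System.out.println(tuned.isMonitored());   // true
        System.out.println(tuned.getMaxRetries()); // 3
    }
}
```

Because every call returns a new object, a configuration held in a static final field can be shared safely and specialized per job without defensive copying.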
    • Field Detail

      • MODULE_CONFIGURATION

        public static final EngineProperty<ModuleConfiguration> MODULE_CONFIGURATION
        Property controlling the module configuration to be used by operators in the graph.
      • CLASSPATH_SPECIFIER

        public static final EngineProperty<ClasspathSpecifier> CLASSPATH_SPECIFIER
        Property controlling the classpath used for launching remote jobs.
      • NETWORK_CONFIGURATION

        public static final EngineProperty<NetworkConfiguration> NETWORK_CONFIGURATION
        Property controlling the socket provider to be used for communication between the client and cluster manager
      • MONITOR

        public static final EngineProperty<Boolean> MONITOR
        Property controlling the collection of execution statistics. By default, they are not collected.
      • SUBGRAPH_HISTORY_SIZE

        public static final EngineProperty<Integer> SUBGRAPH_HISTORY_SIZE
        Property controlling the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
      • STORAGE_MANAGEMENT_PATH

        public static final EngineProperty<Path> STORAGE_MANAGEMENT_PATH
        Property controlling the base path to use for intermediate storage. The default value, the empty string, selects an appropriate storage location based on the type of cluster.
      • PARALLELISM

        public static final EngineProperty<Integer> PARALLELISM
        Property controlling the default parallelism used in graphs. By default, this is automatically determined.
      • MIN_PARALLELISM

        public static final EngineProperty<Integer> MIN_PARALLELISM
        Property controlling the minimum parallelism used in graphs. By default, this is automatically determined.
      • MAX_RETRIES

        public static final EngineProperty<Integer> MAX_RETRIES
        Maximum number of retry attempts. The default value of zero implies no retries.
      • DUMPFILE_PATH

        public static final EngineProperty<String> DUMPFILE_PATH
        Defines the directory in which low-level statistics and plan dumps are to be placed.
      • SCHEDULER_QUEUE

        public static final EngineProperty<String> SCHEDULER_QUEUE
        Defines the scheduler queue used for executing jobs
      • EXTENSION_PATHS

        public static final EngineProperty<String> EXTENSION_PATHS
        Defines a comma-delimited list of directories in shared storage containing user extensions.
    • Method Detail

      • getModuleConfiguration

        public ModuleConfiguration getModuleConfiguration()
        Returns the module configuration to be used by operators in the graph.
        Returns:
        the module configuration
      • moduleConfiguration

        public EngineConfig moduleConfiguration​(ModuleConfiguration configuration)
        Specifies the module configuration to be used by operators in the graph
        Parameters:
        configuration - the module configuration
        Returns:
        a new engine config with the specified module configuration
      • getFileConfiguration

        public FileMetaConfiguration getFileConfiguration()
        Returns the file configuration to be used by operators in the graph. This provides an authorization context to be used by the graph
        Returns:
        the file configuration to be used by operators in the graph.
      • fileConfiguration

        public EngineConfig fileConfiguration​(FileMetaConfiguration configuration)
        Specifies the file configuration to be used by operators in the graph. This provides an authorization context to be used by the graph
        Parameters:
        configuration - the file configuration to be used by operators in the graph.
        Returns:
        a new EngineConfig with the settings modified
      • getClasspathSpecifier

        public ClasspathSpecifier getClasspathSpecifier()
        Returns the classpath to be used in the case of a remote job. This property is ignored when running in pseudo-distributed mode.
        Returns:
        the classpath to be used when running a remote job.
      • classpathSpecifier

        public EngineConfig classpathSpecifier​(ClasspathSpecifier classpath)
        Sets the classpath to be used in the case of a remote job. This property is ignored when running in pseudo-distributed mode.
        Parameters:
        classpath - the classpath to be used when running a remote job.
        Returns:
        a new EngineConfig with the settings modified
      • getNetworkConfiguration

        public NetworkConfiguration getNetworkConfiguration()
        Returns the network configuration to be used for communication between the client and cluster manager. This property may also impact certain remote file systems.
        Returns:
        the network configuration to be used when running a remote job.
      • networkConfiguration

        public EngineConfig networkConfiguration​(NetworkConfiguration networkConfiguration)
        Sets the network configuration to be used for communication between the client and cluster manager. This property may also impact certain remote file systems.
        Parameters:
        networkConfiguration - the network configuration to be used when running a remote job.
        Returns:
        a new EngineConfig with the settings modified
      • getAvailableProcessors

        public int getAvailableProcessors()
        Retrieves the configured number of processors available for this engine. This value is normally used to determine the width of horizontal partitions. The default value is the number of processor cores available at run-time.
        Returns:
        the number of processors available for engine use
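The documented default, the number of processor cores available at run-time, matches what the JVM reports through `Runtime.availableProcessors()`. The sketch below assumes the engine's default resembles that call; the source does not state how the value is actually obtained, and `ProcessorsSketch` is a hypothetical name.

```java
// Hypothetical helper; only illustrates where a "cores available at run-time" default could come from.
final class ProcessorsSketch {
    // Returns the number of processors the JVM currently reports as available.
    static int defaultWidth() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("default partition width: " + defaultWidth());
    }
}
```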
      • getProperty

        public <T> T getProperty​(EngineProperty<T> property)
        Returns the value of an arbitrary property
        Type Parameters:
        T - the property type
        Parameters:
        property - the property
        Returns:
        the value of the property
      • property

        public <T> EngineConfig property​(EngineProperty<T> property,
                                         T value)
        Constructs a new configuration that overrides the specified setting. All other settings are left unchanged. Clients should not normally use this method; it is intended for internal use and for custom properties.

        This call does not modify this object.

        Type Parameters:
        T - the type of property
        Parameters:
        property - the property to modify
        value - the new value for that property
        Returns:
        the resulting new configuration
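The pairing of a generic getter with a generic, non-mutating setter can be sketched with a typed key. `TypedKey` and `TypedConfig` are hypothetical stand-ins, loosely modeled on the EngineProperty<T> signatures above; how the real classes store values or defaults is not documented here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed key, loosely analogous to EngineProperty<T>.
final class TypedKey<T> {
    final String name;
    final T defaultValue;

    TypedKey(String name, T defaultValue) {
        this.name = name;
        this.defaultValue = defaultValue;
    }
}

// Hypothetical immutable holder keyed by TypedKey; not the real EngineConfig.
final class TypedConfig {
    private final Map<TypedKey<?>, Object> values;

    TypedConfig() { this(new HashMap<>()); }

    private TypedConfig(Map<TypedKey<?>, Object> values) { this.values = values; }

    // Returns a new config with one setting overridden; the receiver is left untouched.
    <T> TypedConfig property(TypedKey<T> key, T value) {
        Map<TypedKey<?>, Object> copy = new HashMap<>(values);
        copy.put(key, value);
        return new TypedConfig(copy);
    }

    @SuppressWarnings("unchecked") // safe: property() only stores a T under a TypedKey<T>
    <T> T getProperty(TypedKey<T> key) {
        return values.containsKey(key) ? (T) values.get(key) : key.defaultValue;
    }
}

final class TypedConfigDemo {
    static final TypedKey<Integer> RETRIES = new TypedKey<>("retries", 0);

    public static void main(String[] args) {
        TypedConfig base = new TypedConfig();
        TypedConfig changed = base.property(RETRIES, 2);
        int before = base.getProperty(RETRIES);
        int after = changed.getProperty(RETRIES);
        System.out.println(before + " -> " + after); // 0 -> 2
    }
}
```

The type parameter ties the value type to the key, so a mismatched value is rejected at compile time rather than at run-time.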
      • applyConfig

        public EngineConfig applyConfig​(EngineConfig config)
        Constructs a new configuration by merging the properties of another configuration with this one. Any non-default properties of the supplied configuration override the corresponding properties of this configuration.
        Parameters:
        config - the engine configuration containing new properties
        Returns:
        the resulting new configuration
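The merge rule can be sketched with plain maps, under the assumption that "non-default" means "explicitly set": each map holds only the settings that were changed from their defaults. `MergeSketch` and the string keys are hypothetical; the real method operates on EngineConfig objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the merge rule; not the real applyConfig implementation.
final class MergeSketch {
    // Each map holds only the settings explicitly changed from their defaults.
    static Map<String, Object> applyConfig(Map<String, Object> existing,
                                           Map<String, Object> incoming) {
        Map<String, Object> merged = new HashMap<>(existing);
        merged.putAll(incoming); // explicitly set incoming values win
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> existing = new HashMap<>();
        existing.put("parallelism", 4);
        existing.put("maxRetries", 2);

        Map<String, Object> incoming = new HashMap<>();
        incoming.put("maxRetries", 5);

        Map<String, Object> merged = applyConfig(existing, incoming);
        System.out.println(merged.get("parallelism"));  // 4, kept from the existing config
        System.out.println(merged.get("maxRetries"));   // 5, overridden by the incoming config
        System.out.println(existing.get("maxRetries")); // 2, the input is not modified
    }
}
```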
      • engine

        public static EngineConfig engine()
        Returns the default engine configuration
        Returns:
        the default engine configuration
      • monitoredEngine

        public static EngineConfig monitoredEngine()
        Returns a default engine configuration with monitoring enabled.
        Returns:
        an engine configuration with monitoring enabled.
      • engine

        public static EngineConfig engine​(Properties properties)
        Creates an EngineConfig from a set of properties. Properties must be built-in engine settings.
        Parameters:
        properties - the properties
        Returns:
        an EngineConfig from a set of properties
      • isMonitored

        public boolean isMonitored()
        Indicates whether run-time statistics are gathered. Returns true if either local monitoring is enabled or remote monitoring is enabled.
        Returns:
        true if run-time statistics will be gathered
      • monitored

        public EngineConfig monitored​(boolean enabled)
        Specifies whether performance statistics should be gathered for executed graphs.
        Parameters:
        enabled - indicates whether statistics collection is active
        Returns:
        a new EngineConfig with the settings modified
      • getSubgraphHistorySize

        public int getSubgraphHistorySize()
        Returns the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
        Returns:
        the maximum number of subgraphs to retain when tracking execution history
      • subgraphHistorySize

        public EngineConfig subgraphHistorySize​(int size)
        Sets the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
        Parameters:
        size - the maximum number of subgraphs to retain when tracking execution history
        Returns:
        a new EngineConfig with the settings modified
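A bounded history of this kind is commonly kept as a queue that evicts its oldest entry once the limit is exceeded. The sketch below shows that idea with a limit of 10, the documented default; `BoundedHistory` is a hypothetical class, and the engine's actual data structure is not described in the source.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bounded history; illustrates "retain the last N" only.
final class BoundedHistory<T> {
    private final int maxSize;
    private final Deque<T> entries = new ArrayDeque<>();

    BoundedHistory(int maxSize) {
        this.maxSize = maxSize;
    }

    // Appends an entry and evicts the oldest ones beyond the limit.
    void record(T entry) {
        entries.addLast(entry);
        while (entries.size() > maxSize) {
            entries.removeFirst();
        }
    }

    int size() { return entries.size(); }

    // Returns the oldest retained entry, or null if the history is empty.
    T oldest() { return entries.peekFirst(); }
}

final class BoundedHistoryDemo {
    public static void main(String[] args) {
        BoundedHistory<Integer> history = new BoundedHistory<>(10);
        for (int i = 1; i <= 15; i++) {
            history.record(i);
        }
        System.out.println(history.size());   // 10
        System.out.println(history.oldest()); // 6
    }
}
```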
      • getStorageManagementPath

        public Path getStorageManagementPath()
        Returns the root scratch directory to use. For distributed execution, this must be on a shared filesystem accessible by all nodes in the cluster.
        Returns:
        the root scratch directory to use
      • storageManagementPath

        public EngineConfig storageManagementPath​(Path path)
        Specifies the root scratch directory to use. For distributed execution, this must be on a shared filesystem accessible by all nodes in the cluster.
        Parameters:
        path - the storage management path.
        Returns:
        a new EngineConfig with the settings modified
      • defaultStorageManagementPath

        public EngineConfig defaultStorageManagementPath()
        Specifies that the default storage management path should be used.
        Returns:
        a new EngineConfig with the settings modified
      • getCluster

        public ClusterSpecifier getCluster()
        Returns the cluster specifier. The cluster specifier determines whether we run in distributed or pseudo-distributed mode and, if distributed, specifies the cluster (host and port) on which we are to run.
        Returns:
        the cluster specifier
      • cluster

        public EngineConfig cluster​(ClusterSpecifier spec)
        Specifies the cluster on which we are to run
        Parameters:
        spec - the cluster specifier
        Returns:
        a new EngineConfig with the settings modified
      • pseudoDistributed

        public EngineConfig pseudoDistributed()
        Specifies that we are to run in pseudo-distributed mode.
        Returns:
        a new EngineConfig with the settings modified
      • cluster

        public EngineConfig cluster​(String url)
        Specifies the cluster that we are to run on.
        Parameters:
        url - the cluster URL. For the dr cluster, this is of the form "dr://host:port".
        Returns:
        a new EngineConfig with the settings modified
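A URL of the form "dr://host:port" can be checked before it is handed to the engine by parsing it with `java.net.URI`. This is a sketch only: the host name and port below are made-up example values, `ClusterUrlSketch` is hypothetical, and the source does not say how the engine itself parses the string.

```java
import java.net.URI;

// Hypothetical helper that splits a "dr://host:port" string into its parts.
final class ClusterUrlSketch {
    static String scheme(String url) { return URI.create(url).getScheme(); }
    static String host(String url) { return URI.create(url).getHost(); }
    static int port(String url) { return URI.create(url).getPort(); }

    public static void main(String[] args) {
        String url = "dr://example-host:1099"; // example values, not a real cluster
        System.out.println(scheme(url)); // dr
        System.out.println(host(url));   // example-host
        System.out.println(port(url));   // 1099
    }
}
```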
      • isPseudoDistributed

        public boolean isPseudoDistributed()
        Returns true if the configured ClusterSpecifier is for the pseudo-distributed (i.e. local) cluster.
        Returns:
        true if pseudo-distributed
      • autoParallelize

        public EngineConfig autoParallelize()
        Specifies that the engine should automatically determine the default degree of parallelism for graphs.
        Returns:
        a new EngineConfig with the settings modified
      • getParallelism

        public int getParallelism()
        Returns the raw, un-interpreted value of the parallelism setting
        Returns:
        the raw, un-interpreted value of the parallelism setting
      • parallelism

        public EngineConfig parallelism​(int count)
        Specifies that the engine should use the given degree of parallelism for graphs.
        Parameters:
        count - the default degree of parallelism to use
        Returns:
        a new EngineConfig with the settings modified
      • getMinParallelism

        public int getMinParallelism()
        Returns the raw, un-interpreted value of the minimum parallelism setting
        Returns:
        the raw, un-interpreted value of the minimum parallelism setting
      • minParallelism

        public EngineConfig minParallelism​(int count)
        Specifies the minimum degree of parallelism that the engine should use for graphs.
        Parameters:
        count - the minimum degree of parallelism to use
        Returns:
        a new EngineConfig with the settings modified
      • getMaxRetries

        public int getMaxRetries()
        Returns the maximum number of retries. This count applies per executable section of the graph, so failures in earlier sections do not count against the limit for later sections. By default this value is zero, which disables retries.
        Returns:
        the maximum number of retries
      • maxRetries

        public EngineConfig maxRetries​(int maxRetries)
        Specifies the maximum number of retries. This count applies per executable section of the graph, so failures in earlier sections do not count against the limit for later sections. By default this value is zero, which disables retries.
        Parameters:
        maxRetries - the max retries
        Returns:
        a new EngineConfig with the settings modified
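The per-section retry budget can be sketched as a loop in which the attempt counter resets for each section. Sections are modeled here as `BooleanSupplier` tasks that return true on success; that modeling, and the `RetrySketch` class, are assumptions for illustration and do not reflect how the engine actually schedules work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical illustration of a retry limit that applies per section.
final class RetrySketch {
    // Runs sections in order; every section gets its own budget of maxRetries.
    static boolean runAll(List<BooleanSupplier> sections, int maxRetries) {
        for (BooleanSupplier section : sections) {
            int retries = 0; // reset for each section
            boolean ok = section.getAsBoolean();
            while (!ok && retries < maxRetries) {
                retries++;
                ok = section.getAsBoolean();
            }
            if (!ok) {
                return false; // this section exhausted its own budget
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] firstCalls = {0};
        int[] secondCalls = {0};
        List<BooleanSupplier> sections = new ArrayList<>();
        sections.add(() -> firstCalls[0]++ > 0);  // fails once, then succeeds
        sections.add(() -> secondCalls[0]++ > 0); // fails once, then succeeds

        // Two failures in total, but each section stays within a budget of one retry.
        System.out.println(runAll(sections, 1)); // true
    }
}
```

With `maxRetries` set to zero, the first failing section ends the run immediately, which matches the documented default of retries being disabled.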
      • getDumpfilePath

        public String getDumpfilePath()
        Returns the dump path where low-level statistics and plan dumps are placed.
        Returns:
        the dump path
      • dumpfilePath

        public EngineConfig dumpfilePath​(String path)
        Specifies the dump path where low-level statistics and plan dumps are placed.
        Parameters:
        path - the dump path
        Returns:
        a new EngineConfig with the settings modified
      • getSchedulerQueue

        public String getSchedulerQueue()
        Returns the scheduler queue used for scheduling jobs.
        Returns:
        scheduler queue name
      • schedulerQueue

        public EngineConfig schedulerQueue​(String queueName)
        Specifies the scheduler queue to use when a job is executed.
        Parameters:
        queueName - name of the queue to use
        Returns:
        a new EngineConfig with the settings modified
      • getRawExtensionPaths

        public String getRawExtensionPaths()
        Returns the comma delimited list of user extension paths.
        Returns:
        user extension directory paths
      • getExtensionPaths

        public String[] getExtensionPaths()
        Returns the list of extension paths.
        Returns:
        set of extension paths
      • extensionPaths

        public EngineConfig extensionPaths​(String... paths)
        Specifies the list of user extension directory paths to use.
        Parameters:
        paths - set of user extension directory paths
        Returns:
        a new EngineConfig with the settings modified
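The relationship between the raw comma-delimited string and the array form can be sketched as a join and a split. The example paths and the `ExtensionPathsSketch` class are hypothetical, and the source does not say how the engine handles whitespace or empty entries, so this sketch does no trimming.

```java
// Hypothetical helpers showing the raw-string and array forms of the same path list.
final class ExtensionPathsSketch {
    // Joins paths into a single comma-delimited string.
    static String toRaw(String... paths) {
        return String.join(",", paths);
    }

    // Splits a comma-delimited string back into individual paths.
    static String[] fromRaw(String raw) {
        return raw.isEmpty() ? new String[0] : raw.split(",");
    }

    public static void main(String[] args) {
        String raw = toRaw("/shared/ext/a", "/shared/ext/b"); // example paths only
        System.out.println(raw);                 // /shared/ext/a,/shared/ext/b
        System.out.println(fromRaw(raw).length); // 2
    }
}
```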