Class EngineConfig

java.lang.Object
com.pervasive.datarush.graphs.EngineConfig

public final class EngineConfig extends Object
A collection of engine configuration settings. These collections are immutable, but provide methods for constructing new configurations from an existing one.

A number of instance methods are provided to permit a more fluent and declarative style of specifying configurations. For example,

 public static final EngineConfig myConfig=
     engine().monitored(true).licensePath("/path/to/my/license.file");
 
which defines a configuration with:
  • statistics collection enabled for processes and ports
  • the license file expected at /path/to/my/license.file
  • otherwise default settings
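The immutable, fluent style described above can be illustrated with a minimal stand-in class. This is only a sketch: `FluentConfigSketch` and its two settings are hypothetical and are not part of the DataRush API. It shows how each setter-style method returns a new instance and leaves the receiver unchanged.

```java
// Hypothetical stand-in for illustration only; this is not the EngineConfig class.
final class FluentConfigSketch {
    private final boolean monitored;
    private final int parallelism;

    private FluentConfigSketch(boolean monitored, int parallelism) {
        this.monitored = monitored;
        this.parallelism = parallelism;
    }

    /** Default settings for this sketch: monitoring off, parallelism 0 as a placeholder. */
    static FluentConfigSketch defaults() {
        return new FluentConfigSketch(false, 0);
    }

    /** Returns a new instance with monitoring changed; the receiver is not modified. */
    FluentConfigSketch monitored(boolean enabled) {
        return new FluentConfigSketch(enabled, parallelism);
    }

    /** Returns a new instance with parallelism changed; the receiver is not modified. */
    FluentConfigSketch parallelism(int count) {
        return new FluentConfigSketch(monitored, count);
    }

    boolean isMonitored() { return monitored; }

    int getParallelism() { return parallelism; }

    public static void main(String[] args) {
        FluentConfigSketch base = defaults();
        FluentConfigSketch tuned = base.monitored(true).parallelism(4);
        // base keeps its original values even after the chained calls
        System.out.println(base.isMonitored() + " " + tuned.isMonitored() + " " + tuned.getParallelism());
    }
}
```

Because every call yields a fresh object, a configuration built this way can be shared as a constant without defensive copying.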
  • Field Details

    • MODULE_CONFIGURATION

      public static final EngineProperty<ModuleConfiguration> MODULE_CONFIGURATION
      Property controlling the module configuration to be used by operators in the graph.
    • FILE_CONFIGURATION

      public static final EngineProperty<FileMetaConfiguration> FILE_CONFIGURATION
      Property controlling authorization for file access within the graph.
    • CLASSPATH_SPECIFIER

      public static final EngineProperty<ClasspathSpecifier> CLASSPATH_SPECIFIER
      Property controlling the classpath that is used for launching remote jobs.
    • NETWORK_CONFIGURATION

      public static final EngineProperty<NetworkConfiguration> NETWORK_CONFIGURATION
      Property controlling the socket provider to be used for communication between the client and cluster manager
    • MONITOR

      public static final EngineProperty<Boolean> MONITOR
      Property controlling the collection of execution statistics. By default, they are not collected.
    • SUBGRAPH_HISTORY_SIZE

      public static final EngineProperty<Integer> SUBGRAPH_HISTORY_SIZE
      Property controlling the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
    • STORAGE_MANAGEMENT_PATH

      public static final EngineProperty<Path> STORAGE_MANAGEMENT_PATH
      Property controlling the base path to use for intermediate storage. The default value, an empty string, selects an appropriate storage location depending on the type of cluster.
    • CLUSTER

      public static final EngineProperty<ClusterSpecifier> CLUSTER
      Property controlling which cluster to run on.
    • PARALLELISM

      public static final EngineProperty<Integer> PARALLELISM
      Property controlling the default parallelism used in a graph. By default, this is automatically determined.
    • MIN_PARALLELISM

      public static final EngineProperty<Integer> MIN_PARALLELISM
      Property controlling the minimum parallelism used in a graph. By default, this is automatically determined.
    • MAX_RETRIES

      public static final EngineProperty<Integer> MAX_RETRIES
      Maximum number of retry attempts. The default value of zero implies no retries.
    • DUMPFILE_PATH

      public static final EngineProperty<String> DUMPFILE_PATH
      Defines the directory in which low-level statistics and plan dumps are placed.
    • SCHEDULER_QUEUE

      public static final EngineProperty<String> SCHEDULER_QUEUE
      Defines the scheduler queue used for executing jobs
    • EXTENSION_PATHS

      public static final EngineProperty<String> EXTENSION_PATHS
      Defines a list of directories in shared storage containing user extensions
    • ports

      public final EngineConfig.Ports ports
      The ports sub-object
    • sort

      public final EngineConfig.Sort sort
      The sort sub-object.
    • remoteMonitoring

      public final EngineConfig.RemoteMonitoring remoteMonitoring
      The remote monitoring sub-object.
  • Method Details

    • getModuleConfiguration

      public ModuleConfiguration getModuleConfiguration()
      Returns the module configuration to be used by operators in the graph.
      Returns:
      the module configuration
    • moduleConfiguration

      public EngineConfig moduleConfiguration(ModuleConfiguration configuration)
      Specifies the module configuration to be used by operators in the graph
      Parameters:
      configuration - the module configuration
      Returns:
      a new engine config with the specified module configuration
    • getFileConfiguration

      public FileMetaConfiguration getFileConfiguration()
      Returns the file configuration to be used by operators in the graph. This provides an authorization context to be used by the graph
      Returns:
      the file configuration to be used by operators in the graph.
    • fileConfiguration

      public EngineConfig fileConfiguration(FileMetaConfiguration configuration)
      Specifies the file configuration to be used by operators in the graph. This provides an authorization context to be used by the graph
      Parameters:
      configuration - the file configuration to be used by operators in the graph.
      Returns:
      a new EngineConfig with the settings modified
    • getClasspathSpecifier

      public ClasspathSpecifier getClasspathSpecifier()
      Returns the classpath to be used in the case of a remote job. This property is ignored when running in pseudo-distributed mode.
      Returns:
      the classpath to be used when running a remote job.
    • classpathSpecifier

      public EngineConfig classpathSpecifier(ClasspathSpecifier classpath)
      Sets the classpath to be used in the case of a remote job. This property is ignored when running in pseudo-distributed mode.
      Parameters:
      classpath - the classpath to be used when running a remote job.
      Returns:
      a new EngineConfig with the settings modified
    • getNetworkConfiguration

      public NetworkConfiguration getNetworkConfiguration()
      Returns the network configuration to be used for communication between the client and cluster manager. This property may also impact certain remote file systems.
      Returns:
      the network configuration to be used when running a remote job.
    • networkConfiguration

      public EngineConfig networkConfiguration(NetworkConfiguration networkConfiguration)
      Sets the network configuration to be used for communication between the client and cluster manager. This property may also impact certain remote file systems.
      Parameters:
      networkConfiguration - the network configuration to be used when running a remote job.
      Returns:
      a new EngineConfig with the settings modified
    • getAvailableProcessors

      public int getAvailableProcessors()
      Retrieves the configured number of processors available for this engine. This value is normally used to determine the width of horizontal partitions. The default value is the number of processor cores available at run-time.
      Returns:
      the number of processors available for engine use
    • getProperty

      public <T> T getProperty(EngineProperty<T> property)
      Returns the value of an arbitrary property
      Type Parameters:
      T - the property type
      Parameters:
      property - the property
      Returns:
      the value of the property
    • property

      public <T> EngineConfig property(EngineProperty<T> property, T value)
      Constructs a new configuration overriding the specified setting. All other settings are left unchanged. Note that clients should typically not use this method; it is intended for internal use and for custom properties.

      This call does not modify this object.

      Type Parameters:
      T - the type of property
      Parameters:
      property - the property to modify
      value - the new value for that property
      Returns:
      the resulting new configuration
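A typed property key paired with a generic getter and a copy-on-write setter might look like the sketch below. All names here (`TypedPropertySketch`, `Key`, the two constants) are hypothetical stand-ins; this page does not show how EngineProperty or EngineConfig are implemented, so this only illustrates the general technique.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; not the real EngineProperty or EngineConfig classes.
final class TypedPropertySketch {
    /** A key that carries its value type and a default value. */
    static final class Key<T> {
        final String name;
        final T defaultValue;

        Key(String name, T defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    static final Key<Boolean> MONITOR = new Key<>("monitor", false);
    static final Key<Integer> MAX_RETRIES = new Key<>("maxRetries", 0);

    private final Map<Key<?>, Object> values;

    TypedPropertySketch() {
        this(new HashMap<Key<?>, Object>());
    }

    private TypedPropertySketch(Map<Key<?>, Object> values) {
        this.values = values;
    }

    /** Returns the stored value, or the key's default when nothing was set. */
    @SuppressWarnings("unchecked")
    <T> T getProperty(Key<T> key) {
        Object value = values.get(key);
        return value == null ? key.defaultValue : (T) value;
    }

    /** Returns a copy with one value overridden; this object is not modified. */
    <T> TypedPropertySketch property(Key<T> key, T value) {
        Map<Key<?>, Object> copy = new HashMap<Key<?>, Object>(values);
        copy.put(key, value);
        return new TypedPropertySketch(copy);
    }

    public static void main(String[] args) {
        TypedPropertySketch base = new TypedPropertySketch();
        TypedPropertySketch retrying = base.property(MAX_RETRIES, 3);
        System.out.println(base.getProperty(MAX_RETRIES) + " " + retrying.getProperty(MAX_RETRIES));
    }
}
```

The type parameter on the key lets the compiler reject a mismatched value, such as passing a String for an Integer property.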
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • applyConfig

      public EngineConfig applyConfig(EngineConfig config)
      Constructs a new configuration by merging the properties of another configuration with this one. Any non-default properties of the supplied configuration override the corresponding properties of this configuration; the result is returned as a new configuration object.
      Parameters:
      config - the engine configuration containing new properties
      Returns:
      the resulting new configuration
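The merge rule described for applyConfig can be sketched with a small model in which only explicitly set values are stored, so that "non-default" simply means "present in the map". `MergeSketch` and the string property names are hypothetical; the actual merge logic is not shown on this page and may differ in detail.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the merge rule; not the real EngineConfig implementation.
final class MergeSketch {
    // Only explicitly set (non-default) values are stored.
    private final Map<String, Object> overrides;

    private MergeSketch(Map<String, Object> overrides) {
        this.overrides = overrides;
    }

    static MergeSketch empty() {
        return new MergeSketch(new LinkedHashMap<String, Object>());
    }

    /** Returns a copy with one value set; this object is not modified. */
    MergeSketch with(String name, Object value) {
        Map<String, Object> copy = new LinkedHashMap<String, Object>(overrides);
        copy.put(name, value);
        return new MergeSketch(copy);
    }

    /** Values set in other win; values set only in this object are kept. */
    MergeSketch applyConfig(MergeSketch other) {
        Map<String, Object> merged = new LinkedHashMap<String, Object>(overrides);
        merged.putAll(other.overrides);
        return new MergeSketch(merged);
    }

    Object get(String name, Object defaultValue) {
        return overrides.containsKey(name) ? overrides.get(name) : defaultValue;
    }

    public static void main(String[] args) {
        MergeSketch base = empty().with("parallelism", 8).with("maxRetries", 2);
        MergeSketch overlay = empty().with("parallelism", 4);
        MergeSketch merged = base.applyConfig(overlay);
        // parallelism comes from overlay, maxRetries from base, monitor falls back to the default
        System.out.println(merged.get("parallelism", 0) + " " + merged.get("maxRetries", 0)
                + " " + merged.get("monitor", false));
    }
}
```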
    • engine

      public static EngineConfig engine()
      Returns the default engine configuration
      Returns:
      the default engine configuration
    • monitoredEngine

      public static EngineConfig monitoredEngine()
      Returns a default engine configuration with monitoring enabled.
      Returns:
      an engine configuration with monitoring enabled.
    • engine

      public static EngineConfig engine(Properties properties)
      Creates an EngineConfig from a set of properties. Properties must be built-in engine settings.
      Parameters:
      properties - the properties
      Returns:
      an EngineConfig from a set of properties
    • isMonitored

      public boolean isMonitored()
      Indicates whether run-time statistics are gathered. Returns true if either local monitoring is enabled or remote monitoring is enabled.
      Returns:
      true if run-time statistics will be gathered
    • monitored

      public EngineConfig monitored(boolean enabled)
      Specifies whether performance statistics should be gathered for executed graphs.
      Parameters:
      enabled - indicates whether statistics collection is active
      Returns:
      a new EngineConfig with the settings modified
    • getSubgraphHistorySize

      public int getSubgraphHistorySize()
      Returns the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
      Returns:
      the maximum number of subgraphs to retain when tracking execution history
    • subgraphHistorySize

      public EngineConfig subgraphHistorySize(int size)
      Sets the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
      Parameters:
      size - the maximum number of subgraphs to retain when tracking execution history
      Returns:
      a new EngineConfig with the settings modified
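To illustrate why a bounded history keeps memory use in check, here is a minimal sketch of a "retain the last N entries" buffer. `BoundedHistorySketch` is a hypothetical helper written for this example; how the engine actually stores subgraph history is not described on this page.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper showing a "keep the last N" history; not engine code.
final class BoundedHistorySketch<T> {
    private final int maxSize;
    private final Deque<T> entries = new ArrayDeque<T>();

    BoundedHistorySketch(int maxSize) {
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must be >= 0");
        }
        this.maxSize = maxSize;
    }

    /** Appends an entry and evicts the oldest ones once the limit is exceeded. */
    void record(T entry) {
        entries.addLast(entry);
        while (entries.size() > maxSize) {
            entries.removeFirst();
        }
    }

    int size() { return entries.size(); }

    /** Returns the oldest retained entry, or null when the history is empty. */
    T oldest() { return entries.peekFirst(); }

    public static void main(String[] args) {
        BoundedHistorySketch<String> history = new BoundedHistorySketch<String>(10);
        for (int i = 0; i < 12; i++) {
            history.record("subgraph-" + i);
        }
        // Twelve entries were recorded, but only the last ten remain.
        System.out.println(history.size() + " " + history.oldest());
    }
}
```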
    • getStorageManagementPath

      public Path getStorageManagementPath()
      Returns the root scratch directory to use. For distributed execution, this must be on a shared filesystem accessible by all nodes in the cluster.
      Returns:
      the root scratch directory to use
    • storageManagementPath

      public EngineConfig storageManagementPath(Path path)
      Specifies the root scratch directory to use. For distributed execution, this must be on a shared filesystem accessible by all nodes in the cluster.
      Parameters:
      path - the storage management path.
      Returns:
      a new EngineConfig with the settings modified
    • defaultStorageManagementPath

      public EngineConfig defaultStorageManagementPath()
      Specifies that the default storage management path should be used.
      Returns:
      a new EngineConfig with the settings modified
    • getCluster

      public ClusterSpecifier getCluster()
      Returns the cluster specifier. The cluster specifier determines whether we run in distributed or pseudo-distributed mode and, if distributed, specifies the cluster (host and port) on which we are to run.
      Returns:
      the cluster specifier
    • cluster

      public EngineConfig cluster(ClusterSpecifier spec)
      Specifies the cluster on which we are to run
      Parameters:
      spec - the cluster specifier
      Returns:
      a new EngineConfig with the settings modified
    • pseudoDistributed

      public EngineConfig pseudoDistributed()
      Specifies that we are to run in pseudo-distributed mode.
      Returns:
      a new EngineConfig with the settings modified
    • cluster

      public EngineConfig cluster(String url)
      Specifies the cluster that we are to run on.
      Parameters:
      url - the cluster url. For the dr cluster this will be of the form "dr://host:port".
      Returns:
      a new EngineConfig with the settings modified
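A URL of the form "dr://host:port" can be split into its parts with the standard java.net.URI class, which may be handy for validating a value before passing it to cluster(String). This is a sketch only: the host name and port number below are made-up values, and nothing here reflects how the engine itself parses the URL.

```java
import java.net.URI;

// Illustrative only; the host and port are hypothetical example values.
final class ClusterUrlSketch {
    /** Splits a cluster URL into "scheme host port" using java.net.URI. */
    static String describe(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() + " " + uri.getHost() + " " + uri.getPort();
    }

    public static void main(String[] args) {
        System.out.println(describe("dr://example-host:1099")); // prints "dr example-host 1099"
    }
}
```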
    • isPseudoDistributed

      public boolean isPseudoDistributed()
      Returns true if the configured ClusterSpecifier is for the pseudo-distributed (i.e. local) cluster.
      Returns:
      true if pseudo-distributed
    • autoParallelize

      public EngineConfig autoParallelize()
      Specifies that the engine should automatically determine the default degree of parallelism for graphs.
      Returns:
      a new EngineConfig with the settings modified
    • getParallelism

      public int getParallelism()
      Returns the raw, un-interpreted value of the parallelism setting
      Returns:
      the raw, un-interpreted value of the parallelism setting
    • parallelism

      public EngineConfig parallelism(int count)
      Specifies that the engine should use the given degree of parallelism for graphs.
      Parameters:
      count - the default degree of parallelism to use
      Returns:
      a new EngineConfig with the settings modified
    • getMinParallelism

      public int getMinParallelism()
      Returns the raw, un-interpreted value of the minimum parallelism setting
      Returns:
      the raw, un-interpreted value of the minimum parallelism setting
    • minParallelism

      public EngineConfig minParallelism(int count)
      Specifies that the engine should use the given minimum degree of parallelism for graphs.
      Parameters:
      count - the minimum degree of parallelism to use
      Returns:
      a new EngineConfig with the settings modified
    • getMaxRetries

      public int getMaxRetries()
      Maximum number of retries. This count is per executable section of the graph, thus prior failures in earlier sections do not count against the overall total. By default this value is zero and thus retry is disabled.
      Returns:
      the maximum number of retries
    • maxRetries

      public EngineConfig maxRetries(int maxRetries)
      Specifies the maximum number of retries. This count is per executable section of the graph, thus prior failures in earlier sections do not count against the overall total. By default this value is zero and thus retry is disabled.
      Parameters:
      maxRetries - the max retries
      Returns:
      a new EngineConfig with the settings modified
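The per-section retry accounting described for maxRetries can be sketched as a loop whose retry counter resets for every section. `SectionRetrySketch` and its `Section` interface are hypothetical and only model the counting rule; the sketch also assumes that a limit of N retries allows N + 1 attempts per section, which this page does not state explicitly.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of per-section retry counting; not engine code.
final class SectionRetrySketch {
    /** One executable section; attempt is 0 for the first try. */
    interface Section {
        void run(int attempt);
    }

    /** Runs sections in order and returns the total number of attempts made. */
    static int runAll(List<Section> sections, int maxRetries) {
        int totalAttempts = 0;
        for (Section section : sections) {
            int retries = 0; // reset for every section, so earlier failures do not count
            while (true) {
                totalAttempts++;
                try {
                    section.run(retries);
                    break;
                } catch (RuntimeException e) {
                    if (retries >= maxRetries) {
                        throw e; // retry budget for this section is exhausted
                    }
                    retries++;
                }
            }
        }
        return totalAttempts;
    }

    public static void main(String[] args) {
        // Each section fails on its first attempt and succeeds on the second.
        Section flaky = attempt -> {
            if (attempt == 0) {
                throw new IllegalStateException("transient failure");
            }
        };
        System.out.println(runAll(Arrays.asList(flaky, flaky), 1));
    }
}
```

With a limit of one retry, both flaky sections complete even though two failures occurred overall, because each section has its own budget.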
    • getDumpfilePath

      public String getDumpfilePath()
      Returns the dump path where low-level statistics and plan dumps are placed.
      Returns:
      the dump path
    • dumpfilePath

      public EngineConfig dumpfilePath(String path)
      Specifies the dump path where low-level statistics and plan dumps are placed.
      Parameters:
      path - the dump path
      Returns:
      a new EngineConfig with the settings modified
    • getSchedulerQueue

      public String getSchedulerQueue()
      Returns the scheduler queue used for scheduling jobs.
      Returns:
      scheduler queue name
    • schedulerQueue

      public EngineConfig schedulerQueue(String queueName)
      Specifies the scheduler queue to use when a job is executed.
      Parameters:
      queueName - name of the queue to use
      Returns:
      a new EngineConfig with the settings modified
    • getRawExtensionPaths

      public String getRawExtensionPaths()
      Returns the comma delimited list of user extension paths.
      Returns:
      user extension directory paths
    • getExtensionPaths

      public String[] getExtensionPaths()
      Returns the list of extension paths.
      Returns:
      set of extension paths
    • extensionPaths

      public EngineConfig extensionPaths(String... paths)
      Specifies the list of user extension directory paths to use.
      Parameters:
      paths - set of user extension directory paths
      Returns:
      a new EngineConfig with the settings modified
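The relationship between the comma-delimited raw value and the array form of the extension paths can be sketched with plain string operations. `ExtensionPathsSketch` and the directory names are hypothetical; the engine's own handling of whitespace or empty entries is not documented on this page.

```java
// Hypothetical helper showing a comma-delimited round trip; not engine code.
final class ExtensionPathsSketch {
    /** Joins individual paths into one comma-delimited string. */
    static String toRaw(String... paths) {
        return String.join(",", paths);
    }

    /** Splits a comma-delimited string; an empty or null value yields no paths. */
    static String[] fromRaw(String raw) {
        if (raw == null || raw.isEmpty()) {
            return new String[0];
        }
        return raw.split(",");
    }

    public static void main(String[] args) {
        String raw = toRaw("/shared/ext/a", "/shared/ext/b");
        System.out.println(raw + " -> " + fromRaw(raw).length + " paths");
    }
}
```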