java.lang.Object

com.pervasive.datarush.graphs.EngineConfig

public final class EngineConfig extends Object

A collection of engine configuration settings. These collections are immutable, but provide methods for constructing new configurations from an existing one.

A number of instance methods are provided to permit a more fluent and declarative style of specifying configurations. For example,

 public static final EngineConfig myConfig =
     engine().monitored(true).licensePath("/path/to/my/license.file");

which defines a configuration with:

statistics collection enabled for processes and ports
the license file expected at /path/to/my/license.file
otherwise default settings
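
The immutable, fluent style described above can be illustrated with a small self-contained sketch. `MiniConfig` and its two settings are hypothetical stand-ins and not the DataRush implementation; the sketch only shows that each setter-style method returns a new object and leaves the receiver unchanged.

```java
// Hypothetical stand-in for an immutable, fluent configuration class.
// It is NOT EngineConfig; it only models the "return a new copy" pattern.
final class MiniConfig {
    private final boolean monitored;
    private final int parallelism;

    private MiniConfig(boolean monitored, int parallelism) {
        this.monitored = monitored;
        this.parallelism = parallelism;
    }

    // Default configuration: monitoring off, parallelism 0
    // (assumed in this sketch to stand for "determine automatically").
    static MiniConfig engine() {
        return new MiniConfig(false, 0);
    }

    // Each "setter" builds a new instance instead of mutating this one.
    MiniConfig monitored(boolean enabled) {
        return new MiniConfig(enabled, parallelism);
    }

    MiniConfig parallelism(int count) {
        return new MiniConfig(monitored, count);
    }

    boolean isMonitored() { return monitored; }

    int getParallelism() { return parallelism; }

    public static void main(String[] args) {
        MiniConfig base = MiniConfig.engine();
        MiniConfig tuned = base.monitored(true).parallelism(4);
        // base keeps its defaults; only tuned carries the overrides.
        System.out.println(base.isMonitored() + " " + tuned.isMonitored() + " " + tuned.getParallelism());
    }
}
```

Because no instance is ever modified, a configuration built this way can be shared as a constant, as in the `myConfig` example above.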

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static final class

EngineConfig.Ports

Nested class containing settings specific to ports.

static final class

EngineConfig.RemoteMonitoring

Settings in this object control remote monitoring.

static final class

EngineConfig.Sort

Settings in this object determine default tuning for the Sort operator.
Field Summary

Fields

Modifier and Type

Field

Description

static final EngineProperty<ClasspathSpecifier>

CLASSPATH_SPECIFIER

Property controlling classpath that is used for launching remote jobs

static final EngineProperty<ClusterSpecifier>

CLUSTER

Property controlling the cluster on which to run.

static final EngineProperty<String>

DUMPFILE_PATH

Defines the directory in which low-level statistics and plan dumps are placed.

static final EngineProperty<String>

EXTENSION_PATHS

Defines a list of directories in shared storage containing user extensions

static final EngineProperty<FileMetaConfiguration>

FILE_CONFIGURATION

Property controlling authorization for file access within the graph.

static final EngineProperty<Integer>

MAX_RETRIES

Maximum number of retry attempts.

static final EngineProperty<Integer>

MIN_PARALLELISM

Property controlling the minimum parallelism used in the graph.

static final EngineProperty<ModuleConfiguration>

MODULE_CONFIGURATION

Property controlling the module configuration to be used by operators in the graph.

static final EngineProperty<Boolean>

MONITOR

Property controlling the collection of execution statistics.

static final EngineProperty<NetworkConfiguration>

NETWORK_CONFIGURATION

Property controlling the socket provider to be used for communication between the client and cluster manager

static final EngineProperty<Integer>

PARALLELISM

Property controlling the default parallelism used in the graph.

final EngineConfig.Ports

ports

The ports sub-object

final EngineConfig.RemoteMonitoring

remoteMonitoring

The remote monitoring sub-object.

static final EngineProperty<String>

SCHEDULER_QUEUE

Defines the scheduler queue used for executing jobs

final EngineConfig.Sort

sort

The sort sub-object.

static final EngineProperty<Path>

STORAGE_MANAGEMENT_PATH

Property that controls a base path to use for intermediate storage.

static final EngineProperty<Integer>

SUBGRAPH_HISTORY_SIZE

Property controlling the maximum number of subgraphs to retain when tracking execution history.
Method Summary

Modifier and Type

Method

Description

EngineConfig

applyConfig(EngineConfig config)

Constructs a new configuration by merging the properties of another config with this one.

EngineConfig

autoParallelize()

Specifies that the engine should automatically determine the default degree of parallelism for graphs.

EngineConfig

classpathSpecifier(ClasspathSpecifier classpath)

Sets the classpath to be used in the case of a remote job.

EngineConfig

cluster(ClusterSpecifier spec)

Specifies the cluster on which we are to run

EngineConfig

cluster(String url)

Specifies the cluster that we are to run on.

EngineConfig

defaultStorageManagementPath()

Specifies that the default storage management path should be used.

EngineConfig

dumpfilePath(String path)

Specifies the dump path where low-level statistics and plan dumps are placed.

static EngineConfig

engine()

Returns the default engine configuration

static EngineConfig

engine(Properties properties)

Creates an EngineConfig from a set of properties.

EngineConfig

extensionPaths(String... paths)

Specifies the list of user extension directory paths to use.

EngineConfig

fileConfiguration(FileMetaConfiguration configuration)

Specifies the file configuration to be used by operators in the graph.

int

getAvailableProcessors()

Retrieves the configured number of processors available for this engine.

ClasspathSpecifier

getClasspathSpecifier()

Returns the classpath to be used in the case of a remote job.

ClusterSpecifier

getCluster()

Returns the cluster specifier.

String

getDumpfilePath()

Returns the dump path where low-level statistics and plan dumps are placed.

String[]

getExtensionPaths()

Returns the list of extension paths.

FileMetaConfiguration

getFileConfiguration()

Returns the file configuration to be used by operators in the graph.

int

getMaxRetries()

Returns the maximum number of retries.

int

getMinParallelism()

Returns the raw, un-interpreted value of the minimum parallelism setting.

ModuleConfiguration

getModuleConfiguration()

Returns the module configuration to be used by operators in the graph.

NetworkConfiguration

getNetworkConfiguration()

Returns the network configuration to be used for communication between the client and cluster manager.

int

getParallelism()

Returns the raw, un-interpreted value of the parallelism setting

<T> T

getProperty(EngineProperty<T> property)

Returns the value of an arbitrary property

String

getRawExtensionPaths()

Returns the comma delimited list of user extension paths.

String

getSchedulerQueue()

Returns the scheduler queue used for scheduling jobs.

Path

getStorageManagementPath()

Returns the root scratch directory to use.

int

getSubgraphHistorySize()

Returns the maximum number of subgraphs to retain when tracking execution history.

boolean

isMonitored()

Indicates whether run-time statistics are gathered.

boolean

isPseudoDistributed()

Returns true if the configured ClusterSpecifier is for the pseudo-distributed (i.e. local) cluster.

EngineConfig

maxRetries(int maxRetries)

Specifies the maximum number of retries.

EngineConfig

minParallelism(int count)

Specifies the minimum degree of parallelism that the engine should use for graphs.

EngineConfig

moduleConfiguration(ModuleConfiguration configuration)

Specifies the module configuration to be used by operators in the graph

EngineConfig

monitored(boolean enabled)

Specifies whether performance statistics should be gathered for executed graphs.

static EngineConfig

monitoredEngine()

Returns a default engine configuration with monitoring enabled.

EngineConfig

networkConfiguration(NetworkConfiguration networkConfiguration)

Sets the network configuration to be used for communication between the client and cluster manager.

EngineConfig

parallelism(int count)

Specifies that the engine should use the given degree of parallelism for graphs.

<T> EngineConfig

property(EngineProperty<T> property, T value)

Constructs a new configuration overriding the specified setting. All other settings are left unchanged.

EngineConfig

pseudoDistributed()

Specifies that we are to run in pseudo-distributed mode.

EngineConfig

schedulerQueue(String queueName)

Specifies the scheduler queue to use when a job is executed.

EngineConfig

storageManagementPath(Path path)

Specifies the root scratch directory to use.

EngineConfig

subgraphHistorySize(int size)

Sets the maximum number of subgraphs to retain when tracking execution history.

String

toString()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- MODULE_CONFIGURATION
  
  public static final EngineProperty<ModuleConfiguration> MODULE_CONFIGURATION
  
  Property controlling the module configuration to be used by operators in the graph.
- FILE_CONFIGURATION
  
  public static final EngineProperty<FileMetaConfiguration> FILE_CONFIGURATION
  
  Property controlling authorization for file access within the graph.
- CLASSPATH_SPECIFIER
  
  public static final EngineProperty<ClasspathSpecifier> CLASSPATH_SPECIFIER
  
  Property controlling classpath that is used for launching remote jobs
- NETWORK_CONFIGURATION
  
  public static final EngineProperty<NetworkConfiguration> NETWORK_CONFIGURATION
  
  Property controlling the socket provider to be used for communication between the client and cluster manager
- MONITOR
  
  public static final EngineProperty<Boolean> MONITOR
  
  Property controlling the collection of execution statistics. By default, they are not collected.
- SUBGRAPH_HISTORY_SIZE
  
  public static final EngineProperty<Integer> SUBGRAPH_HISTORY_SIZE
  
  Property controlling the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
- STORAGE_MANAGEMENT_PATH
  
  public static final EngineProperty<Path> STORAGE_MANAGEMENT_PATH
  
  Property that controls a base path to use for intermediate storage. The default value, an empty string, selects an appropriate storage location depending on the type of cluster.
- CLUSTER
  
  public static final EngineProperty<ClusterSpecifier> CLUSTER
  
  Property controlling the cluster on which to run.
- PARALLELISM
  
  public static final EngineProperty<Integer> PARALLELISM
  
  Property controlling the default parallelism used in the graph. By default, this is automatically determined.
- MIN_PARALLELISM
  
  public static final EngineProperty<Integer> MIN_PARALLELISM
  
  Property controlling the minimum parallelism used in the graph. By default, this is automatically determined.
- MAX_RETRIES
  
  public static final EngineProperty<Integer> MAX_RETRIES
  
  Maximum number of retry attempts. The default value of zero implies no retries.
- DUMPFILE_PATH
  
  public static final EngineProperty<String> DUMPFILE_PATH
  
  Defines the directory in which low-level statistics and plan dumps are placed.
- SCHEDULER_QUEUE
  
  public static final EngineProperty<String> SCHEDULER_QUEUE
  
  Defines the scheduler queue used for executing jobs
- EXTENSION_PATHS
  
  public static final EngineProperty<String> EXTENSION_PATHS
  
  Defines a list of directories in shared storage containing user extensions
- ports
  
  public final EngineConfig.Ports ports
  
  The ports sub-object
- sort
  
  public final EngineConfig.Sort sort
  
  The sort sub-object.
- remoteMonitoring
  
  public final EngineConfig.RemoteMonitoring remoteMonitoring
  
  The remote monitoring sub-object.
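
The static fields above act as typed keys: each `EngineProperty<T>` fixes the value type of one setting. A minimal sketch of that idea follows. `Key`, `Settings`, and the key name strings are hypothetical and are not the library's classes; the three default values are the ones stated in the field descriptions above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed key: a name plus a default value of type T.
final class Key<T> {
    final String name;
    final T defaultValue;

    Key(String name, T defaultValue) {
        this.name = name;
        this.defaultValue = defaultValue;
    }
}

// Hypothetical immutable settings holder addressed by typed keys.
final class Settings {
    // Defaults mirror the field descriptions: monitoring off,
    // zero retries, last 10 subgraphs retained.
    static final Key<Boolean> MONITOR = new Key<>("monitor", false);
    static final Key<Integer> MAX_RETRIES = new Key<>("maxRetries", 0);
    static final Key<Integer> SUBGRAPH_HISTORY_SIZE = new Key<>("subgraphHistorySize", 10);

    private final Map<Key<?>, Object> values;

    private Settings(Map<Key<?>, Object> values) {
        this.values = values;
    }

    static Settings defaults() {
        return new Settings(new HashMap<>());
    }

    // Returns a copy with one setting overridden; this object is not modified.
    <T> Settings property(Key<T> key, T value) {
        Map<Key<?>, Object> copy = new HashMap<>(values);
        copy.put(key, value);
        return new Settings(copy);
    }

    // The cast is safe because property() is the only writer and it
    // stores a T under a Key<T>.
    @SuppressWarnings("unchecked")
    <T> T getProperty(Key<T> key) {
        return values.containsKey(key) ? (T) values.get(key) : key.defaultValue;
    }
}
```

With this shape, `Settings.defaults().property(Settings.MAX_RETRIES, 3)` compiles, while passing a `String` for the same key is rejected by the compiler.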
Method Details
- getModuleConfiguration
  
  public ModuleConfiguration getModuleConfiguration()
  
  Returns the module configuration to be used by operators in the graph.
  
  Returns:
  
  the module configuration
- moduleConfiguration
  
  public EngineConfig moduleConfiguration(ModuleConfiguration configuration)
  
  Specifies the module configuration to be used by operators in the graph
  
  Parameters:
  
  configuration - the module configuration
  
  Returns:
  
  a new engine config with the specified module configuration
- getFileConfiguration
  
  public FileMetaConfiguration getFileConfiguration()
  
  Returns the file configuration to be used by operators in the graph. This provides an authorization context to be used by the graph
  
  Returns:
  
  the file configuration to be used by operators in the graph.
- fileConfiguration
  
  public EngineConfig fileConfiguration(FileMetaConfiguration configuration)
  
  Specifies the file configuration to be used by operators in the graph. This provides an authorization context to be used by the graph
  
  Parameters:
  
  configuration - the file configuration to be used by operators in the graph.
  
  Returns:
  
  a new EngineConfig with the settings modified
- getClasspathSpecifier
  
  public ClasspathSpecifier getClasspathSpecifier()
  
  Returns the classpath to be used in the case of a remote job. This property is ignored when running in pseudo-distributed mode.
  
  Returns:
  
  the classpath to be used when running a remote job.
- classpathSpecifier
  
  public EngineConfig classpathSpecifier(ClasspathSpecifier classpath)
  
  Sets the classpath to be used in the case of a remote job. This property is ignored when running in pseudo-distributed mode.
  
  Parameters:
  
  classpath - the classpath to be used when running a remote job.
  
  Returns:
  
  a new EngineConfig with the settings modified
- getNetworkConfiguration
  
  public NetworkConfiguration getNetworkConfiguration()
  
  Returns the network configuration to be used for communication between the client and cluster manager. This property may also impact certain remote file systems.
  
  Returns:
  
  the network configuration to be used when running a remote job.
- networkConfiguration
  
  public EngineConfig networkConfiguration(NetworkConfiguration networkConfiguration)
  
  Sets the network configuration to be used for communication between the client and cluster manager. This property may also impact certain remote file systems.
  
  Parameters:
  
  networkConfiguration - the network configuration to be used when running a remote job.
  
  Returns:
  
  a new EngineConfig with the settings modified
- getAvailableProcessors
  
  public int getAvailableProcessors()
  
  Retrieves the configured number of processors available for this engine. This value is normally used to determine the width of horizontal partitions. The default value is the number of processor cores available at run-time.
  
  Returns:
  
  the number of processors available for engine use
- getProperty
  
  public <T> T getProperty(EngineProperty<T> property)
  
  Returns the value of an arbitrary property
  
  Type Parameters:
  
  T - the property type
  
  Parameters:
  
  property - the property
  
  Returns:
  
  the value of the property
- property
  
  public <T> EngineConfig property(EngineProperty<T> property, T value)
  
  Constructs a new configuration overriding the specified setting. All other settings are left unchanged. Note that clients should typically not use this method; it is intended for internal use and for custom properties.
  This call does not modify this object.
  
  Type Parameters:
  
  T - the type of property
  
  Parameters:
  
  property - the property to modify
  
  value - the new value for that property
  
  Returns:
  
  the resulting new configuration
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- applyConfig
  
  public EngineConfig applyConfig(EngineConfig config)
  
  Constructs a new configuration by merging the properties of another config with this one. Any non-default properties of the new configuration override the corresponding properties of this configuration; the result is returned as a new configuration object.
  
  Parameters:
  
  config - the engine configuration containing new properties
  
  Returns:
  
  the resulting new configuration
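
The merge rule described for `applyConfig` can be sketched with plain maps. This is a simplified model and not the library's code: it assumes a configuration can be represented as a map holding only its non-default settings, and the class name and keys are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: a configuration is a map holding only non-default settings.
final class MergeSketch {
    // Returns a new map; neither input is modified.
    static Map<String, Object> applyConfig(Map<String, Object> existing,
                                           Map<String, Object> incoming) {
        Map<String, Object> merged = new LinkedHashMap<>(existing);
        merged.putAll(incoming); // incoming non-default values win on conflict
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> existing = new LinkedHashMap<>();
        existing.put("parallelism", 8);
        existing.put("monitored", true);

        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("parallelism", 2); // overrides the existing value 8

        System.out.println(applyConfig(existing, incoming));
    }
}
```

Settings absent from `incoming` (here `monitored`) carry over from `existing` untouched.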
- engine
  
  public static EngineConfig engine()
  
  Returns the default engine configuration
  
  Returns:
  
  the default engine configuration
- monitoredEngine
  
  public static EngineConfig monitoredEngine()
  
  Returns a default engine configuration with monitoring enabled.
  
  Returns:
  
  an engine configuration with monitoring enabled.
- engine
  
  public static EngineConfig engine(Properties properties)
  
  Creates an EngineConfig from a set of properties. Properties must be built-in engine settings.
  
  Parameters:
  
  properties - the properties
  
  Returns:
  
  an EngineConfig from a set of properties
- isMonitored
  
  public boolean isMonitored()
  
  Indicates whether run-time statistics are gathered. Returns true if either local monitoring is enabled or remote monitoring is enabled.
  
  Returns:
  
  true if run-time statistics will be gathered
- monitored
  
  public EngineConfig monitored(boolean enabled)
  
  Specifies whether performance statistics should be gathered for executed graphs.
  
  Parameters:
  
  enabled - indicates whether statistics collection is active
  
  Returns:
  
  a new EngineConfig with the settings modified
- getSubgraphHistorySize
  
  public int getSubgraphHistorySize()
  
  Returns the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
  
  Returns:
  
  the maximum number of subgraphs to retain when tracking execution history
- subgraphHistorySize
  
  public EngineConfig subgraphHistorySize(int size)
  
  Sets the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
  
  Parameters:
  
  size - the maximum number of subgraphs to retain when tracking execution history
  
  Returns:
  
  a new EngineConfig with the settings modified
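
To illustrate what "retain the last N" means for memory use, here is a minimal bounded-history sketch. `BoundedHistory` is a hypothetical class; how the engine actually stores subgraphs is not documented here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical bounded history: keeps only the most recent maxSize entries.
final class BoundedHistory<T> {
    private final int maxSize;
    private final Deque<T> entries = new ArrayDeque<>();

    BoundedHistory(int maxSize) {
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must be >= 0");
        }
        this.maxSize = maxSize;
    }

    void record(T entry) {
        entries.addLast(entry);
        // Evict the oldest entries once the limit is exceeded.
        while (entries.size() > maxSize) {
            entries.removeFirst();
        }
    }

    // Oldest-to-newest copy of what is currently retained.
    List<T> snapshot() {
        return new ArrayList<>(entries);
    }
}
```

A larger limit keeps more history available for inspection at the cost of holding more objects in memory.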
- getStorageManagementPath
  
  public Path getStorageManagementPath()
  
  Returns the root scratch directory to use. For distributed execution, this must be a shared filesystem accessible by all nodes in the cluster.
  
  Returns:
  
  the root scratch directory to use
- storageManagementPath
  
  public EngineConfig storageManagementPath(Path path)
  
  Specifies the root scratch directory to use. For distributed execution, this must be a shared filesystem accessible by all nodes in the cluster.
  
  Parameters:
  
  path - the storage management path.
  
  Returns:
  
  a new EngineConfig with the settings modified
- defaultStorageManagementPath
  
  public EngineConfig defaultStorageManagementPath()
  
  Specifies that the default storage management path should be used.
  
  Returns:
  
  a new EngineConfig with the settings modified
- getCluster
  
  public ClusterSpecifier getCluster()
  
  Returns the cluster specifier. The cluster specifier determines whether we run in distributed or pseudo-distributed mode and, if distributed, specifies the cluster (host and port) on which we are to run.
  
  Returns:
  
  the cluster specifier
- cluster
  
  public EngineConfig cluster(ClusterSpecifier spec)
  
  Specifies the cluster on which we are to run
  
  Parameters:
  
  spec - the cluster specifier
  
  Returns:
  
  a new EngineConfig with the settings modified
- pseudoDistributed
  
  public EngineConfig pseudoDistributed()
  
  Specifies that we are to run in pseudo-distributed mode.
  
  Returns:
  
  a new EngineConfig with the settings modified
- cluster
  
  public EngineConfig cluster(String url)
  
  Specifies the cluster that we are to run on.
  
  Parameters:
  
  url - the cluster url. For the dr cluster this will be of the form "dr://host:port".
  
  Returns:
  
  a new EngineConfig with the settings modified
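
A URL of the form "dr://host:port" can be taken apart with the standard `java.net.URI` class. The sketch below only illustrates the URL shape; `ClusterUrlSketch` is hypothetical, the host name and port in the usage note are made-up example values, and how the library itself parses the string is not stated here.

```java
import java.net.URI;

final class ClusterUrlSketch {
    // Splits a URL of the form "dr://host:port" into its host and port.
    // URI.create throws IllegalArgumentException for malformed input.
    static String describe(String url) {
        URI uri = URI.create(url);
        boolean wellFormed = "dr".equals(uri.getScheme())
                && uri.getHost() != null
                && uri.getPort() >= 0; // getPort() returns -1 when no port is given
        if (!wellFormed) {
            throw new IllegalArgumentException("expected dr://host:port but got " + url);
        }
        return uri.getHost() + " on port " + uri.getPort();
    }
}
```

For example, `ClusterUrlSketch.describe("dr://node1.example.com:1099")` yields "node1.example.com on port 1099", while a URL without a port is rejected.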
- isPseudoDistributed
  
  public boolean isPseudoDistributed()
  
  Returns true if the configured ClusterSpecifier is for the pseudo-distributed (i.e. local) cluster.
  
  Returns:
  
  true if pseudo-distributed
- autoParallelize
  
  public EngineConfig autoParallelize()
  
  Specifies that the engine should automatically determine the default degree of parallelism for graphs.
  
  Returns:
  
  a new EngineConfig with the settings modified
- getParallelism
  
  public int getParallelism()
  
  Returns the raw, un-interpreted value of the parallelism setting
  
  Returns:
  
  the raw, un-interpreted value of the parallelism setting
- parallelism
  
  public EngineConfig parallelism(int count)
  
  Specifies that the engine should use the given degree of parallelism for graphs.
  
  Parameters:
  
  count - the default degree of parallelism to use
  
  Returns:
  
  a new EngineConfig with the settings modified
- getMinParallelism
  
  public int getMinParallelism()
  
  Returns the raw, un-interpreted value of the minimum parallelism setting.
  
  Returns:
  
  the raw, un-interpreted value of the minimum parallelism setting
- minParallelism
  
  public EngineConfig minParallelism(int count)
  
  Specifies the minimum degree of parallelism that the engine should use for graphs.
  
  Parameters:
  
  count - the minimum degree of parallelism to use
  
  Returns:
  
  a new EngineConfig with the settings modified
- getMaxRetries
  
  public int getMaxRetries()
  
  Returns the maximum number of retries. This count is per executable section of the graph, so failures in earlier sections do not count against the budget of later sections. By default this value is zero and thus retry is disabled.
  
  Returns:
  
  the maximum number of retries
- maxRetries
  
  public EngineConfig maxRetries(int maxRetries)
  
  Specifies the maximum number of retries. This count is per executable section of the graph, thus prior failures in earlier sections do not count against the overall total. By default this value is zero and thus retry is disabled.
  
  Parameters:
  
  maxRetries - the max retries
  
  Returns:
  
  a new EngineConfig with the settings modified
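
The per-section counting rule can be sketched as follows. This is a simplified model and not the engine's scheduler: `RetrySketch` is hypothetical, and each "section" is reduced to a function that reports success or failure.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

final class RetrySketch {
    // Runs sections in order. Each section gets its own budget of maxRetries
    // retries, so failures in one section do not reduce the budget of the next.
    // Returns the total number of attempts, or -1 if a section exhausts its budget.
    static int runAll(List<BooleanSupplier> sections, int maxRetries) {
        int attempts = 0;
        for (BooleanSupplier section : sections) {
            boolean succeeded = false;
            // One initial attempt plus up to maxRetries retries.
            for (int attempt = 0; attempt <= maxRetries && !succeeded; attempt++) {
                attempts++;
                succeeded = section.getAsBoolean();
            }
            if (!succeeded) {
                return -1;
            }
        }
        return attempts;
    }
}
```

With `maxRetries` set to 1, two sections that each fail once and then succeed both complete, because the second section starts with a fresh budget. With the default of zero, the first failure ends the run.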
- getDumpfilePath
  
  public String getDumpfilePath()
  
  Returns the dump path where low-level statistics and plan dumps are placed.
  
  Returns:
  
  the dump path
- dumpfilePath
  
  public EngineConfig dumpfilePath(String path)
  
  Specifies the dump path where low-level statistics and plan dumps are placed.
  
  Parameters:
  
  path - the dump path
  
  Returns:
  
  a new EngineConfig with the settings modified
- getSchedulerQueue
  
  public String getSchedulerQueue()
  
  Returns the scheduler queue used for scheduling jobs.
  
  Returns:
  
  scheduler queue name
- schedulerQueue
  
  public EngineConfig schedulerQueue(String queueName)
  
  Specifies the scheduler queue to use when a job is executed.
  
  Parameters:
  
  queueName - name of the queue to use
  
  Returns:
  
  a new EngineConfig with the settings modified
- getRawExtensionPaths
  
  public String getRawExtensionPaths()
  
  Returns the comma delimited list of user extension paths.
  
  Returns:
  
  user extension directory paths
- getExtensionPaths
  
  public String[] getExtensionPaths()
  
  Returns the list of extension paths.
  
  Returns:
  
  set of extension paths
- extensionPaths
  
  public EngineConfig extensionPaths(String... paths)
  
  Specifies the list of user extension directory paths to use.
  
  Parameters:
  
  paths - set of user extension directory paths
  
  Returns:
  
  a new EngineConfig with the settings modified
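
The relationship between the raw, comma-delimited form returned by `getRawExtensionPaths` and the array returned by `getExtensionPaths` can be sketched as a join and a split. `ExtensionPathsSketch` and the directory names are hypothetical; whether the library trims whitespace or supports escaping commas is not stated here, so this sketch does neither.

```java
import java.util.Arrays;

final class ExtensionPathsSketch {
    // Joins individual directory paths into one comma-delimited string.
    static String toRaw(String... paths) {
        return String.join(",", paths);
    }

    // Splits the comma-delimited form back into individual paths.
    static String[] fromRaw(String raw) {
        if (raw == null || raw.isEmpty()) {
            return new String[0];
        }
        return raw.split(",");
    }

    public static void main(String[] args) {
        String raw = toRaw("/shared/ext/a", "/shared/ext/b");
        System.out.println(raw);
        System.out.println(Arrays.toString(fromRaw(raw)));
    }
}
```

A path that itself contains a comma would not survive this round trip, which is a limitation of the sketch.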
