java.lang.Object
com.pervasive.datarush.graphs.EngineConfig
A collection of engine configuration settings.
These collections are immutable, but provide methods for
constructing new configurations from an existing one.
A number of instance methods are provided to permit a more fluent and declarative style of specifying configurations. For example,
public static final EngineConfig myConfig =
    engine().monitored(true).licensePath("/path/to/my/license.file");
which defines a configuration with:
- statistics collection enabled for processes and ports
- the license file expected at /path/to/my/license.file
- otherwise default settings
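The immutable, fluent style described above can be illustrated with a minimal sketch. The class `FluentConfigSketch` and its two settings are hypothetical stand-ins chosen for illustration; this is not the EngineConfig implementation. Each setter-like method returns a new instance and leaves the receiver unchanged.

```java
// Hypothetical stand-in for illustration only; not the DataRush EngineConfig class.
final class FluentConfigSketch {
    private final boolean monitored;
    private final int maxRetries;

    private FluentConfigSketch(boolean monitored, int maxRetries) {
        this.monitored = monitored;
        this.maxRetries = maxRetries;
    }

    /** Default settings in this sketch: monitoring off, no retries. */
    static FluentConfigSketch defaults() {
        return new FluentConfigSketch(false, 0);
    }

    /** Returns a new instance with monitoring changed; this instance is untouched. */
    FluentConfigSketch monitored(boolean enabled) {
        return new FluentConfigSketch(enabled, maxRetries);
    }

    /** Returns a new instance with the retry limit changed; this instance is untouched. */
    FluentConfigSketch maxRetries(int count) {
        return new FluentConfigSketch(monitored, count);
    }

    boolean isMonitored() { return monitored; }

    int getMaxRetries() { return maxRetries; }

    public static void main(String[] args) {
        FluentConfigSketch base = defaults();
        FluentConfigSketch tuned = base.monitored(true).maxRetries(3);
        // prints "false true 3": the base configuration was not modified
        System.out.println(base.isMonitored() + " " + tuned.isMonitored() + " " + tuned.getMaxRetries());
    }
}
```

Because every call yields a fresh object, a configuration built this way can safely be stored in a constant and shared.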
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static final class | EngineConfig.Ports | Nested class containing settings specific to ports.
static final class | EngineConfig.RemoteMonitoring | Settings in this object determine settings for remote monitoring.
static final class | EngineConfig.Sort | Settings in this object determine default tuning for the Sort operator.
-
Field Summary
Fields
Modifier and Type | Field | Description
static final EngineProperty<ClasspathSpecifier> | CLASSPATH_SPECIFIER | Property controlling classpath that is used for launching remote jobs.
static final EngineProperty<ClusterSpecifier> | CLUSTER | Property controlling which cluster to run on.
static final EngineProperty<String> | DUMPFILE_PATH | Defines the directory in which low-level statistics and plan dumps are to be placed.
static final EngineProperty<String> | EXTENSION_PATHS | Defines a list of directories in shared storage containing user extensions.
static final EngineProperty<FileMetaConfiguration> | FILE_CONFIGURATION | Property controlling authorization for file access within the graph.
static final EngineProperty<Integer> | MAX_RETRIES | Maximum number of retry attempts.
static final EngineProperty<Integer> | MIN_PARALLELISM | Property controlling the minimum parallelism used in graph.
static final EngineProperty<ModuleConfiguration> | MODULE_CONFIGURATION | Property controlling the module configuration to be used by operators in the graph.
static final EngineProperty<Boolean> | MONITOR | Property controlling the collection of execution statistics.
static final EngineProperty<NetworkConfiguration> | NETWORK_CONFIGURATION | Property controlling the socket provider to be used for communication between the client and cluster manager.
static final EngineProperty<Integer> | PARALLELISM | Property controlling the default parallelism used in graph.
final EngineConfig.Ports | ports | The ports sub-object.
final EngineConfig.RemoteMonitoring | remoteMonitoring | The remote monitoring sub-object.
static final EngineProperty<String> | SCHEDULER_QUEUE | Defines the scheduler queue used for executing jobs.
final EngineConfig.Sort | sort | The sort sub-object.
static final EngineProperty<Path> | STORAGE_MANAGEMENT_PATH | Property that controls a base path to use for intermediate storage.
static final EngineProperty<Integer> | SUBGRAPH_HISTORY_SIZE | Property controlling the maximum number of subgraphs to retain when tracking execution history.
-
Method Summary
Modifier and Type | Method | Description
EngineConfig | applyConfig(EngineConfig config) | Constructs a new configuration by merging the properties of another config with this one.
EngineConfig | autoParallelize() | Specifies that the engine should automatically determine the default degree of parallelism for graphs.
EngineConfig | classpathSpecifier(ClasspathSpecifier classpath) | Sets the classpath to be used in the case of a remote job.
EngineConfig | cluster(ClusterSpecifier spec) | Specifies the cluster on which we are to run.
EngineConfig | cluster(String url) | Specifies the cluster that we are to run on.
EngineConfig | defaultStorageManagementPath() | Specifies to use the default storage management path.
EngineConfig | dumpfilePath(String path) | Specifies the dump path where low-level statistics and plan dumps are placed.
static EngineConfig | engine() | Returns the default engine configuration.
static EngineConfig | engine(Properties properties) | Creates an EngineConfig from a set of properties.
EngineConfig | extensionPaths(String... paths) | Specifies the list of user extension directory paths to use.
EngineConfig | fileConfiguration(FileMetaConfiguration configuration) | Specifies the file configuration to be used by operators in the graph.
int | getAvailableProcessors() | Retrieves the configured number of processors available for this engine.
ClasspathSpecifier | getClasspathSpecifier() | Returns the classpath to be used in the case of a remote job.
ClusterSpecifier | getCluster() | Returns the cluster specifier.
String | getDumpfilePath() | Returns the dump path where low-level statistics and plan dumps are placed.
String[] | getExtensionPaths() | Returns the list of extension paths.
FileMetaConfiguration | getFileConfiguration() | Returns the file configuration to be used by operators in the graph.
int | getMaxRetries() | Maximum number of retries.
int | getMinParallelism() | Returns the raw, un-interpreted value of the minimum parallelism setting.
ModuleConfiguration | getModuleConfiguration() | Returns the module configuration to be used by operators in the graph.
NetworkConfiguration | getNetworkConfiguration() | Returns the network configuration to be used for communication between the client and cluster manager.
int | getParallelism() | Returns the raw, un-interpreted value of the parallelism setting.
<T> T | getProperty(EngineProperty<T> property) | Returns the value of an arbitrary property.
String | getRawExtensionPaths() | Returns the comma delimited list of user extension paths.
String | getSchedulerQueue() | Returns the scheduler queue used for scheduling jobs.
Path | getStorageManagementPath() | Returns the root scratch directory to use.
int | getSubgraphHistorySize() | Returns the maximum number of subgraphs to retain when tracking execution history.
boolean | isMonitored() | Indicates whether run-time statistics are gathered.
boolean | isPseudoDistributed() | Returns true if the configured ClusterSpecifier is for the pseudo-distributed (i.e. local) cluster.
EngineConfig | maxRetries(int maxRetries) | Specifies the maximum number of retries.
EngineConfig | minParallelism(int count) | Specifies the minimum degree of parallelism the engine should use for graphs.
EngineConfig | moduleConfiguration(ModuleConfiguration configuration) | Specifies the module configuration to be used by operators in the graph.
EngineConfig | monitored(boolean enabled) | Specifies whether performance statistics should be gathered for executed graphs.
static EngineConfig | monitoredEngine() | Returns a default engine configuration with monitoring enabled.
EngineConfig | networkConfiguration(NetworkConfiguration networkConfiguration) | Sets the network configuration to be used for communication between the client and cluster manager.
EngineConfig | parallelism(int count) | Specifies that the engine should use the given degree of parallelism for graphs.
<T> EngineConfig | property(EngineProperty<T> property, T value) | Constructs a new configuration overriding the specified setting. All other settings are left unchanged.
EngineConfig | pseudoDistributed() | Specifies that we are to run in pseudo-distributed mode.
EngineConfig | schedulerQueue(String queueName) | Specifies the scheduler queue to use when a job is executed.
EngineConfig | storageManagementPath(Path path) | Specifies the root scratch directory to use.
EngineConfig | subgraphHistorySize(int size) | Sets the maximum number of subgraphs to retain when tracking execution history.
String | toString() |
-
Field Details
-
MODULE_CONFIGURATION
Property controlling the module configuration to be used by operators in the graph. -
FILE_CONFIGURATION
Property controlling authorization for file access within the graph. -
CLASSPATH_SPECIFIER
Property controlling classpath that is used for launching remote jobs -
NETWORK_CONFIGURATION
Property controlling the socket provider to be used for communication between the client and cluster manager -
MONITOR
Property controlling the collection of execution statistics. By default, they are not collected. -
SUBGRAPH_HISTORY_SIZE
Property controlling the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator. -
STORAGE_MANAGEMENT_PATH
Property that controls a base path to use for intermediate storage. The default value, an empty string, selects an appropriate storage location depending on the type of cluster. -
CLUSTER
Property controlling which cluster to run on. -
PARALLELISM
Property controlling the default parallelism used in graph. By default, this is automatically determined. -
MIN_PARALLELISM
Property controlling the minimum parallelism used in graph. By default, this is automatically determined. -
MAX_RETRIES
Maximum number of retry attempts. The default value of zero implies no retries. -
DUMPFILE_PATH
Defines the directory in which low-level statistics and plan dumps are to be placed. -
SCHEDULER_QUEUE
Defines the scheduler queue used for executing jobs -
EXTENSION_PATHS
Defines a list of directories in shared storage containing user extensions -
ports
The ports sub-object -
sort
The sort sub-object. -
remoteMonitoring
The remote monitoring sub-object.
-
-
Method Details
-
getModuleConfiguration
Returns the module configuration to be used by operators in the graph.
- Returns:
  the module configuration
-
moduleConfiguration
Specifies the module configuration to be used by operators in the graph.
- Parameters:
  configuration - the module configuration
- Returns:
  a new engine config with the specified module configuration
-
getFileConfiguration
Returns the file configuration to be used by operators in the graph. This provides an authorization context to be used by the graph.
- Returns:
  the file configuration to be used by operators in the graph.
-
fileConfiguration
Specifies the file configuration to be used by operators in the graph. This provides an authorization context to be used by the graph.
- Parameters:
  configuration - the file configuration to be used by operators in the graph.
- Returns:
  a new EngineConfig with the settings modified
-
getClasspathSpecifier
Returns the classpath to be used in the case of a remote job. This property is ignored when running in pseudo-distributed mode.
- Returns:
  the classpath to be used when running a remote job.
-
classpathSpecifier
Sets the classpath to be used in the case of a remote job. This property is ignored when running in pseudo-distributed mode.
- Parameters:
  classpath - the classpath to be used when running a remote job.
- Returns:
  a new EngineConfig with the settings modified
-
getNetworkConfiguration
Returns the network configuration to be used for communication between the client and cluster manager. This property may also impact certain remote file systems.
- Returns:
  the network configuration to be used when running a remote job.
-
networkConfiguration
Sets the network configuration to be used for communication between the client and cluster manager. This property may also impact certain remote file systems.
- Parameters:
  networkConfiguration - the network configuration to be used when running a remote job.
- Returns:
  a new EngineConfig with the settings modified
-
getAvailableProcessors
public int getAvailableProcessors()
Retrieves the configured number of processors available for this engine. This value is normally used to determine the width of horizontal partitions. The default value is the number of processor cores available at run-time.
- Returns:
  the number of processors available for engine use
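As a rough illustration of how a default based on available cores might interact with an explicit parallelism setting, the sketch below resolves an effective value. The sentinel `AUTO = 0` and the class `ParallelismSketch` are assumptions made for this example; the encoding of "automatic" used by the engine is not stated on this page. `Runtime.availableProcessors()` is the standard JDK call for the run-time core count.

```java
// Hypothetical helper for illustration; not part of the DataRush API.
final class ParallelismSketch {
    /** Assumed sentinel meaning "let the engine decide"; the real encoding is not documented here. */
    static final int AUTO = 0;

    /** Falls back to the available processor count when the raw setting is AUTO. */
    static int effective(int raw, int availableProcessors) {
        return raw == AUTO ? availableProcessors : raw;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("auto -> " + effective(AUTO, cores));
        System.out.println("explicit 4 -> " + effective(4, cores));
    }
}
```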
-
getProperty
Returns the value of an arbitrary property.
- Type Parameters:
  T - the property type
- Parameters:
  property - the property
- Returns:
  the value of the property
-
property
Constructs a new configuration overriding the specified setting. All other settings are left unchanged. Note that clients should typically not use this method; it is for internal use and for custom properties. This call does not modify this object.
- Type Parameters:
  T - the type of property
- Parameters:
  property - the property to modify
  value - the new value for that property
- Returns:
  the resulting new configuration
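The typed get/override pair described above can be sketched with a generic key that carries its own default. `KeySketch` and `PropertyConfigSketch` are hypothetical names used only for illustration; how EngineProperty is actually defined is not shown on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed key carrying a default value; not the library's EngineProperty.
final class KeySketch<T> {
    final String name;
    final T defaultValue;

    KeySketch(String name, T defaultValue) {
        this.name = name;
        this.defaultValue = defaultValue;
    }
}

// Hypothetical immutable configuration keyed by typed properties.
final class PropertyConfigSketch {
    private final Map<KeySketch<?>, Object> overrides;

    private PropertyConfigSketch(Map<KeySketch<?>, Object> overrides) {
        this.overrides = overrides;
    }

    static PropertyConfigSketch defaults() {
        return new PropertyConfigSketch(new HashMap<>());
    }

    /** Copy-on-write override: the receiver's map is never mutated. */
    <T> PropertyConfigSketch property(KeySketch<T> key, T value) {
        Map<KeySketch<?>, Object> copy = new HashMap<>(overrides);
        copy.put(key, value);
        return new PropertyConfigSketch(copy);
    }

    /** Returns the overridden value if present, otherwise the key's default. */
    @SuppressWarnings("unchecked")
    <T> T getProperty(KeySketch<T> key) {
        return overrides.containsKey(key) ? (T) overrides.get(key) : key.defaultValue;
    }
}
```

The type parameter on the key lets the compiler check that the value passed to `property` matches the type later returned by `getProperty`.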
-
toString
-
applyConfig
Constructs a new configuration by merging the properties of another config with this one. Any non-default properties of the supplied configuration override the corresponding properties of this configuration in the resulting configuration object.
- Parameters:
  config - the engine configuration containing new properties
- Returns:
  the resulting new configuration
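The merge rule can be sketched as follows, under the assumption that "non-default" means "explicitly set". `MergeSketch` and its string-keyed map are hypothetical and exist only to show the override order; they do not reflect how EngineConfig stores its properties.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the DataRush EngineConfig class.
final class MergeSketch {
    // Only explicitly set (assumed "non-default") values are stored.
    private final Map<String, Object> overrides;

    private MergeSketch(Map<String, Object> overrides) {
        this.overrides = overrides;
    }

    static MergeSketch defaults() {
        return new MergeSketch(new HashMap<>());
    }

    /** Returns a copy with one setting overridden. */
    MergeSketch with(String name, Object value) {
        Map<String, Object> copy = new HashMap<>(overrides);
        copy.put(name, value);
        return new MergeSketch(copy);
    }

    /** Settings present in other win; everything else comes from this configuration. */
    MergeSketch applyConfig(MergeSketch other) {
        Map<String, Object> merged = new HashMap<>(overrides);
        merged.putAll(other.overrides);
        return new MergeSketch(merged);
    }

    Object get(String name, Object defaultValue) {
        return overrides.getOrDefault(name, defaultValue);
    }
}
```

For example, merging a configuration that sets only parallelism into one that sets parallelism and monitoring yields the new parallelism with the original monitoring flag.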
-
engine
Returns the default engine configuration.
- Returns:
  the default engine configuration
-
monitoredEngine
Returns a default engine configuration with monitoring enabled.
- Returns:
  an engine configuration with monitoring enabled.
-
engine
Creates an EngineConfig from a set of properties. Properties must be built-in engine settings.
- Parameters:
  properties - the properties
- Returns:
  an EngineConfig built from the given properties
-
isMonitored
public boolean isMonitored()
Indicates whether run-time statistics are gathered. Returns true if either local monitoring is enabled or remote monitoring is enabled.
- Returns:
  true if run-time statistics will be gathered
-
monitored
Specifies whether performance statistics should be gathered for executed graphs.
- Parameters:
  enabled - indicates whether statistics collection is active
- Returns:
  a new EngineConfig with the settings modified
-
getSubgraphHistorySize
public int getSubgraphHistorySize()
Returns the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
- Returns:
  the maximum number of subgraphs to retain when tracking execution history
-
subgraphHistorySize
Sets the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
- Parameters:
  size - the maximum number of subgraphs to retain when tracking execution history
- Returns:
  a new EngineConfig with the settings modified
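The "retain only the last N" behavior described above can be pictured as a bounded queue that evicts its oldest entry. `BoundedHistorySketch` is a hypothetical class written for this example; the engine's actual history structure is not documented on this page.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bounded history for illustration; not the engine's implementation.
final class BoundedHistorySketch<T> {
    private final int maxSize;
    private final Deque<T> entries = new ArrayDeque<>();

    BoundedHistorySketch(int maxSize) {
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must be non-negative");
        }
        this.maxSize = maxSize;
    }

    /** Appends an entry and evicts the oldest ones beyond the limit. */
    void record(T entry) {
        entries.addLast(entry);
        while (entries.size() > maxSize) {
            entries.removeFirst();
        }
    }

    int size() { return entries.size(); }

    /** Oldest retained entry, or null when the history is empty. */
    T oldest() { return entries.peekFirst(); }
}
```

With a limit of 10, recording 25 entries leaves only the 10 most recent in memory, which is why a small limit keeps long-running jobs from accumulating history.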
-
getStorageManagementPath
Returns the root scratch directory to use. In distributed mode, this must be on a shared filesystem accessible by all nodes in the cluster.
- Returns:
  the root scratch directory to use
-
storageManagementPath
Specifies the root scratch directory to use. In distributed mode, this must be on a shared filesystem accessible by all nodes in the cluster.
- Parameters:
  path - the storage management path.
- Returns:
  a new EngineConfig with the settings modified
-
defaultStorageManagementPath
Specifies to use the default storage management path.
- Returns:
  a new EngineConfig with the settings modified
-
getCluster
Returns the cluster specifier. The cluster specifier determines whether we run in distributed or pseudo-distributed mode and, if distributed, specifies the cluster (host and port) on which we are to run.
- Returns:
  the cluster specifier
-
cluster
Specifies the cluster on which we are to run.
- Parameters:
  spec - the cluster specifier
- Returns:
  a new EngineConfig with the settings modified
-
pseudoDistributed
Specifies that we are to run in pseudo-distributed mode.
- Returns:
  a new EngineConfig with the settings modified
-
cluster
Specifies the cluster that we are to run on.
- Parameters:
  url - the cluster URL. For the dr cluster this will be of the form "dr://host:port".
- Returns:
  a new EngineConfig with the settings modified
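To show the shape of a "dr://host:port" URL, the sketch below splits one into scheme, host, and port using the standard `java.net.URI` class. Whether the engine parses cluster URLs this way is an assumption; `ClusterUrlSketch`, the host name, and the port number are made up for the example.

```java
import java.net.URI;

// Hypothetical helper for illustration; the engine's own URL handling is not shown on this page.
final class ClusterUrlSketch {
    /** Returns "scheme host port" for a URL of the form scheme://host:port. */
    static String describe(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() + " " + uri.getHost() + " " + uri.getPort();
    }

    public static void main(String[] args) {
        // Example host and port are placeholders, not real defaults.
        System.out.println(describe("dr://clusterhost.example.com:1099"));
    }
}
```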
-
isPseudoDistributed
public boolean isPseudoDistributed()
Returns true if the configured ClusterSpecifier is for the pseudo-distributed (i.e. local) cluster.
- Returns:
  true if pseudo-distributed
-
autoParallelize
Specifies that the engine should automatically determine the default degree of parallelism for graphs.
- Returns:
  a new EngineConfig with the settings modified
-
getParallelism
public int getParallelism()
Returns the raw, un-interpreted value of the parallelism setting.
- Returns:
  the raw, un-interpreted value of the parallelism setting
-
parallelism
Specifies that the engine should use the given degree of parallelism for graphs.
- Parameters:
  count - the default degree of parallelism to use
- Returns:
  a new EngineConfig with the settings modified
-
getMinParallelism
public int getMinParallelism()
Returns the raw, un-interpreted value of the minimum parallelism setting.
- Returns:
  the raw, un-interpreted value of the minimum parallelism setting
-
minParallelism
Specifies the minimum degree of parallelism the engine should use for graphs.
- Parameters:
  count - the minimum degree of parallelism to use
- Returns:
  a new EngineConfig with the settings modified
-
getMaxRetries
public int getMaxRetries()
Returns the maximum number of retries. This count is per executable section of the graph, so failures in earlier sections do not count against the limit for later ones. By default this value is zero and thus retry is disabled.
- Returns:
  the maximum number of retries
-
maxRetries
Specifies the maximum number of retries. This count is per executable section of the graph, so failures in earlier sections do not count against the limit for later ones. By default this value is zero and thus retry is disabled.
- Parameters:
  maxRetries - the max retries
- Returns:
  a new EngineConfig with the settings modified
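The per-section counting rule can be illustrated with a small loop in which every section starts with a fresh retry budget. `SectionRetrySketch` and the use of `Runnable` to model an executable section are assumptions made for this example, not the engine's scheduler.

```java
import java.util.List;

// Hypothetical retry loop for illustration; not the engine's execution logic.
final class SectionRetrySketch {
    /**
     * Runs each section in order. Every section may fail up to maxRetries times
     * before its exception is propagated. Returns the total number of attempts.
     */
    static int runAll(List<Runnable> sections, int maxRetries) {
        int attempts = 0;
        for (Runnable section : sections) {
            int failures = 0; // reset for each section: earlier failures do not carry over
            while (true) {
                attempts++;
                try {
                    section.run();
                    break;
                } catch (RuntimeException e) {
                    failures++;
                    if (failures > maxRetries) {
                        throw e;
                    }
                }
            }
        }
        return attempts;
    }
}
```

With a limit of 1, two sections that each fail once both succeed on their second attempt; a single shared budget would have failed the second section.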
-
getDumpfilePath
Returns the dump path where low-level statistics and plan dumps are placed.
- Returns:
  the dump path
-
dumpfilePath
Specifies the dump path where low-level statistics and plan dumps are placed.
- Parameters:
  path - the dump path
- Returns:
  a new EngineConfig with the settings modified
-
getSchedulerQueue
Returns the scheduler queue used for scheduling jobs.
- Returns:
  scheduler queue name
-
schedulerQueue
Specifies the scheduler queue to use when a job is executed.
- Parameters:
  queueName - name of the queue to use
- Returns:
  a new EngineConfig with the settings modified
-
getRawExtensionPaths
Returns the comma-delimited list of user extension paths.
- Returns:
  user extension directory paths
-
getExtensionPaths
Returns the list of extension paths.
- Returns:
  set of extension paths
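The relationship between the raw comma-delimited form and the array form can be sketched as a split and a join. `ExtensionPathsSketch` is hypothetical, and details such as trimming whitespace and treating a blank string as an empty list are assumptions; the page does not specify how the engine handles them. The example paths are placeholders.

```java
// Hypothetical helper for illustration; not the engine's parsing code.
final class ExtensionPathsSketch {
    /** Splits a comma-delimited string into trimmed paths; blank input yields an empty array. */
    static String[] split(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return new String[0];
        }
        String[] parts = raw.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    /** Joins paths back into the comma-delimited form. */
    static String join(String... paths) {
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        String raw = join("/shared/ext1", "/shared/ext2");
        System.out.println(raw + " -> " + split(raw).length + " paths");
    }
}
```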
-
extensionPaths
Specifies the list of user extension directory paths to use.
- Parameters:
  paths - set of user extension directory paths
- Returns:
  a new EngineConfig with the settings modified
-