java.lang.Object
    com.pervasive.datarush.graphs.EngineConfig
public final class EngineConfig extends Object
A collection of engine configuration settings. These collections are immutable, but provide methods for constructing new configurations from an existing one.
A number of instance methods are provided to permit a more fluent and declarative style of specifying configurations. For example,
public static final EngineConfig myConfig = engine().monitored(true).licensePath("/path/to/my/license.file");
which defines a configuration with:
- statistics collection enabled for processes and ports
- the license file expected at /path/to/my/license.file
- otherwise default settings
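The immutable, fluent style described above can be illustrated with a minimal sketch. SketchConfig below is a hypothetical stand-in written only for this example; it is not the DataRush class, and its default parallelism of 0 is an assumption. It shows the general pattern: each setter-like method returns a new object and leaves the receiver unchanged.

```java
// Hypothetical stand-in for illustration only; this is NOT the DataRush EngineConfig.
final class SketchConfig {
    private final boolean monitored;
    private final int parallelism;

    private SketchConfig(boolean monitored, int parallelism) {
        this.monitored = monitored;
        this.parallelism = parallelism;
    }

    // Assumed defaults for this sketch: monitoring off, parallelism 0.
    static SketchConfig defaults() {
        return new SketchConfig(false, 0);
    }

    // Each "setter" builds a new instance; the receiver is never modified.
    SketchConfig monitored(boolean enabled) {
        return new SketchConfig(enabled, parallelism);
    }

    SketchConfig parallelism(int count) {
        return new SketchConfig(monitored, count);
    }

    boolean isMonitored() { return monitored; }

    int getParallelism() { return parallelism; }

    public static void main(String[] args) {
        SketchConfig base = defaults();
        SketchConfig tuned = base.monitored(true).parallelism(4);
        System.out.println("base monitored:  " + base.isMonitored());
        System.out.println("tuned monitored: " + tuned.isMonitored());
    }
}
```

Because every call returns a fresh object, a shared constant such as the base value here can safely be reused as the starting point for several derived configurations.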
-
-
Nested Class Summary
Modifier and Type | Class | Description
static class | EngineConfig.Ports | Nested class containing settings specific to ports.
static class | EngineConfig.RemoteMonitoring | Settings in this object determine settings for remote monitoring.
static class | EngineConfig.Sort | Settings in this object determine default tuning for the Sort operator.
-
Field Summary
Modifier and Type | Field | Description
static EngineProperty<ClasspathSpecifier> | CLASSPATH_SPECIFIER | Property controlling the classpath that is used for launching remote jobs.
static EngineProperty<ClusterSpecifier> | CLUSTER | Property controlling which cluster to run on.
static EngineProperty<String> | DUMPFILE_PATH | Defines the directory in which low-level statistics and plan dumps are to be placed.
static EngineProperty<String> | EXTENSION_PATHS | Defines a list of directories in shared storage containing user extensions.
static EngineProperty<FileMetaConfiguration> | FILE_CONFIGURATION | Property controlling authorization for file access within the graph.
static EngineProperty<Integer> | MAX_RETRIES | Maximum number of retry attempts.
static EngineProperty<Integer> | MIN_PARALLELISM | Property controlling the minimum parallelism used in graph.
static EngineProperty<ModuleConfiguration> | MODULE_CONFIGURATION | Property controlling the module configuration to be used by operators in the graph.
static EngineProperty<Boolean> | MONITOR | Property controlling the collection of execution statistics.
static EngineProperty<NetworkConfiguration> | NETWORK_CONFIGURATION | Property controlling the socket provider to be used for communication between the client and cluster manager.
static EngineProperty<Integer> | PARALLELISM | Property controlling the default parallelism used in graph.
EngineConfig.Ports | ports | The ports sub-object.
EngineConfig.RemoteMonitoring | remoteMonitoring | The remote monitoring sub-object.
static EngineProperty<String> | SCHEDULER_QUEUE | Defines the scheduler queue used for executing jobs.
EngineConfig.Sort | sort | The sort sub-object.
static EngineProperty<Path> | STORAGE_MANAGEMENT_PATH | Property that controls a base path to use for intermediate storage.
static EngineProperty<Integer> | SUBGRAPH_HISTORY_SIZE | Property controlling the maximum number of subgraphs to retain when tracking execution history.
-
Method Summary
Modifier and Type | Method | Description
EngineConfig | applyConfig(EngineConfig config) | Constructs a new configuration by merging the properties of another config with this one.
EngineConfig | autoParallelize() | Specifies that the engine should automatically determine the default degree of parallelism for graphs.
EngineConfig | classpathSpecifier(ClasspathSpecifier classpath) | Sets the classpath to be used in the case of a remote job.
EngineConfig | cluster(ClusterSpecifier spec) | Specifies the cluster on which we are to run.
EngineConfig | cluster(String url) | Specifies the cluster that we are to run on.
EngineConfig | defaultStorageManagementPath() | Specifies that the default storage management path is to be used.
EngineConfig | dumpfilePath(String path) | Specifies the dump path where low-level statistics and plan dumps are placed.
static EngineConfig | engine() | Returns the default engine configuration.
static EngineConfig | engine(Properties properties) | Creates an EngineConfig from a set of properties.
EngineConfig | extensionPaths(String... paths) | Specifies the list of user extension directory paths to use.
EngineConfig | fileConfiguration(FileMetaConfiguration configuration) | Specifies the file configuration to be used by operators in the graph.
int | getAvailableProcessors() | Retrieves the configured number of processors available for this engine.
ClasspathSpecifier | getClasspathSpecifier() | Returns the classpath to be used in the case of a remote job.
ClusterSpecifier | getCluster() | Returns the cluster specifier.
String | getDumpfilePath() | Returns the dump path where low-level statistics and plan dumps are placed.
String[] | getExtensionPaths() | Returns the list of extension paths.
FileMetaConfiguration | getFileConfiguration() | Returns the file configuration to be used by operators in the graph.
int | getMaxRetries() | Returns the maximum number of retries.
int | getMinParallelism() | Returns the raw, un-interpreted value of the minimum parallelism setting.
ModuleConfiguration | getModuleConfiguration() | Returns the module configuration to be used by operators in the graph.
NetworkConfiguration | getNetworkConfiguration() | Returns the network configuration to be used for communication between the client and cluster manager.
int | getParallelism() | Returns the raw, un-interpreted value of the parallelism setting.
<T> T | getProperty(EngineProperty<T> property) | Returns the value of an arbitrary property.
String | getRawExtensionPaths() | Returns the comma-delimited list of user extension paths.
String | getSchedulerQueue() | Returns the scheduler queue used for scheduling jobs.
Path | getStorageManagementPath() | Returns the root scratch directory to use.
int | getSubgraphHistorySize() | Returns the maximum number of subgraphs to retain when tracking execution history.
boolean | isMonitored() | Indicates whether run-time statistics are gathered.
boolean | isPseudoDistributed() | Returns true if the configured ClusterSpecifier is for the pseudo-distributed (i.e. local) cluster.
EngineConfig | maxRetries(int maxRetries) | Specifies the maximum number of retries.
EngineConfig | minParallelism(int count) | Specifies that the engine should use the given minimum degree of parallelism for graphs.
EngineConfig | moduleConfiguration(ModuleConfiguration configuration) | Specifies the module configuration to be used by operators in the graph.
EngineConfig | monitored(boolean enabled) | Specifies whether performance statistics should be gathered for executed graphs.
static EngineConfig | monitoredEngine() | Returns a default engine configuration with monitoring enabled.
EngineConfig | networkConfiguration(NetworkConfiguration networkConfiguration) | Sets the network configuration to be used for communication between the client and cluster manager.
EngineConfig | parallelism(int count) | Specifies that the engine should use the given degree of parallelism for graphs.
<T> EngineConfig | property(EngineProperty<T> property, T value) | Constructs a new configuration overriding the specified setting. All other settings are left unchanged.
EngineConfig | pseudoDistributed() | Specifies that we are to run in pseudo-distributed mode.
EngineConfig | schedulerQueue(String queueName) | Specifies the scheduler queue to use when a job is executed.
EngineConfig | storageManagementPath(Path path) | Specifies the root scratch directory to use.
EngineConfig | subgraphHistorySize(int size) | Sets the maximum number of subgraphs to retain when tracking execution history.
String | toString() |
-
-
-
Field Detail
-
MODULE_CONFIGURATION
public static final EngineProperty<ModuleConfiguration> MODULE_CONFIGURATION
Property controlling the module configuration to be used by operators in the graph.
-
FILE_CONFIGURATION
public static final EngineProperty<FileMetaConfiguration> FILE_CONFIGURATION
Property controlling authorization for file access within the graph.
-
CLASSPATH_SPECIFIER
public static final EngineProperty<ClasspathSpecifier> CLASSPATH_SPECIFIER
Property controlling classpath that is used for launching remote jobs
-
NETWORK_CONFIGURATION
public static final EngineProperty<NetworkConfiguration> NETWORK_CONFIGURATION
Property controlling the socket provider to be used for communication between the client and cluster manager
-
MONITOR
public static final EngineProperty<Boolean> MONITOR
Property controlling the collection of execution statistics. By default, they are not collected.
-
SUBGRAPH_HISTORY_SIZE
public static final EngineProperty<Integer> SUBGRAPH_HISTORY_SIZE
Property controlling the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
-
STORAGE_MANAGEMENT_PATH
public static final EngineProperty<Path> STORAGE_MANAGEMENT_PATH
Property that controls a base path to use for intermediate storage. The default value, an empty string, selects an appropriate storage location depending on the type of cluster.
-
CLUSTER
public static final EngineProperty<ClusterSpecifier> CLUSTER
Property controlling which cluster to run on.
-
PARALLELISM
public static final EngineProperty<Integer> PARALLELISM
Property controlling the default parallelism used in graph. By default, this is automatically determined.
-
MIN_PARALLELISM
public static final EngineProperty<Integer> MIN_PARALLELISM
Property controlling the minimum parallelism used in graph. By default, this is automatically determined.
-
MAX_RETRIES
public static final EngineProperty<Integer> MAX_RETRIES
Maximum number of retry attempts. The default value of zero implies no retries.
-
DUMPFILE_PATH
public static final EngineProperty<String> DUMPFILE_PATH
Defines the directory in which low-level statistics and plan dumps are to be placed.
-
SCHEDULER_QUEUE
public static final EngineProperty<String> SCHEDULER_QUEUE
Defines the scheduler queue used for executing jobs
-
EXTENSION_PATHS
public static final EngineProperty<String> EXTENSION_PATHS
Defines a list of directories in shared storage containing user extensions
-
ports
public final EngineConfig.Ports ports
The ports sub-object
-
sort
public final EngineConfig.Sort sort
The sort sub-object.
-
remoteMonitoring
public final EngineConfig.RemoteMonitoring remoteMonitoring
The remote monitoring sub-object.
-
-
Method Detail
-
getModuleConfiguration
public ModuleConfiguration getModuleConfiguration()
Returns the module configuration to be used by operators in the graph.
Returns:
the module configuration
-
moduleConfiguration
public EngineConfig moduleConfiguration(ModuleConfiguration configuration)
Specifies the module configuration to be used by operators in the graph.
Parameters:
configuration - the module configuration
Returns:
a new engine config with the specified module configuration
-
getFileConfiguration
public FileMetaConfiguration getFileConfiguration()
Returns the file configuration to be used by operators in the graph. This provides an authorization context to be used by the graph.
Returns:
the file configuration to be used by operators in the graph
-
fileConfiguration
public EngineConfig fileConfiguration(FileMetaConfiguration configuration)
Specifies the file configuration to be used by operators in the graph. This provides an authorization context to be used by the graph.
Parameters:
configuration - the file configuration to be used by operators in the graph
Returns:
a new EngineConfig with the settings modified
-
getClasspathSpecifier
public ClasspathSpecifier getClasspathSpecifier()
Returns the classpath to be used in the case of a remote job. This property is ignored when running in pseudo-distributed mode.
Returns:
the classpath to be used when running a remote job
-
classpathSpecifier
public EngineConfig classpathSpecifier(ClasspathSpecifier classpath)
Sets the classpath to be used in the case of a remote job. This property is ignored when running in pseudo-distributed mode.
Parameters:
classpath - the classpath to be used when running a remote job
Returns:
a new EngineConfig with the settings modified
-
getNetworkConfiguration
public NetworkConfiguration getNetworkConfiguration()
Returns the network configuration to be used for communication between the client and cluster manager. This property may also impact certain remote file systems.
Returns:
the network configuration to be used when running a remote job
-
networkConfiguration
public EngineConfig networkConfiguration(NetworkConfiguration networkConfiguration)
Sets the network configuration to be used for communication between the client and cluster manager. This property may also impact certain remote file systems.
Parameters:
networkConfiguration - the network configuration to be used when running a remote job
Returns:
a new EngineConfig with the settings modified
-
getAvailableProcessors
public int getAvailableProcessors()
Retrieves the configured number of processors available for this engine. This value is normally used to determine the width of horizontal partitions. The default value is the number of processor cores available at run-time.
Returns:
the number of processors available for engine use
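The defaulting behavior described for this value can be sketched in plain Java. The class below is hypothetical, and the convention that a non-positive configured value means "not set" is an assumption of the sketch, not something this page states. It only uses the standard Runtime.availableProcessors() call to show where a run-time core count typically comes from.

```java
// Hypothetical helper for illustration; not part of the DataRush API.
final class ProcessorDefaultSketch {
    // Returns the configured count when it is positive; otherwise falls back
    // to the number of cores the JVM reports at run-time.
    // ASSUMPTION: a non-positive value means "not configured".
    static int availableProcessors(int configured) {
        return configured > 0
                ? configured
                : Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + availableProcessors(8));
        System.out.println("default:  " + availableProcessors(0));
    }
}
```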
-
getProperty
public <T> T getProperty(EngineProperty<T> property)
Returns the value of an arbitrary property.
Type Parameters:
T - the property type
Parameters:
property - the property
Returns:
the value of the property
-
property
public <T> EngineConfig property(EngineProperty<T> property, T value)
Constructs a new configuration overriding the specified setting. All other settings are left unchanged. Note that clients should typically not use this method; this is for internal use and for custom properties. This call does not modify this object.
Type Parameters:
T - the type of property
Parameters:
property - the property to modify
value - the new value for that property
Returns:
the resulting new configuration
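The pairing of a typed property key with getProperty and property can be illustrated with a small, self-contained sketch. SketchProperty and SketchPropertyBag are hypothetical names invented for this example and only loosely model the idea of EngineProperty<T>; the real classes may be implemented quite differently. The point shown is that the type parameter on the key lets the getter return a correctly typed value, and that overriding a setting yields a new object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed key; loosely modeled on the idea of EngineProperty<T>.
final class SketchProperty<T> {
    final String name;
    final T defaultValue;

    SketchProperty(String name, T defaultValue) {
        this.name = name;
        this.defaultValue = defaultValue;
    }
}

// Hypothetical immutable property bag keyed by SketchProperty instances.
final class SketchPropertyBag {
    private final Map<SketchProperty<?>, Object> values;

    SketchPropertyBag() {
        this(new HashMap<>());
    }

    private SketchPropertyBag(Map<SketchProperty<?>, Object> values) {
        this.values = values;
    }

    // Returns a copy with one setting overridden; this object is not modified.
    <T> SketchPropertyBag property(SketchProperty<T> key, T value) {
        Map<SketchProperty<?>, Object> copy = new HashMap<>(values);
        copy.put(key, value);
        return new SketchPropertyBag(copy);
    }

    // The cast is safe because property() only ever stores a T under a SketchProperty<T>.
    @SuppressWarnings("unchecked")
    <T> T getProperty(SketchProperty<T> key) {
        return values.containsKey(key) ? (T) values.get(key) : key.defaultValue;
    }

    public static void main(String[] args) {
        SketchProperty<Integer> retries = new SketchProperty<>("retries", 0);
        SketchPropertyBag base = new SketchPropertyBag();
        SketchPropertyBag changed = base.property(retries, 3);
        System.out.println(base.getProperty(retries) + " -> " + changed.getProperty(retries));
    }
}
```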
-
applyConfig
public EngineConfig applyConfig(EngineConfig config)
Constructs a new configuration by merging the properties of another config with this one. Any non-default properties of the new configuration override the corresponding properties of the existing configuration, producing a new configuration object.
Parameters:
config - the engine configuration containing new properties
Returns:
the resulting new configuration
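The merge rule can be sketched with ordinary maps. This is a simplified model under a stated assumption: each map holds only explicitly set (non-default) values, so an absent key stands for "default". MergeSketch and the key names are hypothetical and are not taken from the DataRush API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of merging two configurations; not the DataRush implementation.
final class MergeSketch {
    // ASSUMPTION: maps contain only non-default settings; a missing key means "default".
    // Values present in 'incoming' win; everything else is inherited from 'existing'.
    static Map<String, Object> applyConfig(Map<String, Object> existing,
                                           Map<String, Object> incoming) {
        Map<String, Object> merged = new HashMap<>(existing); // copy, so inputs stay untouched
        merged.putAll(incoming);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> existing = new HashMap<>();
        existing.put("parallelism", 4);
        existing.put("monitored", true);

        Map<String, Object> incoming = new HashMap<>();
        incoming.put("parallelism", 8);

        System.out.println(applyConfig(existing, incoming));
    }
}
```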
-
engine
public static EngineConfig engine()
Returns the default engine configuration.
Returns:
the default engine configuration
-
monitoredEngine
public static EngineConfig monitoredEngine()
Returns a default engine configuration with monitoring enabled.
Returns:
an engine configuration with monitoring enabled
-
engine
public static EngineConfig engine(Properties properties)
Creates an EngineConfig from a set of properties. Properties must be built-in engine settings.
Parameters:
properties - the properties
Returns:
an EngineConfig built from the given properties
-
isMonitored
public boolean isMonitored()
Indicates whether run-time statistics are gathered. Returns true if either local monitoring is enabled or remote monitoring is enabled.
Returns:
true if run-time statistics will be gathered
-
monitored
public EngineConfig monitored(boolean enabled)
Specifies whether performance statistics should be gathered for executed graphs.
Parameters:
enabled - indicates whether statistics collection is active
Returns:
a new EngineConfig with the settings modified
-
getSubgraphHistorySize
public int getSubgraphHistorySize()
Returns the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
Returns:
the maximum number of subgraphs to retain when tracking execution history
-
subgraphHistorySize
public EngineConfig subgraphHistorySize(int size)
Sets the maximum number of subgraphs to retain when tracking execution history. Subgraphs are retained in memory, so it's important to not retain too many over time. By default, we retain the last 10 subgraphs for any given operator.
Parameters:
size - the maximum number of subgraphs to retain when tracking execution history
Returns:
a new EngineConfig with the settings modified
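The "retain only the last N" policy behind this setting can be sketched as a bounded queue. SubgraphHistorySketch is a hypothetical class written for this example; how the engine actually stores subgraph history is not described on this page. The sketch shows why a cap keeps memory use bounded: once the limit is reached, the oldest entry is evicted for each new one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical bounded history; not the engine's actual data structure.
final class SubgraphHistorySketch {
    private final int maxSize;
    private final Deque<String> history = new ArrayDeque<>();

    SubgraphHistorySketch(int maxSize) {
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must be >= 0");
        }
        this.maxSize = maxSize;
    }

    // Records a subgraph identifier and evicts the oldest entries beyond the cap.
    void record(String subgraphId) {
        history.addLast(subgraphId);
        while (history.size() > maxSize) {
            history.removeFirst();
        }
    }

    // Returns the retained identifiers, oldest first, as a defensive copy.
    List<String> retained() {
        return new ArrayList<>(history);
    }

    public static void main(String[] args) {
        SubgraphHistorySketch h = new SubgraphHistorySketch(3);
        for (int i = 1; i <= 5; i++) {
            h.record("g" + i);
        }
        System.out.println(h.retained());
    }
}
```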
-
getStorageManagementPath
public Path getStorageManagementPath()
Returns the root scratch directory to use. For distributed execution, this must be a shared filesystem accessible by all nodes in the cluster.
Returns:
the root scratch directory to use
-
storageManagementPath
public EngineConfig storageManagementPath(Path path)
Specifies the root scratch directory to use. For distributed execution, this must be a shared filesystem accessible by all nodes in the cluster.
Parameters:
path - the storage management path
Returns:
a new EngineConfig with the settings modified
-
defaultStorageManagementPath
public EngineConfig defaultStorageManagementPath()
Specifies that the default storage management path is to be used.
Returns:
a new EngineConfig with the settings modified
-
getCluster
public ClusterSpecifier getCluster()
Returns the cluster specifier. The cluster specifier determines whether we run in distributed or pseudo-distributed mode and, if distributed, specifies the cluster (host and port) on which we are to run.
Returns:
the cluster specifier
-
cluster
public EngineConfig cluster(ClusterSpecifier spec)
Specifies the cluster on which we are to run.
Parameters:
spec - the cluster specifier
Returns:
a new EngineConfig with the settings modified
-
pseudoDistributed
public EngineConfig pseudoDistributed()
Specifies that we are to run in pseudo-distributed mode.
Returns:
a new EngineConfig with the settings modified
-
cluster
public EngineConfig cluster(String url)
Specifies the cluster that we are to run on.
Parameters:
url - the cluster URL. For the dr cluster this will be of the form "dr://host:port".
Returns:
a new EngineConfig with the settings modified
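The "dr://host:port" form can be taken apart with the standard java.net.URI class, as the sketch below shows. ClusterUrlSketch is a hypothetical helper, and the host name and port in the usage are placeholders; how the library itself parses or validates the URL is not documented here.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of the DataRush API.
final class ClusterUrlSketch {
    // Extracts the host from a URL of the form dr://host:port.
    static String host(String url) {
        return parse(url).getHost();
    }

    // Extracts the port from a URL of the form dr://host:port.
    static int port(String url) {
        return parse(url).getPort();
    }

    // Parses the URL and rejects anything that lacks the dr scheme, a host, or a port.
    private static URI parse(String url) {
        URI uri = URI.create(url);
        if (!"dr".equals(uri.getScheme()) || uri.getHost() == null || uri.getPort() < 0) {
            throw new IllegalArgumentException("expected dr://host:port but got: " + url);
        }
        return uri;
    }

    public static void main(String[] args) {
        String url = "dr://headnode:1099"; // placeholder host and port
        System.out.println(host(url) + " " + port(url));
    }
}
```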
-
isPseudoDistributed
public boolean isPseudoDistributed()
Returns true if the configured ClusterSpecifier is for the pseudo-distributed (i.e. local) cluster.
Returns:
true if pseudo-distributed
-
autoParallelize
public EngineConfig autoParallelize()
Specifies that the engine should automatically determine the default degree of parallelism for graphs.
Returns:
a new EngineConfig with the settings modified
-
getParallelism
public int getParallelism()
Returns the raw, un-interpreted value of the parallelism setting.
Returns:
the raw, un-interpreted value of the parallelism setting
-
parallelism
public EngineConfig parallelism(int count)
Specifies that the engine should use the given degree of parallelism for graphs.
Parameters:
count - the default degree of parallelism to use
Returns:
a new EngineConfig with the settings modified
-
getMinParallelism
public int getMinParallelism()
Returns the raw, un-interpreted value of the minimum parallelism setting.
Returns:
the raw, un-interpreted value of the minimum parallelism setting
-
minParallelism
public EngineConfig minParallelism(int count)
Specifies that the engine should use the given minimum degree of parallelism for graphs.
Parameters:
count - the minimum degree of parallelism to use
Returns:
a new EngineConfig with the settings modified
-
getMaxRetries
public int getMaxRetries()
Returns the maximum number of retries. This count is per executable section of the graph, so failures in earlier sections do not count against the retries available to later sections. By default this value is zero, and thus retry is disabled.
Returns:
the maximum number of retries
-
maxRetries
public EngineConfig maxRetries(int maxRetries)
Specifies the maximum number of retries. This count is per executable section of the graph, so failures in earlier sections do not count against the retries available to later sections. By default this value is zero, and thus retry is disabled.
Parameters:
maxRetries - the maximum number of retries
Returns:
a new EngineConfig with the settings modified
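The per-section retry budget can be illustrated with a short loop. SectionRetrySketch is hypothetical and models each executable section as a BooleanSupplier that returns true on success; the engine's real retry mechanism is not shown on this page. What the sketch demonstrates is that the retry counter is reset for every section instead of being shared across the whole graph.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical model of per-section retries; not the engine's implementation.
final class SectionRetrySketch {
    // Runs the sections in order. Each section gets one initial attempt plus
    // up to maxRetries retries of its own. Returns true only if every section succeeds.
    static boolean runAll(List<BooleanSupplier> sections, int maxRetries) {
        for (BooleanSupplier section : sections) {
            int attempts = 0; // reset per section, so earlier failures are not carried over
            boolean ok = false;
            while (!ok && attempts <= maxRetries) {
                ok = section.getAsBoolean();
                attempts++;
            }
            if (!ok) {
                return false; // this section exhausted its own budget
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        BooleanSupplier failsOnce = () -> ++calls[0] >= 2; // fails on first call only
        System.out.println(runAll(List.of(failsOnce), 1));
    }
}
```

With maxRetries set to zero, the loop body runs exactly once per section, which matches the documented default of retry being disabled.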
-
getDumpfilePath
public String getDumpfilePath()
Returns the dump path where low-level statistics and plan dumps are placed.
Returns:
the dump path
-
dumpfilePath
public EngineConfig dumpfilePath(String path)
Specifies the dump path where low-level statistics and plan dumps are placed.
Parameters:
path - the dump path
Returns:
a new EngineConfig with the settings modified
-
getSchedulerQueue
public String getSchedulerQueue()
Returns the scheduler queue used for scheduling jobs.
Returns:
scheduler queue name
-
schedulerQueue
public EngineConfig schedulerQueue(String queueName)
Specifies the scheduler queue to use when a job is executed.
Parameters:
queueName - name of the queue to use
Returns:
a new EngineConfig with the settings modified
-
getRawExtensionPaths
public String getRawExtensionPaths()
Returns the comma-delimited list of user extension paths.
Returns:
user extension directory paths
-
getExtensionPaths
public String[] getExtensionPaths()
Returns the list of extension paths.
Returns:
set of extension paths
-
extensionPaths
public EngineConfig extensionPaths(String... paths)
Specifies the list of user extension directory paths to use.
Parameters:
paths - set of user extension directory paths
Returns:
a new EngineConfig with the settings modified
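The relationship between the raw comma-delimited string and the array form of the extension paths can be sketched as a simple join and split. ExtensionPathsSketch is a hypothetical helper and the directory names are placeholders. It assumes paths contain no commas and treats an empty raw string as "no paths"; the library's own handling of such edge cases is not documented here.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the DataRush API.
final class ExtensionPathsSketch {
    // Joins individual paths into one comma-delimited string.
    static String toRaw(String... paths) {
        return String.join(",", paths);
    }

    // Splits a comma-delimited string back into individual paths.
    // ASSUMPTION: null or empty input means no extension paths are configured.
    static String[] fromRaw(String raw) {
        if (raw == null || raw.isEmpty()) {
            return new String[0];
        }
        return raw.split(",");
    }

    public static void main(String[] args) {
        String raw = toRaw("/shared/ext1", "/shared/ext2"); // placeholder directories
        System.out.println(raw);
        System.out.println(Arrays.toString(fromRaw(raw)));
    }
}
```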
-
-