- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.StreamingOperator
-
- com.pervasive.datarush.operators.ExecutableOperator
-
- com.pervasive.datarush.operators.scripting.RunScript
-
- All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator
public class RunScript extends ExecutableOperator implements RecordPipelineOperator
Processes rows using user-defined scripts. Calls third-party script engines via the javax.script APIs of JSR 223. The script engine must support the Compilable interface.
Operation is as follows:
- Bind variables into the script's context
- Compile all scripts
- Run the beforeFirstRecordScript
- For every row of input:
  - Move data from input ports to script context
  - Run the onEveryRecordScript
  - Move data from script context to output ports
- Run the afterLastRecordScript
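The sequence above can be sketched directly against the javax.script API. This is a minimal illustration, not the operator's implementation: the class `ScriptSequenceSketch`, its `run` method, and the use of plain maps for rows are hypothetical stand-ins for the operator's ports, and the "javascript" engine name in `main` assumes a JSR 223 JavaScript engine is on the classpath, which recent JDKs do not bundle.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.script.Bindings;
import javax.script.Compilable;
import javax.script.CompiledScript;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class ScriptSequenceSketch {

    /** Runs the before / per-row / after sequence and returns one output row per input row. */
    static List<Map<String, Object>> run(ScriptEngine engine,
                                         Map<String, Object> variables,
                                         String before, String onEvery, String after,
                                         List<Map<String, Object>> rows,
                                         List<String> outputFields) throws ScriptException {
        if (!(engine instanceof Compilable)) {
            throw new IllegalArgumentException("script engine must implement Compilable");
        }
        Compilable compiler = (Compilable) engine;

        // 1. Bind variables into the script's context.
        Bindings context = engine.getBindings(ScriptContext.ENGINE_SCOPE);
        context.putAll(variables);

        // 2. Compile all scripts (before and after are treated as optional here).
        CompiledScript beforeScript = before == null ? null : compiler.compile(before);
        CompiledScript rowScript = compiler.compile(onEvery);
        CompiledScript afterScript = after == null ? null : compiler.compile(after);

        // 3. Run the before-first-record script.
        if (beforeScript != null) {
            beforeScript.eval();
        }

        // 4. For every row: move input into the context, run the script, read the output back.
        List<Map<String, Object>> output = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            context.putAll(row);
            rowScript.eval();
            Map<String, Object> result = new LinkedHashMap<>();
            for (String field : outputFields) {
                result.put(field, context.get(field));
            }
            output.add(result);
        }

        // 5. Run the after-last-record script.
        if (afterScript != null) {
            afterScript.eval();
        }
        return output;
    }

    public static void main(String[] args) throws ScriptException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("javascript");
        if (engine == null) {
            System.out.println("No JavaScript engine installed; nothing to run.");
            return;
        }
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("price", 2.5);
        row.put("qty", 4);
        System.out.println(run(engine, Map.of("taxRate", 0.1), null,
                "total = price * qty * (1 + taxRate)", null, List.of(row), List.of("total")));
    }
}
```

Compiling once and evaluating the compiled script per row is the reason the engine has to support Compilable; an engine that only interprets source text is rejected before any data is read.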
Note that this operator supports running in a distributed context, whether locally partitioned or within a cluster. Each instance of the operator executes the above operation sequence. This implies that the before and after scripts affect only the scripting environment within which they run. The environments are not shared across instances.
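The consequence of per-instance environments can be shown with a small simulation. It uses plain Java lambdas in place of real scripts and a map in place of a script context; `PartitionIsolationSketch`, `runInstance`, and the variable name `count` are all hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class PartitionIsolationSketch {

    /** Runs the before / per-record / after sequence against a private context and returns it. */
    static Map<String, Object> runInstance(List<Integer> partitionRows) {
        // Each operator instance owns its own context; nothing is shared between instances.
        Map<String, Object> context = new HashMap<>();

        // Stand-ins for the three scripts.
        Consumer<Map<String, Object>> before = c -> c.put("count", 0);
        Consumer<Map<String, Object>> onEvery = c -> c.put("count", ((Integer) c.get("count")) + 1);
        Consumer<Map<String, Object>> after = c -> c.put("finished", Boolean.TRUE);

        before.accept(context);
        for (Integer row : partitionRows) {
            context.put("value", row);
            onEvery.accept(context);
        }
        after.accept(context);
        return context;
    }

    public static void main(String[] args) {
        // Five input rows split across two partitions.
        System.out.println(runInstance(List.of(10, 20, 30)).get("count")); // prints 3
        System.out.println(runInstance(List.of(40, 50)).get("count"));     // prints 2
    }
}
```

A counter initialized in the before script therefore ends up holding a per-partition count (3 and 2 here), never the total of 5. Any global aggregate has to be combined downstream of the operator.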
-
-
Constructor Summary
Constructors
RunScript()
Default constructor for the run script operator.
RunScript(String engineName, String onEveryRecordScript, RecordTokenType outputType)
Construct a run script operator with the minimum parameters required for execution.
-
Method Summary
Modifier and Type, Method
Description
protected void computeMetadata(StreamingMetadataContext ctx)
Implementations must adhere to the following contracts.
static String convertFieldName(String fieldName)
Convert an input field name into a valid variable for the scripting environment being used.
protected void execute(ExecutionContext context)
Executes the operator.
String getAfterLastRecordScript()
Get the source of the script run after all input records have been processed.
String getBeforeFirstRecordScript()
Get the source of the script run before any input records are processed.
String getEngineName()
Get the name of the scripting engine to use.
String getEngineVersion()
Get the version of the scripting engine to use.
RecordPort getInput()
Get the input port to the run script operator.
String getLanguageName()
Get the name of the script language to use.
String getLanguageVersion()
Get the version of the script language to use.
String getOnEveryRecordScript()
Get the source of the script to execute for every input record.
RecordPort getOutput()
Get the output port of the run script operator.
RecordTokenType getOutputType()
Get the defined output type.
String getStderrFileName()
Get the name of the file to use for standard error output.
String getStdoutFileName()
Get the name of the file to use for standard output.
Map<String,Object> getVariables()
Get the mapping of variables.
void setAfterLastRecordScript(String script)
Set the script to use for clean up processing.
void setBeforeFirstRecordScript(String script)
Set the script to use for initialization processing.
void setEngineName(String engineName)
Set the name of the script engine to use.
void setEngineVersion(String engineVersion)
Set the version of the script engine to use.
void setLanguageName(String languageName)
Set the name of the scripting language to use.
void setLanguageVersion(String languageVersion)
Set the version of the script language to use.
void setOnEveryRecordScript(String script)
Set the script to use for processing every input record.
void setOutputType(RecordTokenType outputType)
Set the output type.
void setStderrFileName(String stderrFileName)
Set the pathname of the file that will capture the standard error output of the script execution environment.
void setStdoutFileName(String stdoutFileName)
Set the pathname of the file that will capture the standard output of the script execution environment.
void setVariables(Map<String,Object> variables)
Set the collection of variables to be incorporated into the script environment before executing any of the given script source.
-
Methods inherited from class com.pervasive.datarush.operators.ExecutableOperator
cloneForExecution, getNumInputCopies, getPortSettings, handleInactiveOutput
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Constructor Detail
-
RunScript
public RunScript()
Default constructor for the run script operator.
-
RunScript
public RunScript(String engineName, String onEveryRecordScript, RecordTokenType outputType)
Construct a run script operator with the minimum parameters required for execution.
- Parameters:
engineName
- name of the scripting engine to use
onEveryRecordScript
- script run for every record of input
outputType
- schema of the output of the executed script
-
-
Method Detail
-
getInput
public RecordPort getInput()
Get the input port to the run script operator.
- Specified by:
getInput
in interface PipelineOperator<RecordPort>
- Returns:
- input port
-
getOutput
public RecordPort getOutput()
Get the output port of the run script operator.
- Specified by:
getOutput
in interface PipelineOperator<RecordPort>
- Returns:
- output port
-
getVariables
public Map<String,Object> getVariables()
Get the mapping of variables.
- Returns:
- variable map
-
setVariables
public void setVariables(Map<String,Object> variables)
Set the collection of variables to be incorporated into the script environment before executing any of the given script source. The keys of the given map are used as the variable names, and each mapped value becomes the value of that variable within the script environment.
- Parameters:
variables
- map of name=value pairs
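A minimal sketch of what binding such a map looks like at the javax.script level. It is an illustration only: `VariableBindingSketch` and `bind` are hypothetical names, and whether the operator uses `SimpleBindings` internally is an assumption; the source states only that keys become variable names and values become variable values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.script.Bindings;
import javax.script.SimpleBindings;

class VariableBindingSketch {

    /** Copies every name=value pair into a fresh set of script bindings. */
    static Bindings bind(Map<String, Object> variables) {
        Bindings bindings = new SimpleBindings();
        bindings.putAll(variables); // each key becomes a variable name visible to the script
        return bindings;
    }

    public static void main(String[] args) {
        Map<String, Object> variables = new LinkedHashMap<>();
        variables.put("threshold", 10);
        variables.put("label", "high");

        Bindings bindings = bind(variables);
        System.out.println(bindings.get("threshold")); // prints 10
        System.out.println(bindings.get("label"));     // prints high
    }
}
```

Because keys are used verbatim as variable names, they should already be valid identifiers in the target script language.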
-
getAfterLastRecordScript
public String getAfterLastRecordScript()
Get the source of the script run after all input records have been processed.
- Returns:
- source of after last record script
-
getBeforeFirstRecordScript
public String getBeforeFirstRecordScript()
Get the source of the script run before any input records are processed.
- Returns:
- source of before first record script
-
getEngineName
public String getEngineName()
Get the name of the scripting engine to use.
- Returns:
- script engine name
-
getEngineVersion
public String getEngineVersion()
Get the version of the scripting engine to use.
- Returns:
- script engine version
-
getLanguageName
public String getLanguageName()
Get the name of the script language to use.
- Returns:
- language name
-
getLanguageVersion
public String getLanguageVersion()
Get the version of the script language to use.
- Returns:
- language version
-
getOnEveryRecordScript
public String getOnEveryRecordScript()
Get the source of the script to execute for every input record.
- Returns:
- source of on every record script
-
getOutputType
public RecordTokenType getOutputType()
Get the defined output type.
- Returns:
- output type
-
getStderrFileName
public String getStderrFileName()
Get the name of the file to use for standard error output.
- Returns:
- file for standard error output
-
getStdoutFileName
public String getStdoutFileName()
Get the name of the file to use for standard output.
- Returns:
- file for standard output
-
setAfterLastRecordScript
public void setAfterLastRecordScript(String script)
Set the script to use for clean up processing. This script is executed after every record within the input flow has been processed.
- Parameters:
script
- source of after last record script
-
setBeforeFirstRecordScript
public void setBeforeFirstRecordScript(String script)
Set the script to use for initialization processing. This script is executed before the first record of the input data is processed.
- Parameters:
script
- source of before first record script
-
setEngineName
public void setEngineName(String engineName)
Set the name of the script engine to use.
- Parameters:
engineName
- engine name
-
setEngineVersion
public void setEngineVersion(String engineVersion)
Set the version of the script engine to use.
- Parameters:
engineVersion
- script engine version
-
setLanguageName
public void setLanguageName(String languageName)
Set the name of the scripting language to use.
- Parameters:
languageName
- script language name
-
setLanguageVersion
public void setLanguageVersion(String languageVersion)
Set the version of the script language to use.
- Parameters:
languageVersion
- script language version
-
setOnEveryRecordScript
public void setOnEveryRecordScript(String script)
Set the script to use for processing every input record. This script will be executed with the data values from each input record.
- Parameters:
script
- source of script for record processing
-
setOutputType
public void setOutputType(RecordTokenType outputType)
Set the output type. The output type must include any new variables created by the script being executed.
- Parameters:
outputType
- the output type of this operator
-
setStderrFileName
public void setStderrFileName(String stderrFileName)
Set the pathname of the file that will capture the standard error output of the script execution environment.
- Parameters:
stderrFileName
- path of standard error output
-
setStdoutFileName
public void setStdoutFileName(String stdoutFileName)
Set the pathname of the file that will capture the standard output of the script execution environment.
- Parameters:
stdoutFileName
- path of standard output
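One way a script environment's output streams can be captured to files is through the writers of a javax.script `ScriptContext`. The sketch below assumes that approach; this page does not say how the operator wires the file names in, and `OutputRedirectSketch`, `redirect`, and `demo` are hypothetical names. Whether a script's own print statements honor the context writers also depends on the engine.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.script.ScriptContext;
import javax.script.SimpleScriptContext;

class OutputRedirectSketch {

    /** Builds a script context whose standard output and error writers write to the given files. */
    static ScriptContext redirect(Path stdoutFile, Path stderrFile) {
        try {
            ScriptContext context = new SimpleScriptContext();
            context.setWriter(new PrintWriter(
                    Files.newBufferedWriter(stdoutFile, StandardCharsets.UTF_8)));
            context.setErrorWriter(new PrintWriter(
                    Files.newBufferedWriter(stderrFile, StandardCharsets.UTF_8)));
            return context;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes one message through each writer, closes them, and returns the captured text. */
    static String demo() {
        try {
            Path out = Files.createTempFile("script-stdout", ".log");
            Path err = Files.createTempFile("script-stderr", ".log");
            ScriptContext context = redirect(out, err);
            context.getWriter().write("processed 3 rows");
            context.getErrorWriter().write("1 warning");
            context.getWriter().close();      // closing flushes the buffered text to disk
            context.getErrorWriter().close();
            String captured = Files.readString(out) + " | " + Files.readString(err);
            Files.delete(out);
            Files.delete(err);
            return captured;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints processed 3 rows | 1 warning
    }
}
```

When the operator runs as several instances, each instance has its own script environment, so giving every instance the same path may cause their output to collide.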
-
convertFieldName
public static String convertFieldName(String fieldName)
Convert an input field name into a valid variable name for the scripting environment being used. The script being executed must make the same substitution for this to work. This handles field names that have embedded spaces or other characters that are not valid in variable names in the target scripting environment. This operator uses the same conversion when mapping input fields to script variables.
Not every situation in every scripting environment can be handled here. A workaround is to rename the fields in question before using them in this operator.
- Parameters:
fieldName
- name of the input field
- Returns:
- field name converted into a valid variable name
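The exact substitution rule is not given on this page. To show the kind of conversion involved, the sketch below uses an assumed rule: characters that cannot appear in a Java-style identifier become underscores, and a leading underscore is added when the first character cannot start an identifier. `FieldNameSketch.toVariableName` is hypothetical and may not match what `convertFieldName` actually returns.

```java
class FieldNameSketch {

    /** Assumed rule: replace characters that cannot appear in an identifier with '_'. */
    static String toVariableName(String fieldName) {
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < fieldName.length(); i++) {
            char c = fieldName.charAt(i);
            name.append(Character.isJavaIdentifierPart(c) ? c : '_');
        }
        // Identifiers cannot be empty or start with a digit, so prefix those cases.
        if (name.length() == 0 || !Character.isJavaIdentifierStart(name.charAt(0))) {
            name.insert(0, '_');
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(toVariableName("unit price")); // prints unit_price
        System.out.println(toVariableName("2024 total")); // prints _2024_total
    }
}
```

Whatever the real rule is, a script should obtain its variable names by calling `convertFieldName` on the input field names instead of guessing, since the operator applies that same conversion when it maps input fields to script variables.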
-
computeMetadata
protected void computeMetadata(StreamingMetadataContext ctx)
Description copied from class: StreamingOperator
Implementations must adhere to the following contracts.
General
Regardless of input ports/output port types, all implementations must do the following:
- Validation. Validation of configuration should always be performed first.
- Declare parallelizability. Implementations must declare parallelizability by calling StreamingMetadataContext.parallelize(ParallelismStrategy).
Input record ports
Implementations with input record ports must declare the following:
- Required data ordering: Implementations that have data ordering requirements must declare them by calling RecordPort#setRequiredDataOrdering, otherwise data may arrive in any order.
- Required data distribution (only applies to parallelizable operators): Implementations that have data distribution requirements must declare them by calling RecordPort#setRequiredDataDistribution, otherwise data will arrive in an unspecified partial distribution.
Implementations may also query the upstream distribution and ordering by calling RecordPort#getSourceDataDistribution and RecordPort#getSourceDataOrdering. These should be viewed as hints to help choose a more efficient algorithm. In such cases, though, operators must still declare data ordering and data distribution requirements; otherwise there is no guarantee that data will arrive sorted/distributed as required.
Output record ports
Implementations with output record ports must declare the following:
- Type: Implementations must declare their output type by calling RecordPort#setType.
- Output data ordering: Implementations that can make guarantees as to their output ordering may do so by calling RecordPort#setOutputDataOrdering.
- Output data distribution (only applies to parallelizable operators): Implementations that can make guarantees as to their output distribution may do so by calling RecordPort#setOutputDataDistribution.
Input model ports
In general, there is nothing special to declare for input model ports. Models are implicitly duplicated to all partitions when going from non-parallel to parallel operators. The case of a model going from a parallel to a non-parallel node is a special case of a "model reducer" operator. In the case of a model reducer, the downstream operator must declare the following:
- Merge handler: Model reducers must declare a merge handler by calling AbstractModelPort#setMergeHandler.
MergeModel is a convenient, re-usable model reducer, parameterized with a merge-handler.
Output model ports
SimpleModelPorts have no associated metadata and therefore there is never any output metadata to declare. PMMLPorts, on the other hand, do have associated metadata. For all PMMLPorts, implementations must declare the following:
- pmmlModelSpec: Implementations must declare the PMML model spec by calling PMMLPort.setPMMLModelSpec.
- Specified by:
computeMetadata
in class StreamingOperator
- Parameters:
ctx
- the context
-
execute
protected void execute(ExecutionContext context)
Description copied from class: ExecutableOperator
Executes the operator. Implementations should adhere to the following contracts:
- Following execution, all input ports must be at end-of-data.
- Following execution, all output ports must be at end-of-data.
- Specified by:
execute
in class ExecutableOperator
- Parameters:
context
- context in which to lookup physical ports bound to logical ports
-
-