Class RunScript

  • All Implemented Interfaces:
    LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

    public class RunScript
    extends ExecutableOperator
    implements RecordPipelineOperator
    Processes rows using user-defined scripts. Calls third-party script engines via the javax.scripting APIs of JSR 223. The script engine must support the Compilable interface.

    Operation is as follows:

    1. Bind variables into the script's context
    2. Compile all scripts
    3. Run the beforeFirstRecordScript
    4. For every row of input:
      1. Move data from input ports to script context
      2. Run the onEveryRecordScript
      3. Move data from script context to output ports
    5. Run the afterLastRecordScript
    Binding and retrieval are done using the names given to the input and output fields via their associated types.

    Note that this operator supports running in a distributed context, whether that be locally partitioned or within a cluster. Each instance of the operator executes the above operation sequence. This implies that the before and after scripts only affect the scripting environment within which they are run. The enviroments are not shared across instances.

    • Constructor Detail

      • RunScript

        public RunScript()
        Default constructor for the run script operator.
      • RunScript

        public RunScript​(String engineName,
                         String onEveryRecordScript,
                         RecordTokenType outputType)
        Construct a run script operator with the minimum parameters required for execution.
        Parameters:
        engineName - name of the scripting engine to use
        onEveryRecordScript - script run for every record of input
        outputType - schema of the output of the executed script
    • Method Detail

      • getVariables

        public Map<String,​Object> getVariables()
        Get the mapping of variables.
        Returns:
        variable map
      • setVariables

        public void setVariables​(Map<String,​Object> variables)
        Set the collection of variables to be incorporated into the script environment before executing any of the given script source. The key values within the given map will be used as the variable names. The values for each are set to the variable values within the script environment.
        Parameters:
        variables - map of name=value pairs
      • getAfterLastRecordScript

        public String getAfterLastRecordScript()
        Get the source of the script run after all input records have been processed.
        Returns:
        source of after last record script
      • getBeforeFirstRecordScript

        public String getBeforeFirstRecordScript()
        Get the source of the script run before any input records are processed.
        Returns:
        source of before first record script
      • getEngineName

        public String getEngineName()
        Get the name of the scripting engine to use.
        Returns:
        script engine name
      • getEngineVersion

        public String getEngineVersion()
        Get the version of the scripting engine to use.
        Returns:
        script engine version
      • getLanguageName

        public String getLanguageName()
        Get the name of the script language to use.
        Returns:
        language name
      • getLanguageVersion

        public String getLanguageVersion()
        Get the version of the script language to use.
        Returns:
        language version
      • getOnEveryRecordScript

        public String getOnEveryRecordScript()
        Get the source of the script to execute for every input record.
        Returns:
        source of on every record script
      • getOutputType

        public RecordTokenType getOutputType()
        Get the defined output type.
        Returns:
        output type
      • getStderrFileName

        public String getStderrFileName()
        Get the name of the file to use for standard error output.
        Returns:
        file for standard error output
      • getStdoutFileName

        public String getStdoutFileName()
        Get the name of the file to use for standard output.
        Returns:
        file for standard output
      • setAfterLastRecordScript

        public void setAfterLastRecordScript​(String script)
        Set the script to use for clean up processing. This script is executed after every record within the input flow has been processed.
        Parameters:
        script - source of after last record script
      • setBeforeFirstRecordScript

        public void setBeforeFirstRecordScript​(String script)
        Set the script to use for initialization processing. This script is executed before the first record of the input data is processed.
        Parameters:
        script - source of before first record script
      • setEngineName

        public void setEngineName​(String engineName)
        Set the name of the script engine to use.
        Parameters:
        engineName - engine name
      • setEngineVersion

        public void setEngineVersion​(String engineVersion)
        Set the version of the script engine to use.
        Parameters:
        engineVersion - script engine version
      • setLanguageName

        public void setLanguageName​(String languageName)
        Set the name of the scripting language to use.
        Parameters:
        languageName - script language name
      • setLanguageVersion

        public void setLanguageVersion​(String languageVersion)
        Set the version of the script language to use.
        Parameters:
        languageVersion - script language version
      • setOnEveryRecordScript

        public void setOnEveryRecordScript​(String script)
        Set the script to use for processing every input record. This script will be executed with the data values from each input record.
        Parameters:
        script - source of script for record processing
      • setOutputType

        public void setOutputType​(RecordTokenType outputType)
        Set the output type. The output type must include any new variables created by the script being executed.
        Parameters:
        outputType - the output type of this operator
      • setStderrFileName

        public void setStderrFileName​(String stderrFileName)
        Set the pathname of the file that will capture the standard error output of the script execution environment.
        Parameters:
        stderrFileName - path of standard error output
      • setStdoutFileName

        public void setStdoutFileName​(String stdoutFileName)
        Set the pathname of the file that will capture the standard output of the script execution environment.
        Parameters:
        stdoutFileName - path of standard output
      • convertFieldName

        public static String convertFieldName​(String fieldName)
        Convert an input field name into a valid variable for the scripting environment being used. The script being executed must make the same substitution for this to work. This is done to handle field names that may have embedded spaces and other characters that are not valid variable names in the target scripting environment. This operator will use this same conversion for mapping input fields into script variables.

        Every situation for every scripting environment cannot be handled here. A work around is to rename fields in question before using them in this operator.

        Parameters:
        fieldName - name of the input field
        Returns:
        converted into a valid variable name
      • computeMetadata

        protected void computeMetadata​(StreamingMetadataContext ctx)
        Description copied from class: StreamingOperator
        Implementations must adhere to the following contracts

        General

        Regardless of input ports/output port types, all implementations must do the following:

        1. Validation. Validation of configuration should always be performed first.
        2. Declare parallelizability.. Implementations must declare parallelizability by calling StreamingMetadataContext.parallelize(ParallelismStrategy).

        Input record ports

        Implementations with input record ports must declare the following:
        1. Required data ordering:
        2. Implementations that have data ordering requirements must declare them by calling RecordPort#setRequiredDataOrdering, otherwise data may arrive in any order.
        3. Required data distribution (only applies to parallelizable operators):
        4. Implementations that have data distribution requirements must declare them by calling RecordPort#setRequiredDataDistribution, otherwise data will arrive in an unspecified partial distribution.
        Note that if the upstream operator's output distribution/ordering is compatible with those required, we avoid a re-sort/re-distribution which is generally a very large savings from a performance standpoint. In addition, some operators may chose to query the upstream output distribution/ordering by calling RecordPort#getSourceDataDistribution and RecordPort#getSourceDataOrdering. These should be viewed as a hints to help chose a more efficient algorithm. In such cases, though, operators must still declare data ordering and data distribution requirements; otherwise there is no guarantee that data will arrive sorted/distributed as required.

        Output record ports

        Implementations with output record ports must declare the following:
        1. Type: Implementations must declare their output type by calling RecordPort#setType.
        Implementations with output record ports may declare the following:
        1. Output data ordering: Implementations that can make guarantees as to their output ordering may do so by calling RecordPort#setOutputDataOrdering
        2. Output data distribution (only applies to parallelizable operators): Implementations that can make guarantees as to their output distribution may do so by calling RecordPort#setOutputDataDistribution
        Note that both of these properties are optional; if unspecified, performance may suffer since the framework may unnecessarily re-sort/re-distributed the data.

        Input model ports

        In general, there is nothing special to declare for input model ports. Models are implicitly duplicated to all partitions when going from non-parallel to parallel operators. The case of a model going from a parallel to a non-parallel node is a special case of a "model reducer" operator. In the case of a model reducer, the downstream operator, must declare the following:
        1. Merge handler: Model reducers must declare a merge handler by calling AbstractModelPort#setMergeHandler.
        Note that MergeModel is a convenient, re-usable model reducer, parameterized with a merge-handler.

        Output model ports

        SimpleModelPort's have no associated metadata and therefore there is never any output metadata to declare. PMMLPort's, on the other hand, do have associated metadata. For all PMMLPorts, implementations must declare the following:
        1. pmmlModelSpec: Implementations must declare the PMML model spec by calling PMMLPort.setPMMLModelSpec.
        Specified by:
        computeMetadata in class StreamingOperator
        Parameters:
        ctx - the context
      • execute

        protected void execute​(ExecutionContext context)
        Description copied from class: ExecutableOperator
        Executes the operator. Implementations should adhere to the following contracts:
        1. Following execution, all input ports must be at end-of-data.
        2. Following execution, all output ports must be at end-of-data.
        Specified by:
        execute in class ExecutableOperator
        Parameters:
        context - context in which to lookup physical ports bound to logical ports