Class ReadFromJDBC

  • All Implemented Interfaces:
    ConnectionSettings, LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

    public class ReadFromJDBC
    extends JDBCOperator
    implements RecordPipelineOperator
    The ReadFromJDBC operator is used to access relational database systems using a supplied JDBC driver. The JDBC driver must be in the classpath of the DataRush application. Each database provides a JDBC driver implementation that can be used by DataRush to access data in the database. Reference the specific database to be accessed for driver specific information.

    The ReadFromJDBC operator can be used to read all of the columns from a specific table or to execute a provided query. The query provided can be a complex, multi-table query. Follow the syntax guidelines of the database being queried. The results of the query will be made available to the output port of the operator. The operator transforms the database column types to supported DataRush scalar types. Some database-specific data types may not map well and will either be ignored or mapped to Java Object types.

    The results of a database query executed through JDBC are returned via a ResultSet object. The ResultSet is used to iterate through the resultant rows and to access column data. The JDBC ResultSet class does not support multi-threaded access. Given that, the default behavior of the ReadFromJDBC operator is to execute in non-parallel mode when provided a query without parameters.

    To execute queries in parallel (and distributed) the ReadFromJDBC operator supports the use of parameterized queries. JDBC supports adding parameters to a query using the "?" character. Below is an example of a parameterized query. Note the use of the "?" character in the "where" clause.

    select * from lineitem where l_shipmode = ?

    When used as the data query for the ReadFromJDBC operator, a parameterized query can be executed in parallel. A set of parameters must be supplied to the parallel workers executing the parameterized queries. The parameters can be supplied in one of three ways:

    • Via the optional input port the ReadFromJDBC operator.
    • Obtained via a parameter query supplied as a property to the operator ("parameterQuery"). The query is executed and the results are used as parameters to the parameterized query.
    • An array of values is passed as a property to the ReadFromJDBC operator ("parameters").
    Here is an example of a parameter query:

    select distinct l_shipmode from lineitem

    Note that the parameter query is selecting a distinct set of values from the "lineitem" table. The values will be substituted for the "?" in the parameterized query.

    The parameters are handled the same whether they are provided directly as objects, read from the input port or queried via the parameter query. For each set (row) of parameters, the following occurs:

    • The parameters are substituted within the parameterized data query. From our example, one of the parameter values is "RAIL". When substituted within the example data query, the resultant query becomes select * from lineitem where l_shipmode = RAIL.
    • The query with the substituted parameters is executed against the database.
    • The results of the query are streamed to the output of the operator.

    When used with a parameterized query and provided query parameters, the ReadFromJDBC operator operates in parallel by creating multiple workers. The query parameters are distributed to the workers in round robin fashion. The workers execute the data query after apply parameter substitution as described above.

    The order of the parameter value is important. The order must match the order of the parameters (the "?") in the data query. This is true for parameter values from the optional input port, provided as objects or from the parameter query. The ReadFromJDBC operator doesn't have the context to determine which parameter values match which parameters. The ordering of the parameter values is left to the user.

    When using a parameterized query, the number of parameter values provided must match the number of parameters in the query. If there is a mismatch in sizes, an exception will be raised and the operator will fail.

    To obtain the best performance, the number of sets of query parameters should be greater than the configured parallelism. In our example parameter query, only 7 values are returned. In this case, having parallelism set to anything greater than 7 will be wasteful. Additional streams of execution will have no data to process.

    Writing to database tables can be accomplished with the WriteToJDBC operator.

    • Constructor Detail

      • ReadFromJDBC

        public ReadFromJDBC()
        Creates a new instance of CopyOfReadFromJDBC.
      • ReadFromJDBC

        public ReadFromJDBC​(String dataQuery,
                            RecordTokenType outputType)
        Creates a new instance of CopyOfReadFromJDBC, specifying the minimal set of required parameters.
        Parameters:
        dataQuery - The query to be run to access data
        outputType - The output type created by the query
      • ReadFromJDBC

        public ReadFromJDBC​(ReadFromJDBC other)
        Copy constructor
        Parameters:
        other - the instance of ReadFromJDBC to copy
    • Method Detail

      • getInput

        public RecordPort getInput()
        Get the optional input port that contains parameter values to apply to a parameterized data query.
        Specified by:
        getInput in interface PipelineOperator<RecordPort>
        Returns:
        optional input port for parameter values
      • getDataQuery

        public String getDataQuery()
        Gets the configured data query.
        Returns:
        SQL select statement that retrieves the wanted data
      • setDataQuery

        public void setDataQuery​(String dataQuery)
        Sets the SELECT statement to use to query data from the source database. This may be a complex query. The query can also be parameterized. If the query is parameterized, use either setParameterQuery(String) or setParameters(Object[][]) to specify the parameters of the data query.
        Parameters:
        dataQuery - database query used to retrieve the wanted data
      • getParameterQuery

        public String getParameterQuery()
        Get the query used to gather values for the parameterized data query.
        Returns:
        query used to gather parameter values
      • setParameterQuery

        public void setParameterQuery​(String parameterQuery)
        Set the database query used to gather parameters from the source database. This query is run to build the parameters for a parameterized data query. The number of columns returned from this query must match the number of parameters required by the data query (set using setDataQuery(String)).

        When using a parameterized query, either the parameter query should be set using this method or set the parameters directly with setParameters(Object[][]).

        Parameters:
        parameterQuery - query used to gather data query parameter values
      • getOutputType

        public RecordTokenType getOutputType()
        Get the configured output type.
        Returns:
        output type of the output port for this operator
      • setOutputType

        public void setOutputType​(RecordTokenType outputType)
        Set the record type of the output port of this operator. If this type is not specified directly it will be discovered. The efficiency of the discovery capability depends on the database driver implementation. It is always more efficient to specify the output type when possible.

        The output type can be discovered using discoverType() or configured directly using configureOutputType().

        Parameters:
        outputType - record token type for the output port
      • getParameters

        public Object[][] getParameters()
        Get the configured parameter values.
        Returns:
        parameter values
      • setParameters

        public void setParameters​(Object[][] parameters)
        Set the parameter values to use when the data query specified with setDataQuery(String) is parameterized. The parameters are specified as an array of Object arrays. The size of the Object arrays must match the number of parameters in the data query. The number of Object arrays affects the parallelization that may be achieved by the operator. Setting the parallelization of the application running this operator to a number greater than the number of parameter sets is inefficient.

        When using a parameterized data query, either pass in the parameter values using this method or set a query to gather the parameter values using setParameterQuery(String).

        Parameters:
        parameters - parameter values for a parameterized query
      • getFetchSize

        public int getFetchSize()
        Retrieves the number of rows in each fetch from the database.
        Returns:
        the number of rows in each fetch from the database
      • setFetchSize

        public void setFetchSize​(int fetchSize)
        Sets the number of rows in each fetch from the database.
        Parameters:
        fetchSize - the number of rows in each fetch from the database
      • getDiscoverOutputAtRuntime

        public boolean getDiscoverOutputAtRuntime()
        Get whether the output type should be automatically discovered at runtime.
        Returns:
        whether the output is discovered at runtime
      • setDiscoverOutputAtRuntime

        public void setDiscoverOutputAtRuntime​(boolean discoverOutputAtRuntime)
        Set whether the output type should be automatically discovered at runtime. Generally this should only be used when calling stored procedures that dynamically generate the output type during execution. In this case the correct output type cannot be discovered without executing the full query during composition. This can potentially decrease overall performance and should be used with caution.
        Parameters:
        discoverOutputAtRuntime - if the operator output will be discovered at runtime
      • configureOutputType

        public void configureOutputType()
        Convenience method, equivalent to setOutputType(discoverType())
      • discoverType

        public RecordTokenType discoverType()
        Discovers the output type based on this operator's configuration.
        Returns:
        the discovered output type
      • compose

        protected void compose​(CompositionContext ctx)
        Description copied from class: CompositeOperator
        Compose the body of this operator. Implementations should do the following:
        1. Perform any validation of configuration, input types, etc
        2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
        3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
        Specified by:
        compose in class CompositeOperator
        Parameters:
        ctx - the context