- All Implemented Interfaces:
ConnectionSettings,LogicalOperator,PipelineOperator<RecordPort>,RecordPipelineOperator
ReadFromJDBC operator is used to access relational database systems using a supplied JDBC driver.
The JDBC driver must be in the classpath of the DataRush application. Each database provides a JDBC
driver implementation that can be used by DataRush to access data in the database. Reference the
specific database to be accessed for driver specific information.
The ReadFromJDBC operator can be used to read all of the columns from a specific table or to execute
a provided query. The query provided can be a complex, multi-table query. Follow the syntax guidelines
of the database being queried. The results of the query will be made available to the output port of
the operator. The operator transforms the database column types to supported DataRush scalar types.
Some database-specific data types may not map well and will either be ignored or mapped to Java Object types.
The results of a database query executed through JDBC are returned via a ResultSet object. The ResultSet
is used to iterate through the resultant rows and to access column data. The JDBC ResultSet class does
not support multi-threaded access. Given that, the default behavior of the ReadFromJDBC operator is to
execute in non-parallel mode when provided a query without parameters.
To execute queries in parallel (and distributed) the ReadFromJDBC operator supports the use of
parameterized queries. JDBC supports adding parameters to a query using the "?" character. Below is an
example of a parameterized query. Note the use of the "?" character in the "where" clause.
select * from lineitem where l_shipmode = ?
When used as the data query for the ReadFromJDBC operator, a parameterized query can be executed in parallel.
A set of parameters must be supplied to the parallel workers executing the parameterized queries.
The parameters can be supplied in one of three ways:
- Via the optional input port the
ReadFromJDBCoperator. - Obtained via a parameter query supplied as a property to the operator ("parameterQuery"). The query is executed and the results are used as parameters to the parameterized query.
- An array of values is passed as a property to the
ReadFromJDBCoperator ("parameters").
select distinct l_shipmode from lineitem
Note that the parameter query is selecting a distinct set of values from the "lineitem" table. The values will be substituted for the "?" in the parameterized query.
The parameters are handled the same whether they are provided directly as objects, read from the input port or queried via the parameter query. For each set (row) of parameters, the following occurs:
- The parameters are substituted within the parameterized data query. From our example,
one of the parameter values is "RAIL". When substituted within the example data query, the resultant
query becomes
select * from lineitem where l_shipmode = RAIL. - The query with the substituted parameters is executed against the database.
- The results of the query are streamed to the output of the operator.
When used with a parameterized query and provided query parameters, the ReadFromJDBC operator operates
in parallel by creating multiple workers. The query parameters are distributed to the workers in round
robin fashion. The workers execute the data query after apply parameter substitution as described above.
The order of the parameter value is important. The order must match the order of the parameters (the "?")
in the data query. This is true for parameter values from the optional input port, provided as objects or
from the parameter query. The ReadFromJDBC operator doesn't have the context to determine which parameter
values match which parameters. The ordering of the parameter values is left to the user.
When using a parameterized query, the number of parameter values provided must match the number of parameters in the query. If there is a mismatch in sizes, an exception will be raised and the operator will fail.
To obtain the best performance, the number of sets of query parameters should be greater than the configured parallelism. In our example parameter query, only 7 values are returned. In this case, having parallelism set to anything greater than 7 will be wasteful. Additional streams of execution will have no data to process.
Writing to database tables can be accomplished with the WriteToJDBC operator.
-
Constructor Summary
ConstructorsConstructorDescriptionCreates a new instance ofCopyOfReadFromJDBC.ReadFromJDBC(ReadFromJDBC other) Copy constructorReadFromJDBC(String dataQuery, RecordTokenType outputType) Creates a new instance ofCopyOfReadFromJDBC, specifying the minimal set of required parameters. -
Method Summary
Modifier and TypeMethodDescriptionprotected voidCompose the body of this operator.voidConvenience method, equivalent tosetOutputType(discoverType())Discovers the output type based on this operator's configuration.Gets the configured data query.booleanGet whether the output type should be automatically discovered at runtime.intRetrieves the number of rows in each fetch from the database.getInput()Get the optional input port that contains parameter values to apply to a parameterized data query.Get the output port containing the data retrieved from the source database.Get the configured output type.Get the query used to gather values for the parameterized data query.Object[][]Get the configured parameter values.voidsetDataQuery(String dataQuery) Sets theSELECTstatement to use to query data from the source database.voidsetDiscoverOutputAtRuntime(boolean discoverOutputAtRuntime) Set whether the output type should be automatically discovered at runtime.voidsetFetchSize(int fetchSize) Sets the number of rows in each fetch from the database.voidsetOutputType(RecordTokenType outputType) Set the record type of the output port of this operator.voidsetParameterQuery(String parameterQuery) Set the database query used to gather parameters from the source database.voidsetParameters(Object[][] parameters) Set the parameter values to use when the data query specified withsetDataQuery(String)is parameterized.voidsetSelectStatement(String selectStatement) Deprecated.Methods inherited from class com.pervasive.datarush.operators.io.jdbc.JDBCOperator
getConnectionFactory, getDriverName, getErrorAction, getHostNames, getJDBCConnector, getPassword, getSqlWarningLimit, getTableName, getUrl, getUser, setConnectionFactory, setDriverName, setErrorAction, setHostNames, setHostNames, setJDBCConnector, setPassword, setSqlWarningLimit, setTableName, setUrl, setUserMethods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyErrorMethods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
Constructor Details
-
ReadFromJDBC
public ReadFromJDBC()Creates a new instance ofCopyOfReadFromJDBC. -
ReadFromJDBC
Creates a new instance ofCopyOfReadFromJDBC, specifying the minimal set of required parameters.- Parameters:
dataQuery- The query to be run to access dataoutputType- The output type created by the query
-
ReadFromJDBC
Copy constructor- Parameters:
other- the instance of ReadFromJDBC to copy
-
-
Method Details
-
getInput
Get the optional input port that contains parameter values to apply to a parameterized data query.- Specified by:
getInputin interfacePipelineOperator<RecordPort>- Returns:
- optional input port for parameter values
-
getOutput
Get the output port containing the data retrieved from the source database.- Specified by:
getOutputin interfacePipelineOperator<RecordPort>- Returns:
- output record port
-
getDataQuery
Gets the configured data query.- Returns:
- SQL select statement that retrieves the wanted data
-
setDataQuery
Sets theSELECTstatement to use to query data from the source database. This may be a complex query. The query can also be parameterized. If the query is parameterized, use eithersetParameterQuery(String)orsetParameters(Object[][])to specify the parameters of the data query.- Parameters:
dataQuery- database query used to retrieve the wanted data
-
setSelectStatement
Deprecated.Use thesetDataQuery(String)method instead.Sets theSELECTstatement to use as a query. Either this method orJDBCOperator.setTableName(String)must be invoked.- Parameters:
selectStatement- theSELECTstatement to use as a query
-
getParameterQuery
Get the query used to gather values for the parameterized data query.- Returns:
- query used to gather parameter values
-
setParameterQuery
Set the database query used to gather parameters from the source database. This query is run to build the parameters for a parameterized data query. The number of columns returned from this query must match the number of parameters required by the data query (set usingsetDataQuery(String)).When using a parameterized query, either the parameter query should be set using this method or set the parameters directly with
setParameters(Object[][]).- Parameters:
parameterQuery- query used to gather data query parameter values
-
getOutputType
Get the configured output type.- Returns:
- output type of the output port for this operator
-
setOutputType
Set the record type of the output port of this operator. If this type is not specified directly it will be discovered. The efficiency of the discovery capability depends on the database driver implementation. It is always more efficient to specify the output type when possible.The output type can be discovered using
discoverType()or configured directly usingconfigureOutputType().- Parameters:
outputType- record token type for the output port
-
getParameters
Get the configured parameter values.- Returns:
- parameter values
-
setParameters
Set the parameter values to use when the data query specified withsetDataQuery(String)is parameterized. The parameters are specified as an array ofObjectarrays. The size of theObjectarrays must match the number of parameters in the data query. The number ofObjectarrays affects the parallelization that may be achieved by the operator. Setting the parallelization of the application running this operator to a number greater than the number of parameter sets is inefficient.When using a parameterized data query, either pass in the parameter values using this method or set a query to gather the parameter values using
setParameterQuery(String).- Parameters:
parameters- parameter values for a parameterized query
-
getFetchSize
public int getFetchSize()Retrieves the number of rows in each fetch from the database.- Returns:
- the number of rows in each fetch from the database
-
setFetchSize
public void setFetchSize(int fetchSize) Sets the number of rows in each fetch from the database.- Parameters:
fetchSize- the number of rows in each fetch from the database
-
getDiscoverOutputAtRuntime
public boolean getDiscoverOutputAtRuntime()Get whether the output type should be automatically discovered at runtime.- Returns:
- whether the output is discovered at runtime
-
setDiscoverOutputAtRuntime
public void setDiscoverOutputAtRuntime(boolean discoverOutputAtRuntime) Set whether the output type should be automatically discovered at runtime. Generally this should only be used when calling stored procedures that dynamically generate the output type during execution. In this case the correct output type cannot be discovered without executing the full query during composition. This can potentially decrease overall performance and should be used with caution.- Parameters:
discoverOutputAtRuntime- if the operator output will be discovered at runtime
-
configureOutputType
public void configureOutputType()Convenience method, equivalent tosetOutputType(discoverType()) -
discoverType
Discovers the output type based on this operator's configuration.- Returns:
- the discovered output type
-
compose
Description copied from class:CompositeOperatorCompose the body of this operator. Implementations should do the following:- Perform any validation of configuration, input types, etc
- Instantiate and configure sub-operators, adding them to the provided context via
the method
OperatorComposable.add(O) - Create necessary connections via the method
OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
- Specified by:
composein classCompositeOperator- Parameters:
ctx- the context
-
setDataQuery(String)method instead.