- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.CompositeOperator
-
- com.pervasive.datarush.operators.io.jdbc.JDBCOperator
-
- com.pervasive.datarush.operators.io.jdbc.ReadFromJDBC
-
- All Implemented Interfaces:
ConnectionSettings
,LogicalOperator
,PipelineOperator<RecordPort>
,RecordPipelineOperator
public class ReadFromJDBC extends JDBCOperator implements RecordPipelineOperator
TheReadFromJDBC
operator is used to access relational database systems using a supplied JDBC driver. The JDBC driver must be in the classpath of the DataRush application. Each database provides a JDBC driver implementation that can be used by DataRush to access data in the database. Reference the specific database to be accessed for driver specific information.The
ReadFromJDBC
operator can be used to read all of the columns from a specific table or to execute a provided query. The query provided can be a complex, multi-table query. Follow the syntax guidelines of the database being queried. The results of the query will be made available to the output port of the operator. The operator transforms the database column types to supported DataRush scalar types. Some database-specific data types may not map well and will either be ignored or mapped to Java Object types.The results of a database query executed through JDBC are returned via a
ResultSet
object. TheResultSet
is used to iterate through the resultant rows and to access column data. The JDBCResultSet
class does not support multi-threaded access. Given that, the default behavior of theReadFromJDBC
operator is to execute in non-parallel mode when provided a query without parameters.To execute queries in parallel (and distributed) the
ReadFromJDBC
operator supports the use of parameterized queries. JDBC supports adding parameters to a query using the "?" character. Below is an example of a parameterized query. Note the use of the "?" character in the "where" clause.
select * from lineitem where l_shipmode = ?
When used as the data query for the
ReadFromJDBC
operator, a parameterized query can be executed in parallel. A set of parameters must be supplied to the parallel workers executing the parameterized queries. The parameters can be supplied in one of three ways:- Via the optional input port the
ReadFromJDBC
operator. - Obtained via a parameter query supplied as a property to the operator ("parameterQuery"). The query is executed and the results are used as parameters to the parameterized query.
- An array of values is passed as a property to the
ReadFromJDBC
operator ("parameters").
select distinct l_shipmode from lineitem
Note that the parameter query is selecting a distinct set of values from the "lineitem" table. The values will be substituted for the "?" in the parameterized query.
The parameters are handled the same whether they are provided directly as objects, read from the input port or queried via the parameter query. For each set (row) of parameters, the following occurs:
- The parameters are substituted within the parameterized data query. From our example,
one of the parameter values is "RAIL". When substituted within the example data query, the resultant
query becomes
select * from lineitem where l_shipmode = RAIL
. - The query with the substituted parameters is executed against the database.
- The results of the query are streamed to the output of the operator.
When used with a parameterized query and provided query parameters, the
ReadFromJDBC
operator operates in parallel by creating multiple workers. The query parameters are distributed to the workers in round robin fashion. The workers execute the data query after apply parameter substitution as described above.The order of the parameter value is important. The order must match the order of the parameters (the "?") in the data query. This is true for parameter values from the optional input port, provided as objects or from the parameter query. The
ReadFromJDBC
operator doesn't have the context to determine which parameter values match which parameters. The ordering of the parameter values is left to the user.When using a parameterized query, the number of parameter values provided must match the number of parameters in the query. If there is a mismatch in sizes, an exception will be raised and the operator will fail.
To obtain the best performance, the number of sets of query parameters should be greater than the configured parallelism. In our example parameter query, only 7 values are returned. In this case, having parallelism set to anything greater than 7 will be wasteful. Additional streams of execution will have no data to process.
Writing to database tables can be accomplished with the
WriteToJDBC
operator.
-
-
Constructor Summary
Constructors Constructor Description ReadFromJDBC()
Creates a new instance ofCopyOfReadFromJDBC
.ReadFromJDBC(ReadFromJDBC other)
Copy constructorReadFromJDBC(String dataQuery, RecordTokenType outputType)
Creates a new instance ofCopyOfReadFromJDBC
, specifying the minimal set of required parameters.
-
Method Summary
All Methods Instance Methods Concrete Methods Deprecated Methods Modifier and Type Method Description protected void
compose(CompositionContext ctx)
Compose the body of this operator.void
configureOutputType()
Convenience method, equivalent tosetOutputType(discoverType())
RecordTokenType
discoverType()
Discovers the output type based on this operator's configuration.String
getDataQuery()
Gets the configured data query.boolean
getDiscoverOutputAtRuntime()
Get whether the output type should be automatically discovered at runtime.int
getFetchSize()
Retrieves the number of rows in each fetch from the database.RecordPort
getInput()
Get the optional input port that contains parameter values to apply to a parameterized data query.RecordPort
getOutput()
Get the output port containing the data retrieved from the source database.RecordTokenType
getOutputType()
Get the configured output type.String
getParameterQuery()
Get the query used to gather values for the parameterized data query.Object[][]
getParameters()
Get the configured parameter values.void
setDataQuery(String dataQuery)
Sets theSELECT
statement to use to query data from the source database.void
setDiscoverOutputAtRuntime(boolean discoverOutputAtRuntime)
Set whether the output type should be automatically discovered at runtime.void
setFetchSize(int fetchSize)
Sets the number of rows in each fetch from the database.void
setOutputType(RecordTokenType outputType)
Set the record type of the output port of this operator.void
setParameterQuery(String parameterQuery)
Set the database query used to gather parameters from the source database.void
setParameters(Object[][] parameters)
Set the parameter values to use when the data query specified withsetDataQuery(String)
is parameterized.void
setSelectStatement(String selectStatement)
Deprecated.Use thesetDataQuery(String)
method instead.-
Methods inherited from class com.pervasive.datarush.operators.io.jdbc.JDBCOperator
getConnectionFactory, getDriverName, getErrorAction, getHostNames, getJDBCConnector, getPassword, getSqlWarningLimit, getTableName, getUrl, getUser, setConnectionFactory, setDriverName, setErrorAction, setHostNames, setHostNames, setJDBCConnector, setPassword, setSqlWarningLimit, setTableName, setUrl, setUser
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Constructor Detail
-
ReadFromJDBC
public ReadFromJDBC()
Creates a new instance ofCopyOfReadFromJDBC
.
-
ReadFromJDBC
public ReadFromJDBC(String dataQuery, RecordTokenType outputType)
Creates a new instance ofCopyOfReadFromJDBC
, specifying the minimal set of required parameters.- Parameters:
dataQuery
- The query to be run to access dataoutputType
- The output type created by the query
-
ReadFromJDBC
public ReadFromJDBC(ReadFromJDBC other)
Copy constructor- Parameters:
other
- the instance of ReadFromJDBC to copy
-
-
Method Detail
-
getInput
public RecordPort getInput()
Get the optional input port that contains parameter values to apply to a parameterized data query.- Specified by:
getInput
in interfacePipelineOperator<RecordPort>
- Returns:
- optional input port for parameter values
-
getOutput
public RecordPort getOutput()
Get the output port containing the data retrieved from the source database.- Specified by:
getOutput
in interfacePipelineOperator<RecordPort>
- Returns:
- output record port
-
getDataQuery
public String getDataQuery()
Gets the configured data query.- Returns:
- SQL select statement that retrieves the wanted data
-
setDataQuery
public void setDataQuery(String dataQuery)
Sets theSELECT
statement to use to query data from the source database. This may be a complex query. The query can also be parameterized. If the query is parameterized, use eithersetParameterQuery(String)
orsetParameters(Object[][])
to specify the parameters of the data query.- Parameters:
dataQuery
- database query used to retrieve the wanted data
-
setSelectStatement
@Deprecated public void setSelectStatement(String selectStatement)
Deprecated.Use thesetDataQuery(String)
method instead.Sets theSELECT
statement to use as a query. Either this method orJDBCOperator.setTableName(String)
must be invoked.- Parameters:
selectStatement
- theSELECT
statement to use as a query
-
getParameterQuery
public String getParameterQuery()
Get the query used to gather values for the parameterized data query.- Returns:
- query used to gather parameter values
-
setParameterQuery
public void setParameterQuery(String parameterQuery)
Set the database query used to gather parameters from the source database. This query is run to build the parameters for a parameterized data query. The number of columns returned from this query must match the number of parameters required by the data query (set usingsetDataQuery(String)
).When using a parameterized query, either the parameter query should be set using this method or set the parameters directly with
setParameters(Object[][])
.- Parameters:
parameterQuery
- query used to gather data query parameter values
-
getOutputType
public RecordTokenType getOutputType()
Get the configured output type.- Returns:
- output type of the output port for this operator
-
setOutputType
public void setOutputType(RecordTokenType outputType)
Set the record type of the output port of this operator. If this type is not specified directly it will be discovered. The efficiency of the discovery capability depends on the database driver implementation. It is always more efficient to specify the output type when possible.The output type can be discovered using
discoverType()
or configured directly usingconfigureOutputType()
.- Parameters:
outputType
- record token type for the output port
-
getParameters
public Object[][] getParameters()
Get the configured parameter values.- Returns:
- parameter values
-
setParameters
public void setParameters(Object[][] parameters)
Set the parameter values to use when the data query specified withsetDataQuery(String)
is parameterized. The parameters are specified as an array ofObject
arrays. The size of theObject
arrays must match the number of parameters in the data query. The number ofObject
arrays affects the parallelization that may be achieved by the operator. Setting the parallelization of the application running this operator to a number greater than the number of parameter sets is inefficient.When using a parameterized data query, either pass in the parameter values using this method or set a query to gather the parameter values using
setParameterQuery(String)
.- Parameters:
parameters
- parameter values for a parameterized query
-
getFetchSize
public int getFetchSize()
Retrieves the number of rows in each fetch from the database.- Returns:
- the number of rows in each fetch from the database
-
setFetchSize
public void setFetchSize(int fetchSize)
Sets the number of rows in each fetch from the database.- Parameters:
fetchSize
- the number of rows in each fetch from the database
-
getDiscoverOutputAtRuntime
public boolean getDiscoverOutputAtRuntime()
Get whether the output type should be automatically discovered at runtime.- Returns:
- whether the output is discovered at runtime
-
setDiscoverOutputAtRuntime
public void setDiscoverOutputAtRuntime(boolean discoverOutputAtRuntime)
Set whether the output type should be automatically discovered at runtime. Generally this should only be used when calling stored procedures that dynamically generate the output type during execution. In this case the correct output type cannot be discovered without executing the full query during composition. This can potentially decrease overall performance and should be used with caution.- Parameters:
discoverOutputAtRuntime
- if the operator output will be discovered at runtime
-
configureOutputType
public void configureOutputType()
Convenience method, equivalent tosetOutputType(discoverType())
-
discoverType
public RecordTokenType discoverType()
Discovers the output type based on this operator's configuration.- Returns:
- the discovered output type
-
compose
protected void compose(CompositionContext ctx)
Description copied from class:CompositeOperator
Compose the body of this operator. Implementations should do the following:- Perform any validation of configuration, input types, etc
- Instantiate and configure sub-operators, adding them to the provided context via
the method
OperatorComposable.add(O)
- Create necessary connections via the method
OperatorComposable.connect(P, P)
. This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
- Specified by:
compose
in classCompositeOperator
- Parameters:
ctx
- the context
-
-