public class ReadFromJDBC extends JDBCOperator implements RecordPipelineOperator
ReadFromJDBC
operator is used to access relational database systems using a supplied JDBC driver.
The JDBC driver must be in the classpath of the DataRush application. Each database provides a JDBC
driver implementation that can be used by DataRush to access data in the database. Reference the
specific database to be accessed for driver specific information.
The ReadFromJDBC
operator can be used to read all of the columns from a specific table or to execute
a provided query. The query provided can be a complex, multi-table query. Follow the syntax guidelines
of the database being queried. The results of the query will be made available to the output port of
the operator. The operator transforms the database column types to supported DataRush scalar types.
Some database-specific data types may not map well and will either be ignored or mapped to Java Object types.
The results of a database query executed through JDBC are returned via a ResultSet
object. The ResultSet
is used to iterate through the resultant rows and to access column data. The JDBC ResultSet
class does
not support multi-threaded access. Given that, the default behavior of the ReadFromJDBC
operator is to
execute in non-parallel mode when provided a query without parameters.
To execute queries in parallel (and distributed) the ReadFromJDBC
operator supports the use of
parameterized queries. JDBC supports adding parameters to a query using the "?" character. Below is an
example of a parameterized query. Note the use of the "?" character in the "where" clause.
select * from lineitem where l_shipmode = ?
When used as the data query for the ReadFromJDBC
operator, a parameterized query can be executed in parallel.
A set of parameters must be supplied to the parallel workers executing the parameterized queries.
The parameters can be supplied in one of three ways:
ReadFromJDBC
operator.ReadFromJDBC
operator ("parameters").select distinct l_shipmode from lineitem
Note that the parameter query is selecting a distinct set of values from the "lineitem" table. The values will be substituted for the "?" in the parameterized query.
The parameters are handled the same whether they are provided directly as objects, read from the input port or queried via the parameter query. For each set (row) of parameters, the following occurs:
select * from lineitem where l_shipmode = RAIL
.
When used with a parameterized query and provided query parameters, the ReadFromJDBC
operator operates
in parallel by creating multiple workers. The query parameters are distributed to the workers in round
robin fashion. The workers execute the data query after apply parameter substitution as described above.
The order of the parameter value is important. The order must match the order of the parameters (the "?")
in the data query. This is true for parameter values from the optional input port, provided as objects or
from the parameter query. The ReadFromJDBC
operator doesn't have the context to determine which parameter
values match which parameters. The ordering of the parameter values is left to the user.
When using a parameterized query, the number of parameter values provided must match the number of parameters in the query. If there is a mismatch in sizes, an exception will be raised and the operator will fail.
To obtain the best performance, the number of sets of query parameters should be greater than the configured parallelism. In our example parameter query, only 7 values are returned. In this case, having parallelism set to anything greater than 7 will be wasteful. Additional streams of execution will have no data to process.
Writing to database tables can be accomplished with the WriteToJDBC
operator.
Constructor and Description |
---|
ReadFromJDBC()
Creates a new instance of
CopyOfReadFromJDBC . |
ReadFromJDBC(ReadFromJDBC other)
Copy constructor
|
ReadFromJDBC(String dataQuery,
RecordTokenType outputType)
Creates a new instance of
CopyOfReadFromJDBC , specifying
the minimal set of required parameters. |
Modifier and Type | Method and Description |
---|---|
protected void |
compose(CompositionContext ctx)
Compose the body of this operator.
|
void |
configureOutputType()
Convenience method, equivalent to
setOutputType(discoverType()) |
RecordTokenType |
discoverType()
Discovers the output type based on this operator's configuration.
|
String |
getDataQuery()
Gets the configured data query.
|
boolean |
getDiscoverOutputAtRuntime()
Get whether the output type should be automatically discovered at runtime.
|
int |
getFetchSize()
Retrieves the number of rows in each fetch from the database.
|
RecordPort |
getInput()
Get the optional input port that contains parameter values to apply to
a parameterized data query.
|
RecordPort |
getOutput()
Get the output port containing the data retrieved from the source database.
|
RecordTokenType |
getOutputType()
Get the configured output type.
|
String |
getParameterQuery()
Get the query used to gather values for the parameterized data query.
|
Object[][] |
getParameters()
Get the configured parameter values.
|
void |
setDataQuery(String dataQuery)
Sets the
SELECT statement to use to query data from the source database. |
void |
setDiscoverOutputAtRuntime(boolean discoverOutputAtRuntime)
Set whether the output type should be automatically discovered at runtime.
|
void |
setFetchSize(int fetchSize)
Sets the number of rows in each fetch from the database.
|
void |
setOutputType(RecordTokenType outputType)
Set the record type of the output port of this operator.
|
void |
setParameterQuery(String parameterQuery)
Set the database query used to gather parameters from the source database.
|
void |
setParameters(Object[][] parameters)
Set the parameter values to use when the data query specified with
setDataQuery(String)
is parameterized. |
void |
setSelectStatement(String selectStatement)
Deprecated.
Use the
setDataQuery(String) method instead. |
getConnectionFactory, getDriverName, getErrorAction, getHostNames, getJDBCConnector, getPassword, getSqlWarningLimit, getTableName, getUrl, getUser, setConnectionFactory, setDriverName, setErrorAction, setHostNames, setHostNames, setJDBCConnector, setPassword, setSqlWarningLimit, setTableName, setUrl, setUser
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
disableParallelism, getInputPorts, getOutputPorts
public ReadFromJDBC()
CopyOfReadFromJDBC
.public ReadFromJDBC(String dataQuery, RecordTokenType outputType)
CopyOfReadFromJDBC
, specifying
the minimal set of required parameters.dataQuery
- The query to be run to access dataoutputType
- The output type created by the querypublic ReadFromJDBC(ReadFromJDBC other)
other
- the instance of ReadFromJDBC to copypublic RecordPort getInput()
getInput
in interface PipelineOperator<RecordPort>
public RecordPort getOutput()
getOutput
in interface PipelineOperator<RecordPort>
public String getDataQuery()
public void setDataQuery(String dataQuery)
SELECT
statement to use to query data from the source database.
This may be a complex query. The query can also be parameterized. If the query is
parameterized, use either setParameterQuery(String)
or setParameters(Object[][])
to specify the parameters of the data query.dataQuery
- database query used to retrieve the wanted data@Deprecated public void setSelectStatement(String selectStatement)
setDataQuery(String)
method instead.SELECT
statement to use as a query. Either this method or JDBCOperator.setTableName(String)
must be invoked.selectStatement
- the SELECT
statement to use as a querypublic String getParameterQuery()
public void setParameterQuery(String parameterQuery)
setDataQuery(String)
).
When using a parameterized query, either the parameter query should be set
using this method or set the parameters directly with setParameters(Object[][])
.
parameterQuery
- query used to gather data query parameter valuespublic RecordTokenType getOutputType()
public void setOutputType(RecordTokenType outputType)
The output type can be discovered using discoverType()
or configured
directly using configureOutputType()
.
outputType
- record token type for the output portpublic Object[][] getParameters()
public void setParameters(Object[][] parameters)
setDataQuery(String)
is parameterized. The parameters are specified as an array of Object
arrays. The size of the Object
arrays must match the number of parameters in the data query. The number of Object
arrays affects
the parallelization that may be achieved by the operator. Setting the parallelization of the
application running this operator to a number greater than the number of parameter sets is
inefficient.
When using a parameterized data query, either pass in the parameter values using this method
or set a query to gather the parameter values using setParameterQuery(String)
.
parameters
- parameter values for a parameterized querypublic int getFetchSize()
public void setFetchSize(int fetchSize)
fetchSize
- the number of rows in each fetch from the databasepublic boolean getDiscoverOutputAtRuntime()
public void setDiscoverOutputAtRuntime(boolean discoverOutputAtRuntime)
discoverOutputAtRuntime
- if the operator output will be discovered at runtimepublic void configureOutputType()
setOutputType(discoverType())
public RecordTokenType discoverType()
protected void compose(CompositionContext ctx)
CompositeOperator
OperatorComposable.add(O)
OperatorComposable.connect(P, P)
. This includes
connections from the composite's input ports to sub-operators, connections between sub-operators, and
connections from sub-operators output ports to the composite's output portscompose
in class CompositeOperator
ctx
- the contextCopyright © 2020 Actian Corporation. All rights reserved.