Class LoadActianVector
- java.lang.Object
- com.pervasive.datarush.operators.AbstractLogicalOperator
- com.pervasive.datarush.operators.CompositeOperator
- com.pervasive.datarush.operators.io.vectorwise.LoadActianVector
- All Implemented Interfaces:
LogicalOperator
public final class LoadActianVector extends CompositeOperator
Bulk load data into the Actian Vector database. The data can be loaded using one of four available bulk loading methods:
- Using the direct load capability: this is the fastest method of loading and supports cluster execution. The data is formatted locally in memory and streamed to the Actian Vector server.
- Using the vwload utility: this method is only available on an instance where the vwload utility is available. If executing on a Vector H cluster an HDFS temporary directory must be specified. The data is first written into a temporary area to prepare for loading into the database.
- Using the copy vwload command: this option supports remote execution and allows the user to execute the vwload utility via a SQL command. This can be useful if the vwload utility is not available on the path of the target instance.
- Using the SQL copy command: this option supports remote execution. However, the Actian Vector client must be installed and running on the machine executing the DataFlow application. The data is first written into a temporary area to prepare for loading into the database.
-
-
Field Summary
Fields
Modifier and Type Field Description
static String
DEFAULT_VECTOR_H_INSTANCE
static String
DEFAULT_VECTOR_INSTANCE
static int
DEFAULT_VECTOR_PORT
static int
DEFAULT_VECTOR_SIZE
-
Constructor Summary
Constructors
Constructor Description
LoadActianVector()
-
Method Summary
All Methods Instance Methods Concrete Methods
Modifier and Type Method Description
protected void
compose(CompositionContext context)
Compose the body of this operator.
String
getCharset()
Get the character set used when staging and loading for vwload method for Vectorwise operator.
boolean
getCleanData()
Get whether clean data is enabled or disabled.
String
getDatabase()
Get the database name.
boolean
getDecimalTruncationError()
Get whether errors on decimal truncation is enabled or disabled.
Map<String,String>
getExtraProperties()
Get any extra JDBC properties for the connection.
String
getFinalizeTableSQL()
Retrieves the SQL statement to execute after processing all the records.
String
getHost()
Get the server host name property.
String
getInitializeTableSQL()
Retrieves the SQL statement to execute before processing any records.
RecordPort
getInputPort()
String
getInsertMode()
Get insert mode used during Direct loading (DEFAULT/ROW/BULK).
String
getInstance()
Get the Actian Vector instance name.
boolean
getJdbcOnly()
Get whether JDBC connections should be used instead of the command line when possible.
String
getJdbcUrl()
Gets the jdbc connection url that should be explicitly used.
int
getMaxErrors()
Get the maximum number of errors allowed.
LoadMethod
getMethod()
Get the load method in use.
String
getNullIndicator()
Get the null indicator used if the loading method stages the files.
String
getPassword()
Get the password.
int
getPort()
Get the Actian Vector instance port.
RecordPort
getRejectPort()
Gets the port providing records which failed the load.
String
getRejectsPath()
Get the rejects path.
Map<String,String>
getRenameMapping()
Get the source to target field mapping.
boolean
getRollback()
Get whether rollback is enabled or disabled.
String
getSshPassword()
Password for SSH User id.
String
getSshUser()
OS user id used to connect to Master Node of installation.
boolean
getStringTruncationError()
Get whether errors on string truncation is enabled or disabled.
String
getTable()
Get the target table name.
String
getTmpDirectory()
Get the temporary directory that is used for storing intermediate loader files.
String
getUser()
Get the user account name.
int
getVectorSize()
Get the buffer size (in rows).
void
setCharset(String charsetName)
Set the character set used for data staging and loading for vwload load method.
void
setCleanData(boolean cleanData)
Enable or disable data cleaning.
void
setDatabase(String database)
Set the database name.
void
setDecimalTruncationError(boolean enableDecimalTruncationError)
Set whether errors on decimal truncation is enabled or disabled.
void
setExtraProperties(Map<String,String> extraProperties)
Set any extra JDBC properties for the connection.
void
setFinalizeTableSQL(String finalizeTableSQL)
Sets the SQL statements to execute after processing all the records.
void
setHost(String hostName)
Set the host name property.
void
setInitializeTableSQL(String initializeTableSQL)
Sets the SQL statements to execute before processing any record(s).
void
setInsertMode(String insertMode)
Set insert mode statement that applies for Insert and Merge operation when Direct loading.
void
setInstance(String instance)
Set the Actian Vector instance name.
void
setJdbcOnly(boolean jdbcOnly)
Set whether JDBC connections should be used instead of the command line when possible.
void
setJdbcUrl(String jdbcUrl)
Sets the jdbc connection url that should explicitly be used.
void
setMaxErrors(int maxErrors)
Set the maximum number of errors allowed per stream before rolling back the data load operation.
void
setMethod(LoadMethod method)
Set the load method to use.
void
setNullIndicator(String nullIndicator)
Sets the null indicator used if the loading method stages the files.
void
setPassword(String password)
Set the user's password.
void
setPort(int port)
Set the Actian Vector instance port.
void
setRejectsPath(String rejectsPath)
Set the rejects path.
void
setRenameMapping(Map<String,String> renameMapping)
Set a rename mapping.
void
setRollback(boolean enabled)
Enable or disable rollback processing.
void
setSshPassword(String passwd)
Set the password for SSH user id.
void
setSshUser(String user)
Set the OS User Id used for authenticating SSH connection to Master Node of Vector.
void
setStringTruncationError(boolean enableStringTruncationError)
Set whether errors on string truncation is enabled or disabled.
void
setTable(String table)
Set the name of the table to load.
void
setTmpDirectory(String tmpDirectory)
Set the temporary directory that is to be used for storing the intermediate loader files.
void
setUser(String userName)
Set the user name.
void
setVectorSize(int vectorSize)
Set the size of the buffer (in rows) used to cache data before sending to the Vector engine.
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
-
-
-
Field Detail
-
DEFAULT_VECTOR_SIZE
public static final int DEFAULT_VECTOR_SIZE
- See Also:
- Constant Field Values
-
DEFAULT_VECTOR_PORT
public static final int DEFAULT_VECTOR_PORT
- See Also:
- Constant Field Values
-
DEFAULT_VECTOR_INSTANCE
public static final String DEFAULT_VECTOR_INSTANCE
- See Also:
- Constant Field Values
-
DEFAULT_VECTOR_H_INSTANCE
public static final String DEFAULT_VECTOR_H_INSTANCE
- See Also:
- Constant Field Values
-
-
Method Detail
-
getInputPort
public RecordPort getInputPort()
-
getRejectPort
public RecordPort getRejectPort()
Gets the port providing records which failed the load. This port will only output rejected records when using Direct loading. Otherwise the port will simply output the location of the log produced by the command execution.
- Returns:
- all records which failed the load.
-
getMethod
public LoadMethod getMethod()
Get the load method in use.
- Returns:
- load method
-
setMethod
public void setMethod(LoadMethod method)
Set the load method to use.
- Parameters:
method
- load method (vwload is the default)
-
getHost
public String getHost()
Get the server host name property.
- Returns:
- server host name
-
setHost
public void setHost(String hostName)
Set the host name property. This is the host name of the server where Actian Vector is installed.
- Parameters:
hostName
- Actian Vector server host name
-
getDatabase
public String getDatabase()
Get the database name.
- Returns:
- database name
-
setDatabase
public void setDatabase(String database)
Set the database name. Specify the database where the target table lives.
- Parameters:
database
- database name
-
getInstance
public String getInstance()
Get the Actian Vector instance name.
- Returns:
- instance name
-
setInstance
public void setInstance(String instance)
Set the Actian Vector instance name. Vector defaults are "VW" and "VH".
- Parameters:
instance
- instance name
-
getPort
public int getPort()
Get the Actian Vector instance port.
- Returns:
- port number
-
setPort
public void setPort(int port)
Set the Actian Vector instance port. Defaults to 7.
- Parameters:
port
- port number
-
getTable
public String getTable()
Get the target table name.
- Returns:
- target table name
-
setTable
public void setTable(String table)
Set the name of the table to load.
- Parameters:
table
- target table name.
-
getUser
public String getUser()
Get the user account name.
- Returns:
- user account name
-
setUser
public void setUser(String userName)
Set the user name. When using the vwload load method, the user name is not always needed. The vwload utility has to be executed by the DBA user. The user name and password supplied are for a user account that has write/copy access to the target table.
- Parameters:
userName
- user account name
-
getPassword
public String getPassword()
Get the password.
- Returns:
- password
-
setPassword
public void setPassword(String password)
Set the user's password. When using the vwload load method, the password is not always needed. The vwload utility has to be executed by the DBA user. The user name and password supplied are for a user account that has write/copy access to the target table.
- Parameters:
password
-
-
getRenameMapping
public Map<String,String> getRenameMapping()
Get the source to target field mapping.
- Returns:
- the rename mapping
-
setRenameMapping
public void setRenameMapping(Map<String,String> renameMapping)
Set a rename mapping. This should be an ordered (i.e. LinkedHashMap) mapping of names. The keys in the map represent the original names in the input record port. The values in the map represent the column names in the table. If the names are the same, the mapping is not required. Any columns that are not included in the mapping will be dropped from the input.
This is an optional property. If not provided, the input fields are mapped to the target database table by schema order.
- Parameters:
renameMapping
- the mapping from old to new names.
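A minimal sketch of building such an ordered mapping with the JDK's LinkedHashMap, which iterates in insertion order. The field and column names are hypothetical, and the commented-out call assumes an already constructed LoadActianVector instance named loader.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RenameMappingSketch {
    // Builds a source-field -> target-column mapping (names are hypothetical).
    static Map<String, String> buildMapping() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("cust_id", "customer_id");
        mapping.put("amt", "amount");
        return mapping;
    }

    // Returns the target column names in the order the entries were added.
    static List<String> targetColumns(Map<String, String> mapping) {
        return new ArrayList<>(mapping.values());
    }

    public static void main(String[] args) {
        Map<String, String> mapping = buildMapping();
        System.out.println(targetColumns(mapping));
        // loader.setRenameMapping(mapping); // 'loader' is an assumed LoadActianVector instance
    }
}
```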
-
getInitializeTableSQL
public String getInitializeTableSQL()
Retrieves the SQL statement to execute before processing any records.
- Returns:
- the SQL statement to execute before processing any records
-
setInitializeTableSQL
public void setInitializeTableSQL(String initializeTableSQL)
Sets the SQL statements to execute before processing any records. For example, if the table does not exist, this property is required and must contain a CREATE TABLE statement to create the table.
Multiple SQL CREATE, INSERT, UPDATE, DELETE or DROP statements separated by semicolons (;) can be executed.
These statements are executed only once, regardless of partitionCount.
- Parameters:
initializeTableSQL
- the SQL statement to execute before processing any records
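A short sketch of assembling several statements into the single semicolon-separated string this property expects. The table and column names are hypothetical, and the commented-out call assumes an existing LoadActianVector instance named loader.

```java
class InitializeSqlSketch {
    // Joins individual statements into one semicolon-separated string.
    static String joinStatements(String... statements) {
        return String.join("; ", statements);
    }

    public static void main(String[] args) {
        // Hypothetical table; the DROP assumes the table already exists.
        String sql = joinStatements(
            "DROP TABLE sales_stage",
            "CREATE TABLE sales_stage (id INTEGER, amount DECIMAL(10,2))");
        System.out.println(sql);
        // loader.setInitializeTableSQL(sql); // 'loader' is an assumed LoadActianVector instance
    }
}
```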
-
getFinalizeTableSQL
public String getFinalizeTableSQL()
Retrieves the SQL statement to execute after processing all the records.
- Returns:
- the SQL statement to execute after processing all the records
-
setFinalizeTableSQL
public void setFinalizeTableSQL(String finalizeTableSQL)
Sets the SQL statements to execute after processing all the records, for example a CREATE INDEX statement.
Multiple SQL CREATE, INSERT, UPDATE, DELETE or DROP statements separated by semicolons (;) can be executed.
These statements are executed only once, regardless of partitionCount.
- Parameters:
finalizeTableSQL
- the SQL statement to execute after processing all the records
-
getMaxErrors
public int getMaxErrors()
Get the maximum number of errors allowed.
- Returns:
- maximum errors
-
setMaxErrors
public void setMaxErrors(int maxErrors)
Set the maximum number of errors allowed per stream before rolling back the data load operation.
- Parameters:
maxErrors
- maximum errors allowed per stream
-
getRollback
public boolean getRollback()
Get whether rollback is enabled or disabled.
- Returns:
- rollback setting
-
setRollback
public void setRollback(boolean enabled)
Enable or disable rollback processing. If enabled, after the maximum number of errors allowed has been encountered, the data load will be aborted and rolled back. No new data will be inserted into the target table.
If disabled, the operation is still aborted when the maximum number of errors has been encountered. However, the data load is not rolled back. Any data successfully loaded will appear in the target table.
- Parameters:
enabled
- enable (true) or disable (false) rollback
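The interaction between the maximum error count and the rollback flag can be modeled with a toy simulation. This is only an illustration of the behavior described above for a single stream, not the operator's implementation; it assumes the load aborts as soon as the error count reaches maxErrors.

```java
class RollbackSketch {
    // Simulates one load stream: rowOk[i] is true when row i loads cleanly.
    // Returns how many rows end up in the target table.
    static int rowsInTable(boolean[] rowOk, int maxErrors, boolean rollback) {
        int loaded = 0;
        int errors = 0;
        for (boolean ok : rowOk) {
            if (ok) {
                loaded++;
            } else if (++errors >= maxErrors) {
                // Assumed abort point: roll back everything, or keep rows loaded so far.
                return rollback ? 0 : loaded;
            }
        }
        return loaded; // error limit never reached, load completes
    }

    public static void main(String[] args) {
        boolean[] rows = {true, true, false, true, false, true};
        System.out.println(rowsInTable(rows, 2, true));
        System.out.println(rowsInTable(rows, 2, false));
    }
}
```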
-
getVectorSize
public int getVectorSize()
Get the buffer size (in rows).
- Returns:
- buffer size
-
setVectorSize
public void setVectorSize(int vectorSize)
Set the size of the buffer (in rows) used to cache data before sending to the Vector engine. Defaults to 1024.
- Parameters:
vectorSize
- buffer size in rows
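To give a feel for what a row-count buffer means, the sketch below splits a list of rows into batches of at most vectorSize rows. It is a generic illustration of row buffering under that assumption, not the operator's internal code.

```java
import java.util.ArrayList;
import java.util.List;

class VectorBufferSketch {
    // Splits rows into consecutive batches of at most vectorSize rows each.
    static <T> List<List<T>> batches(List<T> rows, int vectorSize) {
        if (vectorSize <= 0) {
            throw new IllegalArgumentException("vectorSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += vectorSize) {
            int end = Math.min(i + vectorSize, rows.size());
            out.add(new ArrayList<>(rows.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(i);
        }
        // With the documented default of 1024 rows per buffer.
        System.out.println(batches(rows, 1024).size());
    }
}
```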
-
getRejectsPath
public String getRejectsPath()
Get the rejects path.
- Returns:
- target path name
-
setRejectsPath
public void setRejectsPath(String rejectsPath)
Set the rejects path. Any input records that fail to load into the target database will be written to a file at the given path. If no rejects are encountered, the target file will not be created. This is only applicable when not Direct loading; when Direct loading, the rejected records are output on the reject port instead. Defaults to vwload-logs in the Java temp directory.
- Parameters:
rejectsPath
- target path name
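A sketch of how the documented default location could be resolved by hand, assuming "Java temp directory" refers to the java.io.tmpdir system property. The commented-out call assumes an existing LoadActianVector instance named loader.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class RejectsPathSketch {
    // Resolves a "vwload-logs" location under the JVM temp directory.
    static Path defaultRejectsPath() {
        return Paths.get(System.getProperty("java.io.tmpdir"), "vwload-logs");
    }

    public static void main(String[] args) {
        Path rejects = defaultRejectsPath();
        System.out.println(rejects);
        // loader.setRejectsPath(rejects.toString()); // 'loader' is an assumed LoadActianVector instance
    }
}
```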
-
getCleanData
public boolean getCleanData()
Get whether clean data is enabled or disabled.
- Returns:
- clean data setting
-
setCleanData
public void setCleanData(boolean cleanData)
Enable or disable data cleaning. If enabled, additional operations and checks may be performed on the data before loading it into Vector to ensure it meets any table constraints or other requirements. Additionally, invalid values will be loaded as nulls, if the table allows, instead of producing errors, such as when stringTruncationError or decimalTruncationError is enabled.
- Parameters:
cleanData
- enable (true) or disable (false) data cleaning
-
getInsertMode
public String getInsertMode()
Get the insert mode used during Direct loading (DEFAULT/ROW/BULK).
- Returns:
- insertMode
-
setInsertMode
public void setInsertMode(String insertMode)
Set the insert mode that applies to insert and merge operations when Direct loading. The default insert mode is DEFAULT.
- Parameters:
insertMode
- insert mode for inserts and merges.
-
getSshUser
public String getSshUser()
OS user id used to connect to the master node of the installation. The operator uses this user id to establish an SSH connection to the machine running the master node of Vector.
- Returns:
- OS user id for authenticating the SSH connection
-
setSshUser
public void setSshUser(String user)
Set the OS user id used for authenticating the SSH connection to the master node of Vector.
- Parameters:
user
-
-
setSshPassword
public void setSshPassword(String passwd)
Set the password for the SSH user id.
- Parameters:
passwd
- password for the SSH user id
-
getSshPassword
public String getSshPassword()
Password for the SSH user id.
- Returns:
- password for SSH user
-
getStringTruncationError
public boolean getStringTruncationError()
Get whether errors on string truncation are enabled or disabled.
- Returns:
- string truncation error setting
-
setStringTruncationError
public void setStringTruncationError(boolean enableStringTruncationError)
Set whether errors on string truncation are enabled or disabled. This is only applicable when not Direct loading, since strings are always truncated when Direct loading.
- Parameters:
enableStringTruncationError
- enable (true) or disable (false) errors on string truncation
-
getDecimalTruncationError
public boolean getDecimalTruncationError()
Get whether errors on decimal truncation are enabled or disabled.
- Returns:
- decimal truncation error setting
-
setDecimalTruncationError
public void setDecimalTruncationError(boolean enableDecimalTruncationError)
Set whether errors on decimal truncation are enabled or disabled.
- Parameters:
enableDecimalTruncationError
- enable (true) or disable (false) errors on decimal truncation
-
getTmpDirectory
public String getTmpDirectory()
Get the temporary directory that is used for storing intermediate loader files.
- Returns:
- path to the temporary directory
-
setTmpDirectory
public void setTmpDirectory(String tmpDirectory)
Set the temporary directory used for storing the intermediate loader files. If not set, the operator will attempt to use the local default temporary directory. When loading Vector on Hadoop, a directory that exists in the HDFS filesystem on the target Hadoop cluster must be specified, or the load will not use the distributed loader.
- Parameters:
tmpDirectory
- path to the temporary directory
-
getCharset
public String getCharset()
Get the character set used when staging and loading data with the vwload method.
- Returns:
- charset used when staging and loading data for vwload method
-
setCharset
public void setCharset(String charsetName)
Set the character set used for data staging and loading with the vwload load method.
- Parameters:
charsetName
- character set used when staging and loading for vwload method
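A small pre-flight check can catch a misspelled charset name before the load starts. The sketch below assumes the operator accepts Java charset names (an assumption based on the parameter name, not confirmed here) and uses the JDK's java.nio.charset.Charset to test whether a name is known to the running JVM.

```java
import java.nio.charset.Charset;

class CharsetCheckSketch {
    // Returns true when the JVM recognizes and supports the given charset name.
    static boolean isUsable(String charsetName) {
        try {
            return Charset.isSupported(charsetName);
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal names.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "UTF-8";
        if (isUsable(name)) {
            System.out.println(name + " is supported");
            // loader.setCharset(name); // 'loader' is an assumed LoadActianVector instance
        }
    }
}
```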
-
getNullIndicator
public String getNullIndicator()
Get the null indicator used if the loading method stages the files.
- Returns:
- nullIndicator used when staging data.
-
setNullIndicator
public void setNullIndicator(String nullIndicator)
Sets the null indicator used if the loading method stages the files.
- Parameters:
nullIndicator
-
-
getJdbcOnly
public boolean getJdbcOnly()
Get whether JDBC connections should be used instead of the command line when possible.
- Returns:
- jdbcOnly setting
-
setJdbcOnly
public void setJdbcOnly(boolean jdbcOnly)
Set whether JDBC connections should be used instead of the command line when possible.
- Parameters:
jdbcOnly
- enable(true) or disable(false) using JDBC exclusively if possible
-
setExtraProperties
public void setExtraProperties(Map<String,String> extraProperties)
Set any extra JDBC properties for the connection.
- Parameters:
extraProperties
- key value settings that will be applied to the JDBC connection
-
getExtraProperties
public Map<String,String> getExtraProperties()
Get any extra JDBC properties for the connection.
- Returns:
- the extra properties
-
setJdbcUrl
public void setJdbcUrl(String jdbcUrl)
Sets the JDBC connection URL that should explicitly be used. It is not required to set this in most cases.
- Parameters:
jdbcUrl
- url used to connect to jdbc
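For the cases where an explicit URL is needed, the sketch below assembles one from the host, instance, port and database values. It assumes the Actian JDBC driver's jdbc:ingres://host:port/database form, with the port written as the instance name followed by the port number (for example VW7); verify this against the driver documentation before relying on it. The host and database names are hypothetical.

```java
class JdbcUrlSketch {
    // Assumed URL shape: jdbc:ingres://<host>:<instance><port>/<database>
    static String buildUrl(String host, String instance, int port, String database) {
        return "jdbc:ingres://" + host + ":" + instance + port + "/" + database;
    }

    public static void main(String[] args) {
        String url = buildUrl("dbhost.example.com", "VW", 7, "sales");
        System.out.println(url);
        // loader.setJdbcUrl(url); // 'loader' is an assumed LoadActianVector instance
    }
}
```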
-
getJdbcUrl
public String getJdbcUrl()
Gets the JDBC connection URL that should be explicitly used.
- Returns:
- the jdbc url
-
compose
protected void compose(CompositionContext context)
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
- Specified by:
compose in class CompositeOperator
- Parameters:
context
- the context
-
-