Class LoadActianVector

All Implemented Interfaces:
LogicalOperator

public final class LoadActianVector extends CompositeOperator
Bulk load data into the Actian Vector database. The data can be loaded using one of four available bulk loading methods:
  • Using the direct load capability: this is the fastest method of loading and supports cluster execution. The data is formatted locally in memory and streamed to the Actian Vector server.
  • Using the vwload utility: this method is only available on an instance where the vwload utility is installed. When executing on a Vector H cluster, an HDFS temporary directory must be specified. The data is first written into a temporary area to prepare for loading into the database.
  • Using the copy vwload command: this option supports remote execution and allows the user to execute the vwload utility via a SQL command. This can be useful if the vwload utility is not available on the path of the target instance.
  • Using the SQL copy command: this option supports remote execution. However, the Actian Vector client must be installed and running on the machine executing the DataFlow application. The data is first written into a temporary area to prepare for loading into the database.
  • Constructor Details

    • LoadActianVector

      public LoadActianVector()
  • Method Details

    • getInputPort

      public RecordPort getInputPort()
    • getRejectPort

      public RecordPort getRejectPort()
      Gets the port providing records that failed the load. This port outputs rejected records only when using Direct loading. Otherwise the port simply outputs the location of the log produced by the command execution.
      Returns:
      the records that failed the load
    • getMethod

      public LoadMethod getMethod()
      Get the load method in use.
      Returns:
      load method
    • setMethod

      public void setMethod(LoadMethod method)
      Set the load method to use.
      Parameters:
      method - load method (vwload is the default)
    • getHost

      public String getHost()
      Get the server host name property.
      Returns:
      server host name
    • setHost

      public void setHost(String hostName)
      Set the host name property. This is the host name of the server where Actian Vector is installed.
      Parameters:
      hostName - Actian Vector server host name
    • getDatabase

      public String getDatabase()
      Get the database name.
      Returns:
      database name
    • setDatabase

      public void setDatabase(String database)
      Set the database name. Specify the database where the target table lives.
      Parameters:
      database - database name
    • getInstance

      public String getInstance()
      Get the Actian Vector instance name.
      Returns:
      instance name
    • setInstance

      public void setInstance(String instance)
      Set the Actian Vector instance name. Vector defaults are "VW" and "VH".
      Parameters:
      instance - instance name
    • getPort

      public int getPort()
      Get the Actian Vector instance port.
      Returns:
      port number
    • setPort

      public void setPort(int port)
      Set the Actian Vector instance port. Defaults to 7.
      Parameters:
      port - port number
    • getTable

      public String getTable()
      Get the target table name.
      Returns:
      target table name
    • setTable

      public void setTable(String table)
      Set the name of the table to load.
      Parameters:
      table - target table name
    • getUser

      public String getUser()
      Get the user account name.
      Returns:
      user account name
    • setUser

      public void setUser(String userName)
      Set the user name.

      When using the vwload load method, the user name is not always needed. The vwload utility has to be executed by the DBA user. The user name and password supplied are for a user account that has write/copy access to the target table.

      Parameters:
      userName - user account name
    • getPassword

      public String getPassword()
      Get the password.
      Returns:
      password
    • setPassword

      public void setPassword(String password)
      Set the user's password.

      When using the vwload load method, the password is not always needed. The vwload utility has to be executed by the DBA user. The user name and password supplied are for a user account that has write/copy access to the target table.

      Parameters:
      password - password for the user account
    • getRenameMapping

      public Map<String,String> getRenameMapping()
      Get the source to target field mapping.
      Returns:
      the rename mapping
    • setRenameMapping

      public void setRenameMapping(Map<String,String> renameMapping)
      Set a rename mapping. This should be an ordered (i.e. LinkedHashMap) mapping of names. The keys in the map represent the original names in the input record port. The values in the map represent the column names in the table. If the names are the same, the mapping is not required. Any columns that are not included in the mapping will be dropped from the input.

      This is an optional property. If not provided, the input fields are mapped to the target database table by schema order.

      Parameters:
      renameMapping - the mapping from old to new names.
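      The mapping rules above can be illustrated with plain JDK collections. The sketch below is not part of the DataFlow API: the field and column names are hypothetical, and droppedFields is an illustrative helper that only mirrors the documented rule that unmapped input fields are dropped. Note the identity entry for amount, which keeps that field in the load even though its name does not change.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RenameMappingSketch {

    // Ordered mapping: key = field name on the input port, value = column name in the table.
    // All names here are hypothetical.
    static Map<String, String> buildMapping() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("cust_id", "customer_id");
        mapping.put("cust_name", "customer_name");
        mapping.put("amount", "amount"); // identity entry so the field is not dropped
        return mapping;
    }

    // Illustrative helper (an assumption, not library code): lists the input
    // fields that have no entry in the mapping and would therefore be dropped.
    static List<String> droppedFields(List<String> inputFields, Map<String, String> mapping) {
        List<String> dropped = new ArrayList<>();
        for (String field : inputFields) {
            if (!mapping.containsKey(field)) {
                dropped.add(field);
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = buildMapping();
        List<String> input = java.util.Arrays.asList("cust_id", "cust_name", "amount", "internal_note");
        System.out.println("mapping: " + mapping);
        System.out.println("dropped: " + droppedFields(input, mapping));
    }
}
```

      A map built this way is what would be passed to setRenameMapping.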
    • getInitializeTableSQL

      public String getInitializeTableSQL()
      Retrieves the SQL statement to execute before processing any records.
      Returns:
      the SQL statement to execute before processing any records
    • setInitializeTableSQL

      public void setInitializeTableSQL(String initializeTableSQL)
      Sets the SQL statements to execute before processing any records. For example, if the table does not exist, this property is required and must include a CREATE TABLE statement to create the table.

      Multiple SQL CREATE, INSERT, UPDATE, DELETE or DROP statements separated by semicolon (;) can be executed.

      These statements are executed only once, regardless of the partition count.

      Parameters:
      initializeTableSQL - the SQL statement to execute before processing any records
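      Since the property takes several statements in one semicolon-separated string, it can help to assemble that string from a list. The following is a minimal sketch using only the JDK; joinStatements is a hypothetical helper, the table names are made up, and the SQL is generic and should be checked against the Vector SQL dialect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InitializeSqlSketch {

    // Hypothetical helper: strips trailing semicolons and blank entries,
    // then joins the remaining statements with a single "; " separator.
    static String joinStatements(List<String> statements) {
        List<String> cleaned = new ArrayList<>();
        for (String statement : statements) {
            String trimmed = statement.trim();
            while (trimmed.endsWith(";")) {
                trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
            }
            if (!trimmed.isEmpty()) {
                cleaned.add(trimmed);
            }
        }
        return String.join("; ", cleaned);
    }

    // Example value for the property; "sales" and "load_audit" are hypothetical tables.
    static String exampleInitializeSql() {
        return joinStatements(Arrays.asList(
            "CREATE TABLE sales (id INTEGER NOT NULL, amount DECIMAL(10,2))",
            "INSERT INTO load_audit (table_name) VALUES ('sales')"));
    }

    public static void main(String[] args) {
        System.out.println(exampleInitializeSql());
    }
}
```

      The resulting string would be passed to setInitializeTableSQL. The same approach applies to the value given to setFinalizeTableSQL.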
    • getFinalizeTableSQL

      public String getFinalizeTableSQL()
      Retrieves the SQL statement to execute after processing all the records.
      Returns:
      the SQL statement to execute after processing all the records
    • setFinalizeTableSQL

      public void setFinalizeTableSQL(String finalizeTableSQL)
      Sets the SQL statements to execute after processing all the records, for example, a CREATE INDEX statement.

      Multiple SQL CREATE, INSERT, UPDATE, DELETE or DROP statements separated by semicolon (;) can be executed.

      These statements are executed only once, regardless of the partition count.

      Parameters:
      finalizeTableSQL - the SQL statement to execute after processing all the records
    • getMaxErrors

      public int getMaxErrors()
      Get the maximum number of errors allowed.
      Returns:
      maximum errors
    • setMaxErrors

      public void setMaxErrors(int maxErrors)
      Set the maximum number of errors allowed per stream before rolling back the data load operation.
      Parameters:
      maxErrors - maximum errors allowed per stream
    • getRollback

      public boolean getRollback()
      Get whether rollback is enabled or disabled.
      Returns:
      rollback setting
    • setRollback

      public void setRollback(boolean enabled)
      Enable or disable rollback processing. If enabled, after the maximum number of errors allowed has been encountered, the data load will be aborted and rolled back. No new data will be inserted into the target table.

      If disabled, the operation is still aborted when the maximum number of errors has been encountered. However, the data load is not rolled back. Any data successfully loaded will appear in the target table.

      Parameters:
      enabled - enable (true) or disable (false) rollback
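      The interaction between setMaxErrors and setRollback can be pictured with a toy model. This is only a sketch of the behavior described above, not the operator's implementation. It assumes a single stream and assumes the load aborts once the error count exceeds maxErrors; the exact threshold used by the operator is not stated here.

```java
public class RollbackSketch {

    // Toy model (assumption, single stream): returns how many rows are visible
    // in the target table after the load finishes or aborts.
    // recordOk[i] is true when record i loads cleanly, false when it is rejected.
    static int rowsVisibleAfterLoad(boolean[] recordOk, int maxErrors, boolean rollback) {
        int loaded = 0;
        int errors = 0;
        for (boolean ok : recordOk) {
            if (ok) {
                loaded++;
                continue;
            }
            errors++;
            if (errors > maxErrors) {
                // The load is aborted in both cases; rollback decides what remains.
                return rollback ? 0 : loaded;
            }
        }
        return loaded; // error limit never exceeded: all good rows are kept
    }

    public static void main(String[] args) {
        boolean[] records = {true, true, false, true, false, true};
        System.out.println("rollback on:  " + rowsVisibleAfterLoad(records, 1, true));
        System.out.println("rollback off: " + rowsVisibleAfterLoad(records, 1, false));
    }
}
```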
    • getVectorSize

      public int getVectorSize()
      Get the buffer size (in rows).
      Returns:
      buffer size
    • setVectorSize

      public void setVectorSize(int vectorSize)
      Set the size of the buffer (in rows) used to cache data before sending to the Vector engine. Defaults to 1024.
      Parameters:
      vectorSize - buffer size in rows
    • getRejectsPath

      public String getRejectsPath()
      Get the rejects path.
      Returns:
      target path name
    • setRejectsPath

      public void setRejectsPath(String rejectsPath)
      Set the rejects path. Any input records that fail to load into the target database are written to a file at the given path. If no rejects are encountered, the file is not created. This applies only when not using Direct loading; with Direct loading, rejected records are output on the reject port instead. Defaults to vwload-logs in the Java temp directory.
      Parameters:
      rejectsPath - target path name
    • getCleanData

      public boolean getCleanData()
      Get whether clean data is enabled or disabled.
      Returns:
      clean data setting
    • setCleanData

      public void setCleanData(boolean cleanData)
      Enable or disable data cleaning. If enabled, additional operations and checks may be performed on the data before loading it into Vector to ensure that it meets any table constraints or other requirements. Additionally, invalid values are loaded as nulls, if the table allows it, instead of producing errors, such as when stringTruncationError or decimalTruncationError is enabled.
      Parameters:
      cleanData - enable (true) or disable (false) data cleaning
    • getInsertMode

      public String getInsertMode()
      Get the insert mode used during Direct loading (DEFAULT/ROW/BULK).
      Returns:
      insertMode
    • setInsertMode

      public void setInsertMode(String insertMode)
      Set the insert mode that applies to insert and merge operations when Direct loading. The default insert mode is DEFAULT.
      Parameters:
      insertMode - insert mode for inserts and merges.
    • getSshUser

      public String getSshUser()
      Get the OS user ID used to connect to the master node of the installation. The operator uses this user ID to establish an SSH connection to the machine running the master node of Vector.
      Returns:
      OS UserId for authenticating the SSH connection
    • setSshUser

      public void setSshUser(String user)
      Set the OS user ID used for authenticating the SSH connection to the master node of Vector.
      Parameters:
      user - OS user ID for authenticating the SSH connection
    • setSshPassword

      public void setSshPassword(String passwd)
      Set the password for the SSH user ID.
      Parameters:
      passwd - password for the SSH user ID
    • getSshPassword

      public String getSshPassword()
      Get the password for the SSH user ID.
      Returns:
      password for SSH user
    • getStringTruncationError

      public boolean getStringTruncationError()
      Get whether errors on string truncation are enabled or disabled.
      Returns:
      string truncation error setting
    • setStringTruncationError

      public void setStringTruncationError(boolean enableStringTruncationError)
      Set whether errors on string truncation are enabled or disabled. This applies only when not using Direct loading, since strings are always truncated when Direct loading.
      Parameters:
      enableStringTruncationError - enable (true) or disable (false) errors on string truncation
    • getDecimalTruncationError

      public boolean getDecimalTruncationError()
      Get whether errors on decimal truncation are enabled or disabled.
      Returns:
      decimal truncation error setting
    • setDecimalTruncationError

      public void setDecimalTruncationError(boolean enableDecimalTruncationError)
      Set whether errors on decimal truncation are enabled or disabled.
      Parameters:
      enableDecimalTruncationError - enable (true) or disable (false) errors on decimal truncation
    • getTmpDirectory

      public String getTmpDirectory()
      Get the temporary directory that is used for storing intermediate loader files.
      Returns:
      path to the temporary directory
    • setTmpDirectory

      public void setTmpDirectory(String tmpDirectory)
      Set the temporary directory to use for storing the intermediate loader files. If not set, the operator attempts to use the local default temporary directory. When loading Vector on Hadoop, the directory must exist in the HDFS file system on the target Hadoop cluster; otherwise the load will not use the distributed loader.
      Parameters:
      tmpDirectory - path to the temporary directory
    • getCharset

      public String getCharset()
      Get the character set used when staging and loading for vwload method for Vectorwise operator.
      Returns:
      charset used when staging and loading data for vwload method
    • setCharset

      public void setCharset(String charsetName)
      Set the character set used for data staging and loading for vwload load method.
      Parameters:
      charsetName - character set used when staging and loading for vwload method
    • getNullIndicator

      public String getNullIndicator()
      Get the null indicator used if the loading method stages the files.
      Returns:
      nullIndicator used when staging data.
    • setNullIndicator

      public void setNullIndicator(String nullIndicator)
      Sets the null indicator used if the loading method stages the files.
      Parameters:
      nullIndicator - null indicator used when staging data
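      To show what the charset and null indicator settings mean for a staged file, here is a minimal JDK-only sketch. It is an illustration under assumptions, not the operator's staging code: the delimiter, the "NULL" marker, and the helper names formatRow and encode are all hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

public class StagingSketch {

    // Hypothetical helper: renders one record as a delimited line,
    // writing the null indicator wherever a value is missing.
    static String formatRow(List<String> values, String delimiter, String nullIndicator) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                line.append(delimiter);
            }
            String value = values.get(i);
            line.append(value == null ? nullIndicator : value);
        }
        return line.toString();
    }

    // Hypothetical helper: encodes a staged line with the named character set,
    // failing early if the JVM does not support it.
    static byte[] encode(String line, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName);
        }
        return line.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String line = formatRow(Arrays.asList("1", null, "caf\u00e9"), "|", "NULL");
        System.out.println(line);
        System.out.println("UTF-8 bytes:      " + encode(line, "UTF-8").length);
        System.out.println("ISO-8859-1 bytes: " + encode(line, "ISO-8859-1").length);
    }
}
```

      The same text can produce different byte sequences under different character sets, which is why the charset used for staging has to match what the loader expects.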
    • getJdbcOnly

      public boolean getJdbcOnly()
      Get whether JDBC connections should be used instead of the command line when possible.
      Returns:
      jdbcOnly setting
    • setJdbcOnly

      public void setJdbcOnly(boolean jdbcOnly)
      Set whether JDBC connections should be used instead of the command line when possible.
      Parameters:
      jdbcOnly - enable(true) or disable(false) using JDBC exclusively if possible
    • setExtraProperties

      public void setExtraProperties(Map<String,String> extraProperties)
      Set any extra JDBC properties for the connection.
      Parameters:
      extraProperties - key value settings that will be applied to the JDBC connection
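      Extra properties are plain key/value strings. The sketch below shows one way such a map could be merged into a java.util.Properties object of the kind that DriverManager.getConnection(String, Properties) accepts. How the operator applies the map internally is not documented here; the property name example_property, the credentials, and the helper toJdbcProperties are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class ExtraPropertiesSketch {

    // Example map for the property; the key and value are placeholders,
    // not real driver settings.
    static Map<String, String> buildExtraProperties() {
        Map<String, String> extra = new LinkedHashMap<>();
        extra.put("example_property", "example_value");
        return extra;
    }

    // Hypothetical helper: combines credentials and extra settings into the
    // Properties form used by standard JDBC connection calls.
    static Properties toJdbcProperties(String user, String password, Map<String, String> extra) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        for (Map.Entry<String, String> entry : extra.entrySet()) {
            props.setProperty(entry.getKey(), entry.getValue());
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = toJdbcProperties("loader", "secret", buildExtraProperties());
        System.out.println(props.stringPropertyNames());
    }
}
```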
    • getExtraProperties

      public Map<String,String> getExtraProperties()
      Get any extra JDBC properties for the connection.
      Returns:
      the extra properties
    • setJdbcUrl

      public void setJdbcUrl(String jdbcUrl)
      Sets the JDBC connection URL to use explicitly. In most cases it is not necessary to set this.
      Parameters:
      jdbcUrl - URL used for the JDBC connection
    • getJdbcUrl

      public String getJdbcUrl()
      Gets the JDBC connection URL to use explicitly.
      Returns:
      the JDBC URL
    • compose

      protected void compose(CompositionContext context)
      Description copied from class: CompositeOperator
      Compose the body of this operator. Implementations should do the following:
      1. Perform any validation of configuration, input types, etc
      2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
      3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
      Specified by:
      compose in class CompositeOperator
      Parameters:
      context - the context