Class LoadActianVector

  • All Implemented Interfaces:
    LogicalOperator

    public final class LoadActianVector
    extends CompositeOperator
    Bulk load data into the Actian Vector database. The data can be loaded using one of four available bulk loading methods:
    • Using the direct load capability: this is the fastest method of loading and supports cluster execution. The data is formatted locally in memory and streamed to the Actian Vector server.
    • Using the vwload utility: this method can only be used on an instance where the vwload utility is available. If executing on a Vector H cluster, an HDFS temporary directory must be specified. The data is first written into a temporary area to prepare for loading into the database.
    • Using the copy vwload command: this option supports remote execution and allows the user to execute the vwload utility via a SQL command. This can be useful if the vwload utility is not available on the path of the target instance.
    • Using the SQL copy command: this option supports remote execution. However, the Actian Vector client must be installed and running on the machine executing the DataFlow application. The data is first written into a temporary area to prepare for loading into the database.
    • Constructor Detail

      • LoadActianVector

        public LoadActianVector()
    • Method Detail

      • getRejectPort

        public RecordPort getRejectPort()
        Gets the port providing records that failed to load. This port outputs rejected records only when using Direct loading. Otherwise, the port simply outputs the location of the log produced by the command execution.
        Returns:
        the records that failed to load (Direct loading), or the location of the load log (other load methods)
      • getMethod

        public LoadMethod getMethod()
        Get the load method in use.
        Returns:
        load method
      • setMethod

        public void setMethod(LoadMethod method)
        Set the load method to use.
        Parameters:
        method - load method (vwload is the default)
      • getHost

        public String getHost()
        Get the server host name property.
        Returns:
        server host name
      • setHost

        public void setHost(String hostName)
        Set the host name property. This is the host name of the server where Actian Vector is installed.
        Parameters:
        hostName - Actian Vector server host name
      • getDatabase

        public String getDatabase()
        Get the database name.
        Returns:
        database name
      • setDatabase

        public void setDatabase(String database)
        Set the database name. Specify the database that contains the target table.
        Parameters:
        database - database name
      • getInstance

        public String getInstance()
        Get the Actian Vector instance name.
        Returns:
        instance name
      • setInstance

        public void setInstance(String instance)
        Set the Actian Vector instance name. The Vector defaults are "VW" and "VH".
        Parameters:
        instance - instance name
      • getPort

        public int getPort()
        Get the Actian Vector instance port.
        Returns:
        port number
      • setPort

        public void setPort(int port)
        Set the Actian Vector instance port. Defaults to 7.
        Parameters:
        port - port number
      • getTable

        public String getTable()
        Get the target table name.
        Returns:
        target table name
      • setTable

        public void setTable(String table)
        Set the name of the table to load.
        Parameters:
        table - target table name
      • getUser

        public String getUser()
        Get the user account name.
        Returns:
        user account name
      • setUser

        public void setUser(String userName)
        Set the user name.

        When using the vwload load method, the user name is not always needed; the vwload utility must be executed by the DBA user. The user name and password supplied must be for a user account that has write/copy access to the target table.

        Parameters:
        userName - user account name
      • getPassword

        public String getPassword()
        Get the password.
        Returns:
        password
      • setPassword

        public void setPassword(String password)
        Set the user's password.

        When using the vwload load method, the password is not always needed; the vwload utility must be executed by the DBA user. The user name and password supplied must be for a user account that has write/copy access to the target table.

        Parameters:
        password - password for the user account
      • getRenameMapping

        public Map<String,String> getRenameMapping()
        Get the source to target field mapping.
        Returns:
        the rename mapping
      • setRenameMapping

        public void setRenameMapping(Map<String,String> renameMapping)
        Set a rename mapping. This should be an ordered mapping of names (for example, a LinkedHashMap). The keys in the map represent the original field names in the input record port. The values in the map represent the column names in the table. If the names are the same, the mapping is not required. Any input fields that are not included in the mapping will be dropped.

        This is an optional property. If not provided, the input fields are mapped to the target database table by schema order.

        Parameters:
        renameMapping - the mapping from old to new names.
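        The mapping rules above can be modelled without the DataFlow library. The following is a minimal sketch, not the operator's implementation: RenameMappingSketch and targetColumns are hypothetical names, the code only computes which target columns survive and in what order, and it assumes the operator follows the map's iteration order. Skipping map keys that are absent from the input is also an assumption; the real operator may report an error instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the rename mapping; not part of the DataFlow API. */
final class RenameMappingSketch {

    /**
     * Returns the target column names in mapping order. Input fields that
     * have no key in the mapping are dropped.
     */
    static List<String> targetColumns(List<String> inputFields, Map<String, String> mapping) {
        List<String> columns = new ArrayList<>();
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            // Assumption: keys that do not exist on the input are skipped.
            if (inputFields.contains(e.getKey())) {
                columns.add(e.getValue());
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, so the target order is predictable.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("cust_id", "customer_id");
        mapping.put("amt", "amount");

        List<String> input = List.of("cust_id", "amt", "debug_flag");
        System.out.println(targetColumns(input, mapping)); // [customer_id, amount]
    }
}
```

        A HashMap would give an unspecified iteration order, which is why an insertion-ordered map is the safer choice when column order matters.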
      • getInitializeTableSQL

        public String getInitializeTableSQL()
        Retrieves the SQL statement to execute before processing any records.
        Returns:
        the SQL statement to execute before processing any records
      • setInitializeTableSQL

        public void setInitializeTableSQL(String initializeTableSQL)
        Sets the SQL statements to execute before processing any records. For example, if the table does not exist, this property is required and must contain a CREATE TABLE statement that creates the table.

        Multiple SQL CREATE, INSERT, UPDATE, DELETE or DROP statements, separated by semicolons (;), can be executed.

        These statements are executed only once, regardless of the partition count.

        Parameters:
        initializeTableSQL - the SQL statement to execute before processing any records
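        Because several statements can be supplied in a single string, they have to be separated before execution. The sketch below shows one naive way to do that in plain Java. SqlScriptSketch and splitStatements are hypothetical, and splitting on every semicolon is an assumption that breaks if a semicolon appears inside a string literal; the operator's actual parsing rules are not documented here. The table names in the example are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not the operator's own statement parser. */
final class SqlScriptSketch {

    /** Splits a script on ';' and discards empty fragments. Naive: ignores quoting. */
    static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        // Example initializeTableSQL value: create the target table, then record the load.
        String init = "CREATE TABLE sales (id INTEGER NOT NULL, amount DECIMAL(10,2)); "
                + "INSERT INTO load_audit (table_name) VALUES ('sales');";
        for (String stmt : splitStatements(init)) {
            System.out.println(stmt);
        }
    }
}
```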
      • getFinalizeTableSQL

        public String getFinalizeTableSQL()
        Retrieves the SQL statement to execute after processing all the records.
        Returns:
        the SQL statement to execute after processing all the records
      • setFinalizeTableSQL

        public void setFinalizeTableSQL(String finalizeTableSQL)
        Sets the SQL statements to execute after processing all the records; for example, a CREATE INDEX statement.

        Multiple SQL CREATE, INSERT, UPDATE, DELETE or DROP statements, separated by semicolons (;), can be executed.

        These statements are executed only once, regardless of the partition count.

        Parameters:
        finalizeTableSQL - the SQL statement to execute after processing all the records
      • getMaxErrors

        public int getMaxErrors()
        Get the maximum number of errors allowed.
        Returns:
        maximum errors
      • setMaxErrors

        public void setMaxErrors(int maxErrors)
        Set the maximum number of errors allowed per stream before rolling back the data load operation.
        Parameters:
        maxErrors - maximum errors allowed per stream
      • getRollback

        public boolean getRollback()
        Get whether rollback is enabled or disabled.
        Returns:
        rollback setting
      • setRollback

        public void setRollback(boolean enabled)
        Enable or disable rollback processing. If enabled, after the maximum number of errors allowed has been encountered, the data load will be aborted and rolled back. No new data will be inserted into the target table.

        If disabled, the operation is still aborted when the maximum number of errors has been encountered. However, the data load is not rolled back. Any data successfully loaded will appear in the target table.

        Parameters:
        enabled - enable (true) or disable (false) rollback
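        The interaction between setMaxErrors and setRollback can be pictured as a small simulation. This is a hypothetical sketch, not the operator's implementation: it models a single stream, treats each row independently, and assumes the load aborts as soon as the error count reaches maxErrors. RollbackSketch and its members are illustrative names.

```java
import java.util.List;

/** Hypothetical simulation of maxErrors/rollback for a single stream; not DataFlow code. */
final class RollbackSketch {

    /** Outcome of a simulated load. */
    static final class Result {
        final int rowsKept;     // newly loaded rows that remain in the table
        final boolean aborted;  // whether the error limit was hit

        Result(int rowsKept, boolean aborted) {
            this.rowsKept = rowsKept;
            this.aborted = aborted;
        }
    }

    /**
     * rowOk.get(i) is true if row i loads cleanly and false if it is rejected.
     * Assumption: the load aborts as soon as the error count reaches maxErrors.
     */
    static Result load(List<Boolean> rowOk, int maxErrors, boolean rollback) {
        int loaded = 0;
        int errors = 0;
        for (boolean ok : rowOk) {
            if (ok) {
                loaded++;
            } else if (++errors >= maxErrors) {
                // Aborted: rollback discards all new rows, otherwise earlier rows remain.
                return new Result(rollback ? 0 : loaded, true);
            }
        }
        return new Result(loaded, false);
    }

    public static void main(String[] args) {
        List<Boolean> rows = List.of(true, true, false, true, false, true);
        Result withRollback = load(rows, 2, true);
        Result withoutRollback = load(rows, 2, false);
        System.out.println(withRollback.rowsKept + " " + withRollback.aborted);       // 0 true
        System.out.println(withoutRollback.rowsKept + " " + withoutRollback.aborted); // 3 true
    }
}
```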
      • getVectorSize

        public int getVectorSize()
        Get the buffer size (in rows).
        Returns:
        buffer size
      • setVectorSize

        public void setVectorSize(int vectorSize)
        Set the size of the buffer (in rows) used to cache data before sending to the Vector engine. Defaults to 1024.
        Parameters:
        vectorSize - buffer size in rows
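        The buffering controlled by setVectorSize can be thought of as fixed-size batching. The sketch below is hypothetical: VectorBufferSketch collects rows and hands a batch to a caller-supplied sender whenever the buffer holds vectorSize rows, then sends any remainder as a final partial batch. It only shows the resulting batch sizes, not how the operator actually communicates with the Vector engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical row buffer illustrating vectorSize batching; not DataFlow code. */
final class VectorBufferSketch<T> {
    private final int vectorSize;
    private final Consumer<List<T>> sender;
    private final List<T> buffer = new ArrayList<>();

    VectorBufferSketch(int vectorSize, Consumer<List<T>> sender) {
        if (vectorSize <= 0) throw new IllegalArgumentException("vectorSize must be positive");
        this.vectorSize = vectorSize;
        this.sender = sender;
    }

    /** Adds a row and sends a full batch once the buffer reaches vectorSize rows. */
    void add(T row) {
        buffer.add(row);
        if (buffer.size() == vectorSize) flush();
    }

    /** Sends any remaining rows as a final, possibly partial, batch. */
    void flush() {
        if (!buffer.isEmpty()) {
            sender.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    /** Returns the batch sizes produced for rowCount rows. */
    static List<Integer> batchSizes(int rowCount, int vectorSize) {
        List<Integer> sizes = new ArrayList<>();
        VectorBufferSketch<Integer> buf =
                new VectorBufferSketch<>(vectorSize, batch -> sizes.add(batch.size()));
        for (int i = 0; i < rowCount; i++) buf.add(i);
        buf.flush();
        return sizes;
    }

    public static void main(String[] args) {
        // 2,500 rows with the default vectorSize of 1024.
        System.out.println(batchSizes(2500, 1024)); // [1024, 1024, 452]
    }
}
```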
      • getRejectsPath

        public String getRejectsPath()
        Get the rejects path.
        Returns:
        target path name
      • setRejectsPath

        public void setRejectsPath(String rejectsPath)
        Set the rejects path. Any input records that fail to load into the target database will be written to a file at the given path. If no rejects are encountered, the target file will not be created. This applies only when not using Direct loading; with Direct loading, rejected records are output on the reject port instead. Defaults to vwload-logs in the Java temporary directory.
        Parameters:
        rejectsPath - target path name
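        The Java temporary directory mentioned above is exposed through the standard java.io.tmpdir system property, so the documented default can be reproduced with standard APIs. This is a sketch that assumes the operator simply places vwload-logs directly under that directory; RejectsPathSketch is a hypothetical name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical resolution of the rejects path default; not DataFlow code. */
final class RejectsPathSketch {

    /** Returns the configured rejects path if set, else vwload-logs under java.io.tmpdir. */
    static Path resolveRejectsPath(String configured) {
        if (configured != null && !configured.isEmpty()) {
            return Paths.get(configured);
        }
        // Assumption: the default is a direct child of the JVM temporary directory.
        return Paths.get(System.getProperty("java.io.tmpdir"), "vwload-logs");
    }

    public static void main(String[] args) {
        System.out.println(resolveRejectsPath(null));
        System.out.println(resolveRejectsPath("/data/rejects/sales.log"));
    }
}
```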
      • getCleanData

        public boolean getCleanData()
        Get whether clean data is enabled or disabled.
        Returns:
        clean data setting
      • setCleanData

        public void setCleanData(boolean cleanData)
        Enable or disable data cleaning. If enabled, additional operations and checks may be performed on the data before it is loaded into Vector, to ensure it meets any table constraints or other requirements. Additionally, invalid values that would otherwise produce errors (for example, when stringTruncationError or decimalTruncationError is enabled) are loaded as nulls, if the table allows it.
        Parameters:
        cleanData - enable (true) or disable (false) data cleaning
      • getInsertMode

        public String getInsertMode()
        Get the insert mode used during Direct loading (DEFAULT/ROW/BULK).
        Returns:
        insert mode
      • setInsertMode

        public void setInsertMode(String insertMode)
        Set the insert mode that applies to insert and merge operations when Direct loading. The default insert mode is DEFAULT.
        Parameters:
        insertMode - insert mode for inserts and merges.
      • getSshUser

        public String getSshUser()
        Get the OS user ID used to connect to the master node of the installation. The operator uses this user ID to establish an SSH connection to the machine running the Vector master node.
        Returns:
        OS user ID for authenticating the SSH connection
      • setSshUser

        public void setSshUser(String user)
        Set the OS user ID used to authenticate the SSH connection to the Vector master node.
        Parameters:
        user - OS user ID for the SSH connection
      • setSshPassword

        public void setSshPassword(String passwd)
        Set the password for the SSH user ID.
        Parameters:
        passwd - password for the SSH user ID
      • getSshPassword

        public String getSshPassword()
        Get the password for the SSH user ID.
        Returns:
        password for SSH user
      • getStringTruncationError

        public boolean getStringTruncationError()
        Get whether errors on string truncation are enabled or disabled.
        Returns:
        string truncation error setting
      • setStringTruncationError

        public void setStringTruncationError(boolean enableStringTruncationError)
        Set whether errors on string truncation are enabled or disabled. This applies only when not using Direct loading, since strings are always truncated when Direct loading.
        Parameters:
        enableStringTruncationError - enable (true) or disable (false) errors on string truncation
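        How an over-long string is handled depends on several settings in this class. The sketch below is a hypothetical model built only from the descriptions of setStringTruncationError and setCleanData: Direct loading always truncates, cleanData turns would-be errors into nulls for nullable columns, and otherwise an error is raised. Two points are assumptions: that strings are silently truncated when truncation errors are disabled, and that length is measured in Java chars rather than bytes. TruncationSketch and fit are illustrative names.

```java
/** Hypothetical model of string-length handling; not the operator's code. */
final class TruncationSketch {

    static String fit(String value, int maxLength, boolean direct,
                      boolean stringTruncationError, boolean cleanData, boolean nullable) {
        if (value == null || value.length() <= maxLength) {
            return value; // fits as-is
        }
        if (direct || !stringTruncationError) {
            // Direct loading always truncates; assumed to be the behavior otherwise too.
            return value.substring(0, maxLength);
        }
        if (cleanData && nullable) {
            return null; // cleanData loads invalid values as nulls where the column allows
        }
        throw new IllegalArgumentException(
                "value of length " + value.length() + " exceeds column length " + maxLength);
    }

    public static void main(String[] args) {
        System.out.println(fit("abcdef", 4, false, false, false, true)); // abcd
        System.out.println(fit("abcdef", 4, false, true, true, true));   // null
    }
}
```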
      • getDecimalTruncationError

        public boolean getDecimalTruncationError()
        Get whether errors on decimal truncation are enabled or disabled.
        Returns:
        decimal truncation error setting
      • setDecimalTruncationError

        public void setDecimalTruncationError(boolean enableDecimalTruncationError)
        Set whether errors on decimal truncation are enabled or disabled.
        Parameters:
        enableDecimalTruncationError - enable (true) or disable (false) errors on decimal truncation
      • getTmpDirectory

        public String getTmpDirectory()
        Get the temporary directory that is used for storing intermediate loader files.
        Returns:
        path to the temporary directory
      • setTmpDirectory

        public void setTmpDirectory(String tmpDirectory)
        Set the temporary directory used to store the intermediate loader files. If not set, the operator attempts to use the local default temporary directory. When loading Vector on Hadoop, specify a directory that exists in the HDFS file system of the target Hadoop cluster; otherwise the load will not use the distributed loader.
        Parameters:
        tmpDirectory - path to the temporary directory
      • getCharset

        public String getCharset()
        Get the character set used when staging and loading data with the vwload method.
        Returns:
        character set used when staging and loading data with the vwload method
      • setCharset

        public void setCharset(String charsetName)
        Set the character set used when staging and loading data with the vwload load method.
        Parameters:
        charsetName - character set used when staging and loading data with the vwload method
      • getNullIndicator

        public String getNullIndicator()
        Get the null indicator used if the loading method stages the files.
        Returns:
        null indicator used when staging data
      • setNullIndicator

        public void setNullIndicator(String nullIndicator)
        Sets the null indicator used if the loading method stages the files.
        Parameters:
        nullIndicator - null indicator to use when staging data
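        When a load method stages data to files, null values have to be written as a marker the loader recognizes. The sketch below illustrates the idea by rendering one row as a delimited line with the configured null indicator in place of nulls. The pipe delimiter, the lack of quoting or escaping, and the NullIndicatorSketch name are all hypothetical; the operator's actual staging format is not described here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical staging-line writer; not the operator's file format. */
final class NullIndicatorSketch {

    /** Joins field values with '|', substituting nullIndicator for nulls. No escaping. */
    static String stageRow(List<String> fields, String nullIndicator) {
        return fields.stream()
                .map(f -> f == null ? nullIndicator : f)
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        // Arrays.asList allows null elements, unlike List.of.
        List<String> row = Arrays.asList("42", null, "Berlin");
        System.out.println(stageRow(row, "\\N")); // 42|\N|Berlin
    }
}
```

        Whatever indicator is chosen should not collide with a legitimate data value, or real values would be read back as nulls.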
      • getJdbcOnly

        public boolean getJdbcOnly()
        Get whether JDBC connections should be used instead of the command line when possible.
        Returns:
        jdbcOnly setting
      • setJdbcOnly

        public void setJdbcOnly(boolean jdbcOnly)
        Set whether JDBC connections should be used instead of the command line when possible.
        Parameters:
        jdbcOnly - enable (true) or disable (false) using JDBC exclusively when possible
      • setExtraProperties

        public void setExtraProperties(Map<String,String> extraProperties)
        Set any extra JDBC properties for the connection.
        Parameters:
        extraProperties - key-value settings that will be applied to the JDBC connection
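        Standard JDBC drivers receive connection settings through a java.util.Properties object passed to DriverManager.getConnection(String, Properties), where "user" and "password" are the conventional credential keys. The sketch below shows how credentials and a map of extra properties could be merged into one such object. It is a hypothetical illustration: the precedence rule (extra properties applied last, so they win on conflicts) and the exampleOption key are assumptions, not documented Actian Vector driver settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical merge of credentials and extra JDBC properties; not DataFlow code. */
final class JdbcPropertiesSketch {

    /** Builds connection properties; extras are applied last, so they win on conflicts. */
    static Properties connectionProperties(String user, String password, Map<String, String> extras) {
        Properties props = new Properties();
        // "user" and "password" are the standard keys read by DriverManager.
        if (user != null) props.setProperty("user", user);
        if (password != null) props.setProperty("password", password);
        if (extras != null) {
            extras.forEach(props::setProperty);
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> extras = new LinkedHashMap<>();
        extras.put("exampleOption", "true"); // hypothetical key, for illustration only
        Properties props = connectionProperties("loader", "secret", extras);
        System.out.println(props.getProperty("exampleOption")); // true
        // A real connection would then use: DriverManager.getConnection(jdbcUrl, props);
    }
}
```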
      • getExtraProperties

        public Map<String,String> getExtraProperties()
        Get any extra JDBC properties for the connection.
        Returns:
        the extra properties
      • setJdbcUrl

        public void setJdbcUrl(String jdbcUrl)
        Sets the JDBC connection URL to use explicitly. In most cases this does not need to be set.
        Parameters:
        jdbcUrl - URL used for the JDBC connection
      • getJdbcUrl

        public String getJdbcUrl()
        Gets the JDBC connection URL to use explicitly.
        Returns:
        the jdbc url
      • compose

        protected void compose(CompositionContext context)
        Description copied from class: CompositeOperator
        Compose the body of this operator. Implementations should do the following:
        1. Perform any validation of configuration, input types, etc.
        2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
        3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
        Specified by:
        compose in class CompositeOperator
        Parameters:
        context - the context