- All Implemented Interfaces:
LogicalOperator
- Direct Known Subclasses:
ReadHBase,WriteHBase
A DataRush field can be mapped to a selected HBase cell, or it can be mapped to all cells in a family as a sub-table:
1. mapCell(java.lang.String, java.lang.String, java.lang.String) - Map a selected cell within a column family as a field. Cells within a family
can be of heterogeneous type, and only the mapped cells are accessed. All mapped cell fields
are placed together in a single record. Optional row key and time key fields can be specified to uniquely identify each DataRush record.
2. mapFamily(java.lang.String, java.lang.String) - Map all cells within a column family as a sub-table of fields. All cells within a family
are of homogeneous type, and all cells are accessed. Each cell within a family is contained in an individual record.
Optional row key, qualifier key, and time key fields can be specified to uniquely identify each DataRush record.
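The difference between the two mapping shapes above can be modeled with plain Java collections. This is only a sketch of the record shapes described here, not DataRush code: the class MappingShapeSketch, its methods, and the key field names "rowKey" and "qualifier" are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two mapping shapes; it does not use DataRush or HBase classes.
final class MappingShapeSketch {

    // mapCell style: the mapped qualifiers of one row become fields of ONE record.
    static Map<String, Object> cellsAsOneRecord(String rowKey,
                                                Map<String, Object> familyCells,
                                                Map<String, String> qualifierToField) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("rowKey", rowKey); // optional row key field (name is an assumption)
        for (Map.Entry<String, String> mapping : qualifierToField.entrySet()) {
            // Only mapped qualifiers are looked up; unmapped cells are never touched.
            record.put(mapping.getValue(), familyCells.get(mapping.getKey()));
        }
        return record;
    }

    // mapFamily style: EVERY cell in the family becomes its own record.
    static List<Map<String, Object>> familyAsSubTable(String rowKey,
                                                      Map<String, Object> familyCells,
                                                      String fieldName) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (Map.Entry<String, Object> cell : familyCells.entrySet()) {
            Map<String, Object> record = new LinkedHashMap<>();
            record.put("rowKey", rowKey);           // optional row key field
            record.put("qualifier", cell.getKey()); // optional qualifier key field
            record.put(fieldName, cell.getValue());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Heterogeneous cell types, only two of three cells mapped.
        Map<String, Object> info = new LinkedHashMap<>();
        info.put("name", "widget");
        info.put("price", 9.5);
        info.put("notes", "not mapped");
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("name", "itemName");
        mapping.put("price", "itemPrice");
        System.out.println(cellsAsOneRecord("row1", info, mapping));

        // Homogeneous cell types, one record per cell.
        Map<String, Object> scores = new LinkedHashMap<>();
        scores.put("2023", 71);
        scores.put("2024", 88);
        System.out.println(familyAsSubTable("row1", scores, "score"));
    }
}
```

In this model the qualifier key only matters for the family mapping, because that is the only case where several records come from the same row.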
When each mapped cell contains a single field, the following DataRush data types are serialized and deserialized using common HBase formats:
- TokenTypeConstant.INT - org.apache.hadoop.hbase.util.Bytes.toBytes(int)
- TokenTypeConstant.LONG - org.apache.hadoop.hbase.util.Bytes.toBytes(long)
- TokenTypeConstant.FLOAT - org.apache.hadoop.hbase.util.Bytes.toBytes(float)
- TokenTypeConstant.DOUBLE - org.apache.hadoop.hbase.util.Bytes.toBytes(double)
- TokenTypeConstant.NUMERIC - org.apache.hadoop.hbase.util.Bytes.toBytes(BigDecimal)
- TokenTypeConstant.STRING - org.apache.hadoop.hbase.util.Bytes.toBytes(String)
- TokenTypeConstant.BINARY - store/retrieve byte array as is
- All other data types will be serialized/deserialized using the default DataRush formats.
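To give a feel for what these cell values look like on the wire, the sketch below reproduces a few of the encodings with the JDK alone. It assumes that the HBase Bytes helper writes fixed-width big-endian integers, IEEE 754 bit patterns for floating-point values, and UTF-8 for strings; the class CellEncodingSketch is a hypothetical stand-in and is not part of HBase or DataRush.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical JDK-only stand-in for a few Bytes.toBytes(...) overloads.
// Assumption: big-endian fixed-width numbers and UTF-8 strings.
final class CellEncodingSketch {

    static byte[] encodeInt(int value) {
        // ByteBuffer is big-endian by default.
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static byte[] encodeDouble(double value) {
        // Store the raw IEEE 754 bit pattern as an 8-byte long.
        return encodeLong(Double.doubleToRawLongBits(value));
    }

    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static int decodeInt(byte[] cell) {
        return ByteBuffer.wrap(cell).getInt();
    }

    static double decodeDouble(byte[] cell) {
        return Double.longBitsToDouble(ByteBuffer.wrap(cell).getLong());
    }

    public static void main(String[] args) {
        byte[] cell = encodeInt(258);
        System.out.println(Arrays.toString(cell)); // [0, 0, 1, 2]
        System.out.println(decodeInt(cell));       // 258
        System.out.println(decodeDouble(encodeDouble(9.5))); // 9.5
    }
}
```

Because these formats carry no type information, a reader has to know the field type in advance in order to decode a cell.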
An HBase cell can also be mapped as a record of fields:
1. mapCellRecord(java.lang.String, java.lang.String, java.util.Map<java.lang.String, java.lang.String>) - Map a selected cell within a column family as a record. Cells within a family
can be of heterogeneous type, and only the mapped cells are accessed. All mapped cell fields
are placed together in a single record. Optional row key and time key fields can be specified to uniquely identify each DataRush record.
2. mapFamilyRecord(java.lang.String, java.util.Map<java.lang.String, java.lang.String>) - Map all cells within a column family as a sub-table of records. All cells within a family
are of homogeneous type, and all cells are accessed. Each cell within a family is contained in an individual record.
Optional row key, qualifier key, and time key fields can be specified to uniquely identify each DataRush record.
Mapping a single HBase cell as a record of fields allows multiple fields to be packed into a single cell, which greatly increases I/O performance at the expense of reduced version granularity. All fields packed together in a single cell are versioned together, and therefore all fields must be present when writing. The default DataRush serialization is used exclusively in this case.
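The trade-off described above can be sketched with a small packing routine. The byte layout below is invented for illustration and is not the DataRush serialization format; the class PackedCellSketch and its Item type are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of packing a record of fields into one cell value.
// The layout is made up for this sketch; it is NOT the DataRush serialization format.
final class PackedCellSketch {

    static final class Item {
        final String name;
        final int quantity;
        final double price;

        Item(String name, int quantity, double price) {
            this.name = name;
            this.quantity = quantity;
            this.price = price;
        }
    }

    // All fields go into a single byte array, so one cell read or write moves them all.
    static byte[] pack(Item item) {
        byte[] name = item.name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + name.length + Integer.BYTES + Double.BYTES);
        buf.putInt(name.length).put(name).putInt(item.quantity).putDouble(item.price);
        return buf.array();
    }

    static Item unpack(byte[] cell) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        byte[] name = new byte[buf.getInt()];
        buf.get(name);
        int quantity = buf.getInt();
        double price = buf.getDouble();
        return new Item(new String(name, StandardCharsets.UTF_8), quantity, price);
    }

    // Changing one field still rewrites the whole cell value,
    // which is why all packed fields share a single version.
    static byte[] withQuantity(byte[] cell, int newQuantity) {
        Item old = unpack(cell);
        return pack(new Item(old.name, newQuantity, old.price));
    }

    public static void main(String[] args) {
        byte[] cell = pack(new Item("widget", 3, 9.5));
        Item updated = unpack(withQuantity(cell, 7));
        System.out.println(updated.name + " " + updated.quantity + " " + updated.price);
    }
}
```

Note that withQuantity needs every field of the old value to build the new one, which mirrors the requirement that all fields be present when writing.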
A DataRush schema is persisted in HBase for any cell that DataRush writes to.
If a DataRush schema exists:
- DataRush read operations will return the requested keys and/or fields as the schema data type.
- DataRush write operations will check for type compatibility.
If a DataRush schema does not exist:
- A cell can only be mapped as a single field for reading.
- DataRush read operations will return the keys and/or fields as TokenTypeConstant.BINARY.
- DataRush write operations will create a schema upon first write.
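The rules above amount to a small decision table, sketched here in plain Java. Everything in the sketch is hypothetical: the FieldType enum stands in for the DataRush token types, and type compatibility is simplified to exact equality, which may be stricter than the actual check.

```java
import java.util.Optional;

// Hypothetical model of the schema rules; not DataRush code.
final class CellSchemaRuleSketch {

    enum FieldType { INT, STRING, BINARY }

    // Reads return the persisted type if a schema exists, otherwise raw binary.
    static FieldType typeOnRead(Optional<FieldType> persisted) {
        return persisted.orElse(FieldType.BINARY);
    }

    // Writes are checked against an existing schema; the first write creates one.
    // Assumption: "compatible" is simplified to "identical" in this sketch.
    static FieldType schemaAfterWrite(Optional<FieldType> persisted, FieldType incoming) {
        if (persisted.isPresent() && persisted.get() != incoming) {
            throw new IllegalArgumentException(
                    "cannot write " + incoming + " to a cell with schema " + persisted.get());
        }
        return incoming;
    }

    public static void main(String[] args) {
        System.out.println(typeOnRead(Optional.empty()));                         // BINARY
        System.out.println(typeOnRead(Optional.of(FieldType.INT)));               // INT
        System.out.println(schemaAfterWrite(Optional.empty(), FieldType.STRING)); // STRING
    }
}
```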
Field Summary
Fields inherited from class com.pervasive.datarush.hbase.KeyOperator
catalogTableName, cellSchemaFamilyName, keySchemaFamilyName, statsFamilyName -
Constructor Summary
Constructors -
Method Summary
- getCellFieldMap() - Get the HBase cell to field mapping.
- getFamilyFieldMap() - Get the column family to field mapping.
- getHCatalogDatabase() - Get the HCatalog database from which to retrieve a schema and mapping.
- getHCatalogFields() - Get the HCatalog fields to read or write.
- getHCatalogTable() - Get the HCatalog table from which to retrieve a schema and mapping.
- void mapCell(String familyName, String qualifier, String fieldName) - Map Cell as a Field.
- void mapCell(String familyName, String qualifier, List<String> schemaNames, List<String> fieldNames) - Deprecated.
- void mapCellRecord(String familyName, String qualifier, Map<String, String> fieldMap) - Map Cell as a Record.
- void mapFamily(String familyName, String fieldName) - Map family as a sub-table of fields.
- void mapFamily(String familyName, List<String> schemaNames, List<String> fieldNames) - Deprecated.
- void mapFamilyRecord(String familyName, Map<String, String> fieldMap) - Map family as a sub-table of records.
- protected com.pervasive.datarush.hadoop.shims.hbase.TableSchema mapFromHCatalog(MetadataContext ctx) - Load a mapping from an existing HCatalog table.
- protected void mapToHCatalog(MetadataContext ctx, RecordPort input) - Write a mapping to a new HCatalog table.
- protected boolean schemaSupportedByHCatalog(ctx) - Determines if the current schema can be written to HCatalog.
- void setCellFieldMap(cellFieldMap) - Set the HBase cell to field mapping.
- void setFamilyFieldMap(familyFieldMap) - Set the column family to field mapping.
- void setHCatalogDatabase(String database) - Set the HCatalog database from which to retrieve a schema and mapping.
- void setHCatalogFields(String... fields) - Set the HCatalog fields to read or write.
- void setHCatalogFields(List<String> fields) - Set the HCatalog fields to read or write.
- void setHCatalogTable(String table) - Set the HCatalog table from which to retrieve a schema and mapping.
- protected boolean tableExistsInHCatalog(ctx) - Determines if the currently selected HCatalog table already exists.
Methods inherited from class com.pervasive.datarush.hbase.KeyOperator
addFamily, effectiveConfiguration, getConfiguration, getFamilies, getFilesystem, getHiveMetastore, getQualifierFieldMap, getRootDirectory, getRowFieldMap, getTableName, getTimeFieldName, getZookeeperParentZNode, getZookeeperPort, getZookeeperQuorum, mapQualifier, mapRow, mapRowRecord, setConfiguration, setFamilies, setFilesystem, setHiveMetastore, setQualifierFieldMap, setRootDirectory, setRowFieldMap, setTableName, setTimeFieldName, setZookeeperParentZNode, setZookeeperPort, setZookeeperQuorum
Methods inherited from class com.pervasive.datarush.operators.CompositeOperator
compose
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Constructor Details
-
KeyValueOperator
public KeyValueOperator()
-
-
Method Details
-
getFamilyFieldMap
Get the column family to field mapping.
- Returns:
- column family to field mapping
-
setFamilyFieldMap
Set the column family to field mapping.
- Parameters:
familyFieldMap - column family to field mapping
-
getCellFieldMap
Get the HBase cell to field mapping.
- Returns:
- cell to field mapping
-
setCellFieldMap
Set the HBase cell to field mapping.
- Parameters:
cellFieldMap - cell to field mapping
-
mapCellRecord
Map Cell as a Record. This method is used to map a selected cell as a record. Any previous cell mapping for the specified family and qualifier is replaced.
- Parameters:
familyName - the name of the HBase column family.
qualifier - the HBase column qualifier identifying the cell to be mapped as a Record.
fieldMap - cell record schema name to DataRush field name map.
-
mapCell
Map Cell as a Field. This method is used to map a selected cell as a field. Any previous cell mapping for the specified family and qualifier is replaced.
- Parameters:
familyName - the name of the HBase column family.
qualifier - the HBase column qualifier identifying the cell to be mapped as a Field.
fieldName - the field name to be mapped to the specified cell qualifier.
-
mapCell
@Deprecated public void mapCell(String familyName, String qualifier, List<String> schemaNames, List<String> fieldNames)
Deprecated.
-
mapFamilyRecord
Map family as a sub-table of records. This method is used to map all cells within a family as a sub-table of records. Any previous mapping for the specified family is replaced.
- Parameters:
familyName - the name of the HBase column family.
fieldMap - cell record schema name to DataRush field name map.
-
mapFamily
Map family as a sub-table of fields. This method is used to map all cells within a family as a sub-table of fields. Any previous mapping for the specified family is replaced.
- Parameters:
familyName - the name of the HBase column family.
fieldName - the field name to be mapped to the specified family.
-
mapFamily
@Deprecated public void mapFamily(String familyName, List<String> schemaNames, List<String> fieldNames)
Deprecated.
-
getHCatalogDatabase
Get the HCatalog database from which to retrieve a schema and mapping.
- Returns:
- the HCatalog database
-
setHCatalogDatabase
Set the HCatalog database from which to retrieve a schema and mapping.
- Parameters:
database - the HCatalog database
-
getHCatalogTable
Get the HCatalog table from which to retrieve a schema and mapping.
- Returns:
- the HCatalog table
-
setHCatalogTable
Set the HCatalog table from which to retrieve a schema and mapping.
- Parameters:
table - the HCatalog table
-
getHCatalogFields
Get the HCatalog fields to read or write.
- Returns:
- a list of field names, or null if not set
-
setHCatalogFields
Set the HCatalog fields to read or write. If not set, all defined fields will be read or written.
- Parameters:
fields - a list of field names
-
setHCatalogFields
Set the HCatalog fields to read or write. If not set, all defined fields will be read or written.
- Parameters:
fields - a list of field names
-
mapFromHCatalog
protected com.pervasive.datarush.hadoop.shims.hbase.TableSchema mapFromHCatalog(MetadataContext ctx)
Load a mapping from an existing HCatalog table.
- Parameters:
ctx -
-
mapToHCatalog
Write a mapping to a new HCatalog table.
- Parameters:
ctx -
-
tableExistsInHCatalog
Determines if the currently selected HCatalog table already exists.
- Parameters:
ctx -
- Returns:
true if the table exists, false otherwise
-
schemaSupportedByHCatalog
Determines if the current schema can be written to HCatalog. A schema is supported by HCatalog if its only mappings are cells to fields.
- Parameters:
ctx -
- Returns:
true
-