- java.lang.Object
  - com.pervasive.datarush.operators.io.avro.AvroSchemaUtils
public class AvroSchemaUtils extends Object
Utilities for working with Avro schemas. Contains methods which map between DataRush and Avro types, as well as methods useful for extracting information about Avro encodings.
Method Summary
Modifier and Type | Method | Description
static String | cleanseName(String fieldName) | Cleanses the specified name so it is a valid field name in Avro.
static RecordTokenType | determineType(org.apache.avro.Schema schema) | Maps an Avro schema to a DataRush record type.
static org.apache.avro.Schema | generateSchema(RecordTokenType type) | Creates an Avro schema from the given DataRush record type.
static boolean | isNullable(org.apache.avro.Schema schema) | Indicates whether the specified Avro schema supports setting null values.
static boolean | isWritable(ScalarTokenType type, org.apache.avro.Schema schema) | Indicates whether the specified DataRush type can be encoded in the given schema.
Method Detail
cleanseName
public static String cleanseName(String fieldName)
Cleanses the specified name so it is a valid field name in Avro. Valid field names:
- Start with an underscore or alphabetic character.
- Contain only underscores and alphanumeric characters.
Parameters:
fieldName - the name to cleanse
Returns:
a name valid for use in Avro
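The two rules above can be illustrated with a small stand-alone sketch. The documentation does not say how cleanseName repairs an invalid name, so the strategy here is an assumption: every character outside the ASCII set [A-Za-z0-9_] (the set the Avro specification allows in names) is replaced with an underscore, and an underscore is prefixed when the result is empty or starts with a digit. The class CleanseNameSketch and its method cleanse are hypothetical names, not part of the DataRush API.

```java
public class CleanseNameSketch {

    /**
     * Hypothetical cleansing strategy (not documented for the real method):
     * replace each character outside [A-Za-z0-9_] with '_', then prefix '_'
     * if the result is empty or starts with a digit.
     */
    public static String cleanse(String fieldName) {
        StringBuilder sb = new StringBuilder(fieldName.length() + 1);
        for (int i = 0; i < fieldName.length(); i++) {
            char c = fieldName.charAt(i);
            boolean valid = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '_';
            sb.append(valid ? c : '_');
        }
        // A valid name must start with a letter or underscore; after the loop
        // the first character is a letter, a digit, or '_', so only digits
        // (or an empty name) need fixing.
        if (sb.length() == 0 || (sb.charAt(0) >= '0' && sb.charAt(0) <= '9')) {
            sb.insert(0, '_');
        }
        return sb.toString();
    }
}
```

Under this strategy, "order id" becomes "order_id" and "1st" becomes "_1st". Note that distinct inputs such as "a.b" and "a_b" cleanse to the same name, which is why generateSchema has to check for collisions.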
generateSchema
public static org.apache.avro.Schema generateSchema(RecordTokenType type)
Creates an Avro schema from the given DataRush record type. The generated schema is an Avro RECORD whose fields have the same names and appear in the same order as those of the record type. Field names are cleansed to be valid Avro field names using cleanseName(String). If this cleansing results in a name collision, an error is raised. Each field in the generated schema has a UNION type including NULL and the appropriate Avro schema type for the field's DataRush type, as listed below:
- BOOLEAN, DOUBLE, FLOAT, LONG, and INT are mapped to the Avro primitive type of the same name.
- STRING is mapped differently based on the presence of a domain on the source field. If no domain is specified, it is mapped to the STRING primitive type. If a domain is specified, it is mapped to an ENUM having the same set of symbols as the domain.
- BINARY is mapped to the BYTES primitive type.
- NUMERIC is mapped to the DOUBLE primitive type; this may result in loss of precision.
- CHAR is mapped to the STRING primitive type.
- DATE is mapped to a nested RECORD having one field, epochDays, of type LONG. The value of this field is the same as DateValued#asEpochDays().
- TIME is mapped to a nested RECORD having one field, dayMillis, of type INT. The value of this field is the same as TimeValued#asDayMillis().
- TIMESTAMP is mapped to a nested RECORD having three fields: epochSecs of type LONG, subsecNanos of type INT, and offsetSecs of type INT. The values of these fields are the same as the values with the same names in TimestampValued.
Parameters:
type - the record type for which to generate a schema
Returns:
an Avro schema describing the given type
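The mapping list above, the NULL union around every field, and the collision check can be sketched without DataRush or Avro on the classpath by emitting Avro JSON schema text. Everything here is a hypothetical stand-in: DrType models the DataRush types, and typeName is an assumed parameter for naming generated ENUM and nested RECORD types, because the documentation does not state how the real method names them. IllegalArgumentException stands in for whatever error the real method raises on a collision.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

public class GenerateSchemaSketch {

    /** Hypothetical stand-in for the DataRush types listed in the mapping. */
    public enum DrType { BOOLEAN, DOUBLE, FLOAT, LONG, INT, STRING, BINARY, NUMERIC, CHAR, DATE, TIME, TIMESTAMP }

    /**
     * Returns Avro schema JSON for the non-null branch of a field.
     * domain: the STRING field's domain, or null if none.
     * typeName: hypothetical name for generated ENUM and nested RECORD types.
     */
    public static String avroType(DrType type, List<String> domain, String typeName) {
        switch (type) {
            case BOOLEAN: return "\"boolean\"";
            case DOUBLE:  return "\"double\"";
            case FLOAT:   return "\"float\"";
            case LONG:    return "\"long\"";
            case INT:     return "\"int\"";
            case BINARY:  return "\"bytes\"";
            case NUMERIC: return "\"double\"";   // may lose precision
            case CHAR:    return "\"string\"";
            case STRING: {
                if (domain == null) return "\"string\"";
                // A domain becomes an ENUM with the same symbols.
                StringBuilder sb = new StringBuilder("{\"type\":\"enum\",\"name\":\"")
                        .append(typeName).append("\",\"symbols\":[");
                for (int i = 0; i < domain.size(); i++) {
                    if (i > 0) sb.append(',');
                    sb.append('"').append(domain.get(i)).append('"');
                }
                return sb.append("]}").toString();
            }
            case DATE:      return record(typeName, "epochDays", "long");
            case TIME:      return record(typeName, "dayMillis", "int");
            case TIMESTAMP: return record(typeName, "epochSecs", "long", "subsecNanos", "int", "offsetSecs", "int");
            default: throw new IllegalArgumentException("unhandled type " + type);
        }
    }

    /** Every generated field is a UNION of NULL and the mapped type. */
    public static String fieldSchema(DrType type, List<String> domain, String typeName) {
        return "[\"null\"," + avroType(type, domain, typeName) + "]";
    }

    /** Builds a nested RECORD from alternating field-name / primitive-type arguments. */
    private static String record(String name, String... fieldTypePairs) {
        StringBuilder sb = new StringBuilder("{\"type\":\"record\",\"name\":\"")
                .append(name).append("\",\"fields\":[");
        for (int i = 0; i < fieldTypePairs.length; i += 2) {
            if (i > 0) sb.append(',');
            sb.append("{\"name\":\"").append(fieldTypePairs[i])
              .append("\",\"type\":\"").append(fieldTypePairs[i + 1]).append("\"}");
        }
        return sb.append("]}").toString();
    }

    /** Cleanses each name and rejects duplicates produced by cleansing. */
    public static List<String> cleansedNames(List<String> names, UnaryOperator<String> cleanser) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String name : names) {
            String cleansed = cleanser.apply(name);
            if (!seen.add(cleansed)) {
                throw new IllegalArgumentException("name collision after cleansing: " + name + " -> " + cleansed);
            }
            out.add(cleansed);
        }
        return out;
    }
}
```

For example, fieldSchema(DrType.LONG, null, "unused") yields ["null","long"]. The cleanser is passed in as a function so the collision check does not depend on any particular cleansing strategy.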
isWritable
public static boolean isWritable(ScalarTokenType type, org.apache.avro.Schema schema)
Indicates whether the specified DataRush type can be encoded in the given schema.
Parameters:
type - the field type to check
schema - the target schema for the field
Returns:
true if the target schema permits values of the specified type to be written (excluding consideration of null values), false otherwise.
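The documentation does not spell out which type/schema combinations isWritable accepts, so the following is only a sketch under a loudly stated assumption: a type counts as writable when the schema, or any non-NULL branch of a UNION, is exactly the primitive that generateSchema documents for that type. NULL branches are skipped, matching "excluding consideration of null values". The real method may accept more (for example ENUM targets for strings with a domain, the nested temporal records, or widening conversions), none of which are modeled. AvroType, DrType, and Schema are hypothetical stand-ins, not the Avro or DataRush classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IsWritableSketch {

    /** Hypothetical stand-ins for Avro schema types and DataRush scalar types. */
    public enum AvroType { NULL, BOOLEAN, INT, LONG, FLOAT, DOUBLE, BYTES, STRING, UNION }
    public enum DrType { BOOLEAN, INT, LONG, FLOAT, DOUBLE, NUMERIC, CHAR, STRING, BINARY }

    /** Minimal schema model: a type plus, for UNION, its branches. */
    public static final class Schema {
        final AvroType type;
        final List<Schema> branches;

        public Schema(AvroType type, Schema... branches) {
            this.type = type;
            this.branches = new ArrayList<>(Arrays.asList(branches));
        }
    }

    /** Primitive target that generateSchema documents for each type (no domain). */
    static AvroType documentedTarget(DrType t) {
        switch (t) {
            case BOOLEAN: return AvroType.BOOLEAN;
            case INT:     return AvroType.INT;
            case LONG:    return AvroType.LONG;
            case FLOAT:   return AvroType.FLOAT;
            case DOUBLE:
            case NUMERIC: return AvroType.DOUBLE;
            case CHAR:
            case STRING:  return AvroType.STRING;
            case BINARY:  return AvroType.BYTES;
            default: throw new IllegalArgumentException("unhandled type " + t);
        }
    }

    /** Assumed rule: writable if the schema or any non-NULL union branch is the documented target. */
    public static boolean isWritable(DrType type, Schema schema) {
        if (schema.type == AvroType.UNION) {
            for (Schema branch : schema.branches) {
                // NULL branches are ignored: nullability is a separate question.
                if (branch.type != AvroType.NULL && isWritable(type, branch)) return true;
            }
            return false;
        }
        return schema.type == documentedTarget(type);
    }
}
```

Under this assumption a LONG field is writable into ["null","long"] but a STRING field is not.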
determineType
public static RecordTokenType determineType(org.apache.avro.Schema schema)
Maps an Avro schema to a DataRush record type. The provided schema is converted to a record type whose fields have the same names and appear in the same order. If the schema is not of RECORD type, it is treated as if it were a record with a single field named "field0".
Fields with primitive Avro types are mapped to DataRush as indicated in the table below:
Source Avro Type | Target DataRush Type
BOOLEAN | BOOLEAN
BYTES | BINARY
DOUBLE | DOUBLE
FIXED | BINARY
FLOAT | FLOAT
LONG | LONG
INT | INT
STRING | STRING
For complex Avro datatypes, the mapping to DataRush is as follows:
- RECORD data in Avro will, in general, be mapped to a DataRush record type as long as each field can be mapped to a scalar type. Nested records are not currently allowed, except for the Avro RECORD representations of the DataRush DATE, TIME, and TIMESTAMP types as described in the WriteAvro operator.
- ENUM data in Avro will be mapped to the DataRush string type, setting the domain to the enumerated list of symbols.
- UNION data in Avro can be mapped only if it is a union of NULL and exactly one other type which can be mapped to a scalar type.
- ARRAY and MAP data in Avro are not currently supported.
Parameters:
schema - the schema for which to determine the equivalent record type
Returns:
a record type describing the data represented by the schema
Throws:
DRException - if the schema cannot be converted to a record type
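The rules above (the primitive table, ENUM to string with a domain, the [NULL, X] union restriction, the "field0" wrapping, and rejection of ARRAY and MAP) can be sketched with a small stand-alone schema model. AvroType, DrType, Schema, and Field are hypothetical stand-ins rather than the Avro or DataRush classes, and IllegalArgumentException stands in for DRException. Recognizing the DATE, TIME, and TIMESTAMP records purely by their field names is an assumption; the documentation only refers to the WriteAvro representations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DetermineTypeSketch {

    /** Hypothetical stand-ins for Avro schema types and DataRush field types. */
    public enum AvroType { NULL, BOOLEAN, BYTES, DOUBLE, FIXED, FLOAT, LONG, INT, STRING, ENUM, UNION, RECORD, ARRAY, MAP }
    public enum DrType { BOOLEAN, BINARY, DOUBLE, FLOAT, LONG, INT, STRING, DATE, TIME, TIMESTAMP }

    /** Minimal schema model: RECORD uses fields, UNION uses branches, ENUM uses symbols. */
    public static final class Schema {
        final AvroType type;
        final Map<String, Schema> fields = new LinkedHashMap<>();
        final List<Schema> branches = new ArrayList<>();
        final List<String> symbols = new ArrayList<>();

        private Schema(AvroType type) { this.type = type; }
        public static Schema of(AvroType type) { return new Schema(type); }
        public static Schema union(Schema... bs) { Schema s = new Schema(AvroType.UNION); s.branches.addAll(Arrays.asList(bs)); return s; }
        public static Schema enumOf(String... syms) { Schema s = new Schema(AvroType.ENUM); s.symbols.addAll(Arrays.asList(syms)); return s; }
        public Schema field(String name, Schema type) { fields.put(name, type); return this; }
    }

    /** One mapped field; domain is non-null only for ENUM-derived strings. */
    public static final class Field {
        public final String name;
        public final DrType type;
        public final List<String> domain;
        Field(String name, DrType type, List<String> domain) { this.name = name; this.type = type; this.domain = domain; }
    }

    // Primitive mappings from the table above.
    private static final Map<AvroType, DrType> PRIMITIVES = new EnumMap<>(AvroType.class);
    static {
        PRIMITIVES.put(AvroType.BOOLEAN, DrType.BOOLEAN);
        PRIMITIVES.put(AvroType.BYTES, DrType.BINARY);
        PRIMITIVES.put(AvroType.DOUBLE, DrType.DOUBLE);
        PRIMITIVES.put(AvroType.FIXED, DrType.BINARY);
        PRIMITIVES.put(AvroType.FLOAT, DrType.FLOAT);
        PRIMITIVES.put(AvroType.LONG, DrType.LONG);
        PRIMITIVES.put(AvroType.INT, DrType.INT);
        PRIMITIVES.put(AvroType.STRING, DrType.STRING);
    }

    public static List<Field> determineType(Schema schema) {
        List<Field> result = new ArrayList<>();
        if (schema.type != AvroType.RECORD) {
            // A non-record schema is treated as a record with one field named "field0".
            result.add(mapField("field0", schema));
        } else {
            for (Map.Entry<String, Schema> e : schema.fields.entrySet()) {
                result.add(mapField(e.getKey(), e.getValue()));
            }
        }
        return result;
    }

    private static Field mapField(String name, Schema s) {
        switch (s.type) {
            case UNION: {
                // Only a union of NULL and exactly one other type is accepted.
                if (s.branches.size() == 2) {
                    Schema a = s.branches.get(0), b = s.branches.get(1);
                    if (a.type == AvroType.NULL && b.type != AvroType.NULL) return mapField(name, b);
                    if (b.type == AvroType.NULL && a.type != AvroType.NULL) return mapField(name, a);
                }
                throw new IllegalArgumentException("unsupported union for field " + name);
            }
            case ENUM:
                return new Field(name, DrType.STRING, new ArrayList<>(s.symbols));
            case RECORD: {
                // Assumption: temporal records are recognized by their field names alone.
                Set<String> names = s.fields.keySet();
                if (names.equals(setOf("epochDays"))) return new Field(name, DrType.DATE, null);
                if (names.equals(setOf("dayMillis"))) return new Field(name, DrType.TIME, null);
                if (names.equals(setOf("epochSecs", "subsecNanos", "offsetSecs"))) return new Field(name, DrType.TIMESTAMP, null);
                throw new IllegalArgumentException("nested record not supported for field " + name);
            }
            default: {
                DrType t = PRIMITIVES.get(s.type); // null for NULL, ARRAY, and MAP
                if (t == null) throw new IllegalArgumentException(s.type + " not supported for field " + name);
                return new Field(name, t, null);
            }
        }
    }

    private static Set<String> setOf(String... items) { return new HashSet<>(Arrays.asList(items)); }
}
```

An EnumMap keeps the primitive table as data, so the switch only has to handle the complex types that need structural checks.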
isNullable
public static boolean isNullable(org.apache.avro.Schema schema)
Indicates whether the specified Avro schema supports setting null values, that is, whether the schema is a UNION and has one branch of type NULL.
Parameters:
schema - the schema to check
Returns:
true if a null value can be written to the schema, false otherwise.
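The definition above reduces to a short check. The sketch below uses a hypothetical stand-in schema model so it compiles without Avro; with the real org.apache.avro.Schema the same test would typically compare getType() against Schema.Type.UNION and then scan getTypes() for a NULL branch. Following the documented wording literally, a schema that is NULL on its own (not a UNION) is reported as not nullable here; whether the real method treats that case the same way is not stated.

```java
import java.util.Arrays;
import java.util.List;

public class IsNullableSketch {

    /** Hypothetical stand-in for the Avro schema types relevant here. */
    public enum AvroType { NULL, BOOLEAN, INT, LONG, FLOAT, DOUBLE, BYTES, STRING, UNION }

    /** Minimal schema model: a type plus, for UNION, its branches. */
    public static final class Schema {
        final AvroType type;
        final List<Schema> branches;

        public Schema(AvroType type, Schema... branches) {
            this.type = type;
            this.branches = Arrays.asList(branches);
        }
    }

    /** Mirrors the documented definition: a UNION with a branch of type NULL. */
    public static boolean isNullable(Schema schema) {
        if (schema.type != AvroType.UNION) return false;
        for (Schema branch : schema.branches) {
            if (branch.type == AvroType.NULL) return true;
        }
        return false;
    }
}
```

Every field produced by generateSchema is a UNION with NULL, so under this definition each of those fields is nullable.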