Class TypeUtil

java.lang.Object
com.pervasive.datarush.types.TypeUtil

public class TypeUtil extends Object
Various utilities for manipulating token types.
  • Method Details

    • getRecordType

      public static <T extends ScalarTyped & Named> RecordTokenType getRecordType(T... fields)
      Computes the record type which a composite of the specified fields would have.
      Parameters:
      fields - the objects representing the fields of the composite
      Returns:
      the composite's record type
    • validateFieldSelection

      public static final RecordTokenType validateFieldSelection(String propertyName, RecordTokenType source, String... selected)
      Validates a set of field names against the source type, computing the resulting type.
      Parameters:
      propertyName - the property name to use in the validation failure exception
      source - the record type to check
      selected - the field names to validate, in the desired ordering for the new type
      Returns:
      the type filtered by the selected fields
      Throws:
      com.pervasive.datarush.graphs.physical.InvalidPropertyValueException - if one or more of the specified fields do not exist in the source type
    • validateRequiredFields

      public static final void validateRequiredFields(String propertyName, RecordTokenType source, RecordTokenType required)
      Validates a set of fields against the source type, ensuring the required field names are present and of the same type.
      Parameters:
      propertyName - the property name to use in the validation failure exception
      source - the record type to check
      required - the fields to validate
      Throws:
      com.pervasive.datarush.graphs.physical.InvalidPropertyValueException - if one or more of the specified fields do not exist in the source type
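
      The check described above can be sketched in plain Java. This is a minimal illustration, not the DataRush implementation: a record type is modeled as an ordered map of field name to type name, the class name RequiredFieldsSketch is hypothetical, "same type" is assumed to mean an exact match, and IllegalArgumentException stands in for InvalidPropertyValueException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in model (not the DataRush API): a record type is an ordered map
// of field name -> type name.
final class RequiredFieldsSketch {

    // Fails if a required field is missing from source or has a different type.
    static void validateRequiredFields(String propertyName,
                                       Map<String, String> source,
                                       Map<String, String> required) {
        for (Map.Entry<String, String> field : required.entrySet()) {
            String actual = source.get(field.getKey());
            if (actual == null) {
                throw new IllegalArgumentException(
                    propertyName + ": missing field '" + field.getKey() + "'");
            }
            if (!actual.equals(field.getValue())) {
                throw new IllegalArgumentException(
                    propertyName + ": field '" + field.getKey() + "' has type "
                    + actual + ", expected " + field.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("id", "long");
        source.put("name", "string");
        Map<String, String> required = new LinkedHashMap<>();
        required.put("id", "long");
        validateRequiredFields("keys", source, required); // passes silently
        System.out.println("ok");
    }
}
```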
    • select

      public static final RecordTokenType select(RecordTokenType source, String... selected)
      Creates a new record type containing only fields from the source type which match one of the specified field names. The order of the fields in the new type matches the order in which the names were specified.

      The specified fields must be valid names in the source type.

      Parameters:
      source - the record type to filter
      selected - the field names to keep in the new type, in the desired ordering for the new type
      Returns:
      the resulting filtered record type
      Throws:
      InvalidFieldException - if one or more of the specified fields do not exist in the source type
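
      To illustrate the ordering rule, here is a minimal sketch under a stand-in model (an ordered map of field name to type name). SelectSketch is a hypothetical class, not part of the library, and IllegalArgumentException stands in for InvalidFieldException.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in model (not the DataRush API): a record type is an ordered map
// of field name -> type name.
final class SelectSketch {

    // Builds the result in the order of 'selected'; unknown names are an error.
    static Map<String, String> select(Map<String, String> source, List<String> selected) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : selected) {
            String type = source.get(name);
            if (type == null) {
                // The documented method throws InvalidFieldException here.
                throw new IllegalArgumentException("no such field: " + name);
            }
            result.put(name, type);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("a", "int");
        source.put("b", "string");
        source.put("c", "double");
        // Result order follows the selection, not the source.
        System.out.println(select(source, Arrays.asList("c", "a")).keySet()); // prints [c, a]
    }
}
```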
    • toType

      public static final RecordTokenType toType(DataRepresentation representation, Namespace<ScalarTokenType> namespace)
      Converts from a list of fields to a record token type.
      Parameters:
      representation - the representation
      namespace - the list of fields
      Returns:
      a newly constructed type
    • select

      public static final RecordTokenType select(RecordTokenType source, List<String> selected)
      Creates a new record type containing only fields from the source type which match one of the specified field names. The order of the fields in the new type matches the order in which the names were specified.

      The specified fields must be valid names in the source type.

      Parameters:
      source - the record type to filter
      selected - the field names to keep in the new type, in the desired ordering for the new type
      Returns:
      the resulting filtered record type
      Throws:
      InvalidFieldException - if one or more of the specified fields do not exist in the source type
    • retain

      public static final RecordTokenType retain(RecordTokenType source, String... retained)
      Creates a new record type containing only fields from the source type which match one of the specified field names. The order of the fields is unchanged; fields remain in the same relative order as in the source.

      The specified fields do not need to be valid names in the source type.

      Parameters:
      source - the record type to filter
      retained - the field names to keep in the new type
      Returns:
      the resulting filtered record type
    • retain

      public static final RecordTokenType retain(RecordTokenType source, List<String> retained)
      Creates a new record type containing only fields from the source type which match one of the specified field names. The order of the fields is unchanged; fields remain in the same relative order as in the source.

      The specified fields do not need to be valid names in the source type.

      Parameters:
      source - the record type to filter
      retained - the field names to keep in the new type
      Returns:
      the resulting filtered record type
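
      The difference from select is that source order is preserved and unknown names are ignored. A minimal sketch of that behavior, again using an ordered map of field name to type name as a stand-in for a record type (RetainSketch is a hypothetical name):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Stand-in model (not the DataRush API): a record type is an ordered map
// of field name -> type name.
final class RetainSketch {

    // Walks the source in its own order and keeps only the named fields.
    static Map<String, String> retain(Map<String, String> source, Collection<String> retained) {
        Set<String> keep = new HashSet<>(retained);
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : source.entrySet()) {
            if (keep.contains(field.getKey())) {
                result.put(field.getKey(), field.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("a", "int");
        source.put("b", "string");
        source.put("c", "double");
        // "missing" is not in the source and is silently ignored.
        System.out.println(retain(source, Arrays.asList("c", "missing", "a")).keySet()); // prints [a, c]
    }
}
```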
    • remove

      public static final RecordTokenType remove(RecordTokenType source, String... removed)
      Creates a new record type containing all fields from the source type except those matching one of the specified field names. The order of the fields is unchanged; fields remain in the same relative order as in the source.

      The specified fields do not need to be valid names in the source type.

      Parameters:
      source - the record type to filter
      removed - the field names to remove from the new type
      Returns:
      the resulting filtered record type
    • remove

      public static final RecordTokenType remove(RecordTokenType source, List<String> removed)
      Creates a new record type containing all fields from the source type except those matching one of the specified field names. The order of the fields is unchanged; fields remain in the same relative order as in the source.

      The specified fields do not need to be valid names in the source type.

      Parameters:
      source - the record type to filter
      removed - the field names to remove from the new type
      Returns:
      the resulting filtered record type
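
      As the complement of retain, remove can be sketched the same way. The model (ordered map of field name to type name) and the class name RemoveSketch are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Stand-in model (not the DataRush API): a record type is an ordered map
// of field name -> type name.
final class RemoveSketch {

    // Walks the source in its own order and drops the named fields.
    static Map<String, String> remove(Map<String, String> source, Collection<String> removed) {
        Set<String> drop = new HashSet<>(removed);
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : source.entrySet()) {
            if (!drop.contains(field.getKey())) {
                result.put(field.getKey(), field.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("a", "int");
        source.put("b", "string");
        source.put("c", "double");
        // "missing" is not in the source and is silently ignored.
        System.out.println(remove(source, Arrays.asList("b", "missing")).keySet()); // prints [a, c]
    }
}
```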
    • reorderAndRename

      public static final RecordTokenType reorderAndRename(RecordTokenType source, String[] sourceNames, String[] targetNames)
      Utility for the renaming/reordering of a given record type. Reorders the source record type to match the order specified in sourceNames. Those fields are then renamed to the corresponding ordinal counterpart in targetNames.
      Parameters:
      source - source record type
      sourceNames - list of fields in source ordered to the desired output
      targetNames - list of names the corresponding ordinal source field name will be changed to
      Returns:
      the remapped type
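
      A minimal sketch of the ordinal pairing between sourceNames and targetNames. It assumes that only the fields listed in sourceNames appear in the result and that an unknown source name is an error; the description above does not state either point. The map-based model and the class name ReorderRenameSketch are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in model (not the DataRush API): a record type is an ordered map
// of field name -> type name.
final class ReorderRenameSketch {

    // Field i of the result has the type of sourceNames[i] and the name targetNames[i].
    static Map<String, String> reorderAndRename(Map<String, String> source,
                                                String[] sourceNames,
                                                String[] targetNames) {
        if (sourceNames.length != targetNames.length) {
            throw new IllegalArgumentException("name arrays must have the same length");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < sourceNames.length; i++) {
            String type = source.get(sourceNames[i]);
            if (type == null) {
                // Assumption: unknown source names are rejected.
                throw new IllegalArgumentException("no such field: " + sourceNames[i]);
            }
            result.put(targetNames[i], type);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("a", "int");
        source.put("b", "string");
        Map<String, String> remapped =
            reorderAndRename(source, new String[] {"b", "a"}, new String[] {"x", "y"});
        System.out.println(remapped); // prints {x=string, y=int}
    }
}
```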
    • rename

      public static final RecordTokenType rename(RecordTokenType source, Map<String,String> sourceToTargetNameMap)
      Utility for renaming fields in the source type using the given mapping from source to target field names. The returned type will contain the same number of fields in the original order. Only the names provided in the given map will be changed.
      Parameters:
      source - source token type
      sourceToTargetNameMap - mapping from old names to new names
      Returns:
      new type with names changed
    • rename

      public static final RecordTokenType rename(RecordTokenType source, String[] sourceNames, String[] targetNames)
      Utility for renaming fields in the source type using the given source and target field names. The returned type will contain the same number of fields in the original order. Only the names provided in sourceNames will be changed. The source and target name lists should contain the same number of entries and are order dependent.
      Parameters:
      source - source token type
      sourceNames - list of field names in the source
      targetNames - list of names to substitute for the source names
      Returns:
      new type with names changed
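
      Both rename overloads can be sketched with the array form delegating to the map form. This is an illustrative model only: record types are ordered maps of field name to type name, RenameSketch is a hypothetical class, and collisions between a new name and an existing field are not handled.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in model (not the DataRush API): a record type is an ordered map
// of field name -> type name.
final class RenameSketch {

    // Keeps field count and order; only names present in the mapping change.
    static Map<String, String> rename(Map<String, String> source,
                                      Map<String, String> sourceToTargetNameMap) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : source.entrySet()) {
            String oldName = field.getKey();
            String newName = sourceToTargetNameMap.containsKey(oldName)
                ? sourceToTargetNameMap.get(oldName)
                : oldName;
            result.put(newName, field.getValue());
        }
        return result;
    }

    // Array form: sourceNames[i] is renamed to targetNames[i].
    static Map<String, String> rename(Map<String, String> source,
                                      String[] sourceNames,
                                      String[] targetNames) {
        if (sourceNames.length != targetNames.length) {
            throw new IllegalArgumentException("name arrays must have the same length");
        }
        Map<String, String> mapping = new HashMap<>();
        for (int i = 0; i < sourceNames.length; i++) {
            mapping.put(sourceNames[i], targetNames[i]);
        }
        return rename(source, mapping);
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("a", "int");
        source.put("b", "string");
        source.put("c", "double");
        System.out.println(
            rename(source, new String[] {"b"}, new String[] {"x"}).keySet()); // prints [a, x, c]
    }
}
```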
    • wrap

      public static RecordTokenType wrap(String name, ScalarTokenType type)
      Creates a new record type with a single named field of the given type.
      Parameters:
      name - the name for the field
      type - the type of the field data
      Returns:
      a new record type with a single field
    • wrap

      public static RecordTokenType wrap(String prefix, List<? extends ScalarTokenType> types)
      Creates a new record schema with the specified field types. Each field is given a name of prefix + i, where i is the field's 0-based position in the input list.
      Parameters:
      prefix - the field name prefix
      types - the types of field data
      Returns:
      a new record type with fields typed as specified
    • wrap

      public static RecordTokenType wrap(List<? extends ScalarTokenType> types)
      Creates a new record type with the specified field types. Each field is given a name of "field" + i, where i is the field's 0-based position in the input list.
      Parameters:
      types - the types of field data
      Returns:
      a new record type with fields typed as specified
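
      The naming scheme of the two list-based wrap overloads can be sketched as follows. Type names are plain strings here and WrapSketch is a hypothetical class; only the prefix + i naming is taken from the descriptions above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in model (not the DataRush API): a record type is an ordered map
// of field name -> type name.
final class WrapSketch {

    // Names each field prefix + i, with i the 0-based position in the list.
    static Map<String, String> wrap(String prefix, List<String> types) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < types.size(); i++) {
            result.put(prefix + i, types.get(i));
        }
        return result;
    }

    // Default form: same as above with the prefix "field".
    static Map<String, String> wrap(List<String> types) {
        return wrap("field", types);
    }

    public static void main(String[] args) {
        System.out.println(wrap("col", Arrays.asList("int", "string"))); // prints {col0=int, col1=string}
        System.out.println(wrap(Arrays.asList("double")));               // prints {field0=double}
    }
}
```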
    • mergeRepresentations

      public static DataRepresentation mergeRepresentations(TokenType... types)
      Returns the overall representation to be used when combining types. If any of the types are sparse, the result is sparse.
      Parameters:
      types - List of types
      Returns:
      The overall representation
    • merge

      public static RecordTokenType merge(RecordTokenType... types)
      Merges the specified record types into a new one, handling name collisions by renaming. The nth instance of a field name (for n > 1) will have "_n" appended to it; for example, the second instance of "c" becomes "c_2". If there is more than one primary key, the first primary key is chosen.

      The following conditions will hold with respect to the ordering of fields in the result:

      • All fields from a record type will be before any field from a record type later in the input list.
      • All fields from a record type will preserve their relative ordering.
      As an example, consider two record types - "A" which is ordered {"a", "c"} and "B" which is ordered {"b", "c"}. The resulting record type will be {"a", "c", "b", "c_2"}; "c" conflicts, so is renamed. The type associated with "c" is the one from type "A", with "c_2" the one from type "B".

      For a destructive merge which overwrites fields in collision, use overlay(RecordTokenType...) instead.

      Parameters:
      types - the record types to merge
      Returns:
      a new record type representing the merge of the input record types
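
      The collision renaming from the example above ({"a", "c"} merged with {"b", "c"} giving {"a", "c", "b", "c_2"}) can be sketched as follows. This is an illustration under assumptions: record types are ordered maps of field name to type name, MergeSketch is a hypothetical class, primary keys are not modeled, and the case where a generated name such as "c_2" itself collides with an existing field is not handled.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in model (not the DataRush API): a record type is an ordered map
// of field name -> type name.
final class MergeSketch {

    // Appends fields type by type; the nth instance of a name (n > 1) becomes name_n.
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... types) {
        Map<String, Integer> seen = new HashMap<>();
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> type : types) {
            for (Map.Entry<String, String> field : type.entrySet()) {
                int n = seen.merge(field.getKey(), 1, Integer::sum);
                String name = (n == 1) ? field.getKey() : field.getKey() + "_" + n;
                result.put(name, field.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("a", "int");
        a.put("c", "int");
        Map<String, String> b = new LinkedHashMap<>();
        b.put("b", "string");
        b.put("c", "string");
        System.out.println(merge(a, b)); // prints {a=int, c=int, b=string, c_2=string}
    }
}
```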
    • overlay

      public static RecordTokenType overlay(RecordTokenType... types)
      Merges the specified record types into a new record type, handling name collisions with a last-one-wins mechanism. Last is defined as the rightmost record type in the input list containing the name in conflict.

      The following conditions will hold with respect to the ordering of fields in the result:

      • All fields from a record type will be before any field from a record type later in the input list.
      • All fields from a record type will preserve their relative ordering except those colliding with a field from an earlier record type. Those fields always occur before other fields in the record type, in an ordering consistent with the first record type containing each.
      As an example, consider two record types - "A" which is ordered {"a", "c"} and "B" which is ordered {"b", "c"}. The resulting record type will be {"a", "c", "b"}; "c" conflicts, so it appears in an order consistent with "A", the first record type containing it. Note however, the type associated with "c" will be the one from record type "B"; the type associated with "c" in "A" is lost.

      For a non-destructive merge which doesn't replace collisions, use merge(RecordTokenType...) instead.

      Parameters:
      types - the record types to merge
      Returns:
      a new record type representing the merge of the input record types
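
      The last-one-wins rule maps neatly onto java.util.LinkedHashMap, where re-inserting an existing key replaces its value but keeps its original position. The sketch below relies on that property; the map-based model of a record type and the class name OverlaySketch are assumptions for illustration, not the library's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in model (not the DataRush API): a record type is an ordered map
// of field name -> type name.
final class OverlaySketch {

    // Later types overwrite the type of a colliding field; the field keeps
    // the position it had when first seen.
    @SafeVarargs
    static Map<String, String> overlay(Map<String, String>... types) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> type : types) {
            result.putAll(type);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("a", "int");
        a.put("c", "int");
        Map<String, String> b = new LinkedHashMap<>();
        b.put("b", "string");
        b.put("c", "string");
        System.out.println(overlay(a, b)); // prints {a=int, c=string, b=string}
    }
}
```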
    • strictOverlay

      public static RecordTokenType strictOverlay(boolean dropUnique, RecordTokenType... types) throws InvalidFieldException
      Merges the specified record types into a new record type, allowing name collisions only if isAssignableFrom() is true for the types. Otherwise null will be returned.

      The following conditions will hold with respect to the ordering of fields in the result:

      • All fields from a record type will be before any field from a record type later in the input list.
      • All fields from a record type will preserve their relative ordering except those colliding with a field from an earlier record type. Those fields always occur before other fields in the record type, in an ordering consistent with the first record type containing each.
      As an example, consider two record types - "A" which is ordered {"a", "c"} and "B" which is ordered {"b", "c"}. The resulting record type will be {"a", "c", "b"}; "c" conflicts, so it appears in an order consistent with "A", the first record type containing it. Note however, the type associated with "c" will be the widest of the types associated with "c" in "A" and "B".

      For a non-destructive merge which doesn't replace collisions, use merge(RecordTokenType...) instead.

      Parameters:
      dropUnique - if true, fields not present in all of the record types are excluded from the result
      types - the record types to merge
      Returns:
      a new record type representing the merge of the input record types
      Throws:
      InvalidFieldException
    • mergeTypes

      public static RecordTokenType mergeTypes(TokenType... types)
      Merges the specified types into a single record type, handling name collisions by renaming. If there is more than one primary key, the first primary key is chosen.

      The result is equivalent to calling merge(RecordTokenType...), passing each record type straight through and replacing all scalar flows in the input with wrap("input"+i, type). Refer to merge(RecordTokenType...) for specific details on the merged result.

      Parameters:
      types - the types to merge
      Returns:
      a new record type representing the merge of the input types
    • matchFieldNames

      public static final RecordTokenType matchFieldNames(RecordTokenType source, RecordTokenType match)
      Utility for renaming the fields of a given record type to match those of another. A new record type is created with the fields of source but the field names of match.
      Parameters:
      source - source record type
      match - record type whose field names should be matched
      Returns:
      the resulting type
    • homogeneousRecord

      public static RecordTokenType homogeneousRecord(int size, ScalarTokenType fieldType, String fieldBase)
      Constructs a record type descriptor in which all fields have the same type. Field names are computed as fieldBase + index where index is the field's index in the record type.
      Parameters:
      size - Number of desired fields
      fieldType - Scalar type of all fields
      fieldBase - Base field name
      Returns:
      Homogeneous record type
    • fromJSON

      public static TokenType fromJSON(String json)
      Parses a JSON description of a TokenType. This method acts as an inverse to toJSON(TokenType); for any type, it will always be the case that:

      type.equals(TypeUtil.fromJSON(TypeUtil.toJSON(type)))

      Parameters:
      json - the JSON format of a type.
      Returns:
      the described type
    • toJSON

      public static String toJSON(TokenType type)
      Generates the JSON description of the specified type.
      Parameters:
      type - the type for which to generate a description
      Returns:
      the type description
    • widestType

      public static ScalarTokenType widestType(ScalarTokenType... types)
      Determines the widest of the specified types. That is, the one of the specified scalar token types T for which T.isAssignableFrom() is true for all of the types.
      Parameters:
      types - the scalar types to analyze
      Returns:
      the widest of the types. If there is no such type or no types are specified, null.
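
      The selection rule can be illustrated by analogy with java.lang.Class, which has its own isAssignableFrom method. The sketch below is only an analogy: it uses Java classes in place of scalar token types, and WidestTypeSketch is a hypothetical class. It returns the first input that is assignable from every input, or null if none qualifies or no inputs are given.

```java
// Analogy only: java.lang.Class stands in for ScalarTokenType, and
// Class.isAssignableFrom stands in for the token type method of the same name.
final class WidestTypeSketch {

    // Returns the input type that can be assigned from all inputs, or null.
    static Class<?> widestType(Class<?>... types) {
        for (Class<?> candidate : types) {
            boolean acceptsAll = true;
            for (Class<?> other : types) {
                if (!candidate.isAssignableFrom(other)) {
                    acceptsAll = false;
                    break;
                }
            }
            if (acceptsAll) {
                return candidate;
            }
        }
        return null; // no common widest type among the inputs, or no inputs
    }

    public static void main(String[] args) {
        System.out.println(widestType(Integer.class, Number.class).getSimpleName()); // prints Number
        System.out.println(widestType(Integer.class, String.class));                 // prints null
    }
}
```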
    • getTypes

      public static ScalarTokenType[] getTypes(RecordTokenType type)
      Returns the types of the fields of this record type.
      Parameters:
      type - the record type
      Returns:
      the types of the fields of this record type.
    • valueOf

      public static ScalarTokenType valueOf(String type)
      Gets the named scalar type.
      Parameters:
      type - the type to get
      Returns:
      the identified type
    • widestType

      public static RecordTokenType widestType(RecordTokenType... types)
      Calculates the record type for which isAssignableFrom() is true for all of the specified types. Such a type can only be found if all record types contain the same number of fields and a widest scalar type can be found for each field.
      Parameters:
      types - the record types to analyze
      Returns:
      a record type which can be assigned from any of the input types. If there is no such type or no types are specified, null.
    • widestNamedType

      public static RecordTokenType widestNamedType(RecordTokenType... types)
      Calculates the record type for which isAssignableFrom() is true for all of the specified types matched by name. Such a type can only be found if all record types contain the same number of named fields and a widest scalar type can be found for each named field.
      Parameters:
      types - the record types to analyze
      Returns:
      a record type which can be assigned from any of the input types. If there is no such type or no types are specified, null.
    • annotate

      public static Field annotate(Field field, String propertyName, String propertyValue)
      Returns a new field object with the same type and name as the original, but with the given property set to the given value.
      Parameters:
      field - The field to annotate
      propertyName - The name of the annotation
      propertyValue - The value of the annotation
      Returns:
      a new field with the annotation as specified
    • primaryKey

      public static Field primaryKey(Field field, boolean primaryKey)
      Returns a new field object with the same type and name as the original, but with the primaryKey flag set to the given value.
      Parameters:
      field - The original field
      primaryKey - The value for the primaryKey flag
      Returns:
      a new field with the primaryKey flag set to the specified value
    • nonUnique

      public static Field nonUnique(Field field)
      Returns a new field object with the same type and name as the original, but with the primaryKey flag set to false.
      Parameters:
      field - The original field
      Returns:
      a new field with the primaryKey flag set to false
    • nonUnique

      public static RecordTokenType nonUnique(RecordTokenType type)
      Returns a RecordTokenType, equivalent to the original, but with all primaryKey flags set to false.
      Parameters:
      type - the original type
      Returns:
      a new type, with uniqueness constraints removed
    • annotate

      public static RecordTokenType annotate(RecordTokenType type, String propertyName, String propertyValue)
      Returns a new RecordTokenType with the given annotation applied to all of its fields.
      Parameters:
      type - The type to annotate
      propertyName - The name of the annotation
      propertyValue - The value of the annotation
      Returns:
      a new type with the annotation as specified
    • deriveSchema

      public static RecordTokenType deriveSchema(ScalarTyped... columns)
      Builds a default record schema using the types of the specified columnar objects. Fields will be assigned default names using the pattern "field0", "field1", ..., "fieldN".
      Parameters:
      columns - objects describing the columns. These objects implicitly provide the column data type.
      Returns:
      a schema which would be appropriate for the specified columns
    • addSourceInfoFields

      public static RecordTokenType addSourceInfoFields(RecordTokenType type)
      Constructs the output type used when tagging with source information fields. There are three fields added:
      • sourcePath, a string naming the original source file from which the record originated.
      • splitOffset, a long providing the starting byte offset of the parse split in the file.
      • recordOffset, a long providing the starting offset for the record within the parse split.
      These fields will always be the first fields of the result type. However, they will be renamed as necessary to resolve collisions; fields in the source type will never be renamed.
      Parameters:
      type - the original source type to be extended with source information
      Returns:
      the expected source-tagged output type
    • mutating

      public static RecordTokenType mutating(RecordTokenType type, Collection<String> modifiedFields)
      Computes the resulting type assuming that the specified fields may be mutated. Fields that are not modified will have domain and custom metadata preserved. Those that are modified will have name and type only preserved.

      This method assumes that both the underlying schema and the relative ordering of records in the flow are unchanged.

      Parameters:
      type - the original type
      modifiedFields - the fields that may be modified
      Returns:
      the type after changing field values
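
      A minimal sketch of the metadata rule: unmodified fields pass through untouched, while modified fields keep only name and type. The FieldInfo class, its single string-valued domain attribute, and the class name MutatingSketch are hypothetical stand-ins; the real field metadata is richer than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Stand-in model (not the DataRush API): a field has a name, a type name,
// and optional metadata (here a single nullable "domain" string).
final class MutatingSketch {

    static final class FieldInfo {
        final String name;
        final String type;
        final String domain; // null means no metadata

        FieldInfo(String name, String type, String domain) {
            this.name = name;
            this.type = type;
            this.domain = domain;
        }
    }

    // Modified fields keep name and type only; other fields are unchanged.
    static List<FieldInfo> mutating(List<FieldInfo> type, Collection<String> modifiedFields) {
        List<FieldInfo> result = new ArrayList<>();
        for (FieldInfo field : type) {
            if (modifiedFields.contains(field.name)) {
                result.add(new FieldInfo(field.name, field.type, null));
            } else {
                result.add(field);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<FieldInfo> type = Arrays.asList(
            new FieldInfo("id", "long", "1..100"),
            new FieldInfo("status", "string", "open,closed"));
        List<FieldInfo> after = mutating(type, Arrays.asList("status"));
        System.out.println(after.get(0).domain); // prints 1..100
        System.out.println(after.get(1).domain); // prints null
    }
}
```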
    • getDomainValuesAsStrings

      public static List<String> getDomainValuesAsStrings(FieldDomain domain)
      Returns the values of the domain as strings.
      Parameters:
      domain - the domain
      Returns:
      the values of the domain as strings
    • hasDomainValues

      public static boolean hasDomainValues(RecordTokenType type, String field)
      Returns whether the given field has domain values defined.
      Parameters:
      type - the type
      field - the name of the field
      Returns:
      whether the given field has domain values defined.
      Throws:
      InvalidFieldException - if the field is not defined
    • hasDomainValues

      public static boolean hasDomainValues(RecordTokenType type)
      Returns whether all of the fields in the given type have domain values.
      Parameters:
      type - the type
      Returns:
      whether all of the fields in the given type have domain values defined.
      Throws:
      InvalidFieldException - if the field is not defined
    • withDomainValues

      public static RecordTokenType withDomainValues(RecordTokenType type, String fieldName, Set<String> discovered)
      Returns a new RecordTokenType with domain values set to the discovered values.
      Parameters:
      type - the original type
      fieldName - the field to update
      discovered - the discovered values
      Returns:
      a new RecordTokenType
    • mergeDomain

      public static FieldDomain mergeDomain(FieldDomain domain1, FieldDomain domain2)
      Merges two domains. The result will have:
      1. lowerBound equal to the min of the two lower bounds or unspecified lower bound if either is unspecified
      2. upperBound equal to the max of the two upper bounds or unspecified upper bound if either is unspecified
      3. values equal to the union of the two sets of values or unspecified if either is unspecified
      Parameters:
      domain1 - the first domain
      domain2 - the second domain
      Returns:
      the combined domain
      Throws:
      IllegalArgumentException - if there is no common base class between the types of the two domains
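
      The three rules above can be sketched with a simplified domain whose bounds are longs and whose values are strings, using null to mean "unspecified". The Domain class and the name MergeDomainSketch are hypothetical; the real FieldDomain supports other value types, and the IllegalArgumentException case for incompatible types is not modeled here.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Stand-in model (not the DataRush API): null means "unspecified".
final class MergeDomainSketch {

    static final class Domain {
        final Long lowerBound;
        final Long upperBound;
        final Set<String> values;

        Domain(Long lowerBound, Long upperBound, Set<String> values) {
            this.lowerBound = lowerBound;
            this.upperBound = upperBound;
            this.values = values;
        }
    }

    static Domain mergeDomain(Domain d1, Domain d2) {
        // Rule 1: min of the lower bounds, unspecified if either is unspecified.
        Long lower = null;
        if (d1.lowerBound != null && d2.lowerBound != null) {
            lower = Math.min(d1.lowerBound, d2.lowerBound);
        }
        // Rule 2: max of the upper bounds, unspecified if either is unspecified.
        Long upper = null;
        if (d1.upperBound != null && d2.upperBound != null) {
            upper = Math.max(d1.upperBound, d2.upperBound);
        }
        // Rule 3: union of the value sets, unspecified if either is unspecified.
        Set<String> values = null;
        if (d1.values != null && d2.values != null) {
            values = new TreeSet<>(d1.values);
            values.addAll(d2.values);
        }
        return new Domain(lower, upper, values);
    }

    public static void main(String[] args) {
        Domain d1 = new Domain(1L, 10L, new TreeSet<>(Arrays.asList("a")));
        Domain d2 = new Domain(5L, null, new TreeSet<>(Arrays.asList("b")));
        Domain merged = mergeDomain(d1, d2);
        System.out.println(merged.lowerBound); // prints 1
        System.out.println(merged.upperBound); // prints null
        System.out.println(merged.values);     // prints [a, b]
    }
}
```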