java.lang.Object
com.pervasive.datarush.schema.PatternBasedDiscovery
- All Implemented Interfaces:
TextRecordDiscoverer
- Direct Known Subclasses:
JsonPatternBasedDiscovery
Discovers a schema for delimited text by checking values against a
mapping of regular expressions to data types.
-
Field Summary
Fields
static List<TypePattern> DEFAULT_PATTERNS
The default patterns used in discovery when none are specified, in order:
- TypePattern.INT_PATTERN
- TypePattern.LONG_PATTERN
- TypePattern.DOUBLE_PATTERN
- TypePattern.MONEY_PATTERN
- TypePattern.DATE_PATTERN
- TypePattern.TIME_PATTERN
- TypePattern.TIMESTAMP_PATTERN
- TypePattern.BOOLEAN_PATTERN
- TypePattern.IP4ADDRESS_PATTERN
- TypePattern.IP6ADDRESS_PATTERN
- TypePattern.DURATION_PATTERN
- TypePattern.PERIOD_PATTERN
Constructor Summary
Constructors
PatternBasedDiscovery()
Creates a default mapping.
PatternBasedDiscovery(List<TypePattern> patterns)
Creates a mapping using the specified patterns.
Method Summary
TextRecord discoverForRead(List<List<String>> rows, boolean containsHeader, TextConversionDefaults defaults)
Constructs a schema for reading based on the given file analysis.
generateForWrite(RecordTokenType type, TextConversionDefaults defaults)
Constructs a schema for writing based on the given record type.
protected TextDataType mergeTypes
protected TextDataType predictType(String value)
-
Field Details
-
DEFAULT_PATTERNS
The default patterns used in discovery when none are specified, in order:
- TypePattern.INT_PATTERN
- TypePattern.LONG_PATTERN
- TypePattern.DOUBLE_PATTERN
- TypePattern.MONEY_PATTERN
- TypePattern.DATE_PATTERN
- TypePattern.TIME_PATTERN
- TypePattern.TIMESTAMP_PATTERN
- TypePattern.BOOLEAN_PATTERN
- TypePattern.IP4ADDRESS_PATTERN
- TypePattern.IP6ADDRESS_PATTERN
- TypePattern.DURATION_PATTERN
- TypePattern.PERIOD_PATTERN
-
-
Constructor Details
-
PatternBasedDiscovery
public PatternBasedDiscovery()
Creates a default mapping. The default mapping detects the types covered by DEFAULT_PATTERNS.
- See Also:
-
PatternBasedDiscovery
public PatternBasedDiscovery(List<TypePattern> patterns)
Creates a mapping using the specified patterns. Patterns are tested in the order listed, and the first matching pattern determines the type. Conflicts between records on a field's type are resolved to TextTypes.STRING in most cases. If a field value matches the default null indicator, it is ignored for typing purposes.
- Parameters:
patterns - the patterns against which to compare values. The list is in precedence order; the first pattern to match determines the type.
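The precedence, conflict, and null-handling rules described above can be illustrated with a small standalone sketch. It does not use the DataRush classes: the class name FirstMatchSketch, the regular expressions, the type names, and the empty-string null indicator are all hypothetical stand-ins chosen for illustration. It also simplifies the conflict rule by always falling back to STRING, whereas the library does so only "in most cases".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Standalone illustration of first-match-wins typing; not the library's implementation.
class FirstMatchSketch {
    // Hypothetical patterns. LinkedHashMap preserves insertion order,
    // which serves as the precedence order.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("INT", Pattern.compile("[+-]?\\d{1,9}"));
        PATTERNS.put("LONG", Pattern.compile("[+-]?\\d{1,18}"));
        PATTERNS.put("DOUBLE", Pattern.compile("[+-]?\\d*\\.\\d+"));
        PATTERNS.put("BOOLEAN", Pattern.compile("true|false", Pattern.CASE_INSENSITIVE));
    }

    // Assumed null indicator for this sketch only.
    static final String NULL_INDICATOR = "";

    // Returns the name of the first pattern that matches the whole value,
    // or STRING when none matches.
    static String predictType(String value) {
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(value).matches()) {
                return entry.getKey();
            }
        }
        return "STRING";
    }

    // Determines one type for a column of values. Null indicators are skipped;
    // any disagreement between values falls back to STRING (a simplification).
    static String columnType(List<String> values) {
        String result = null;
        for (String value : values) {
            if (value.equals(NULL_INDICATOR)) {
                continue;
            }
            String type = predictType(value);
            if (result == null) {
                result = type;
            } else if (!result.equals(type)) {
                return "STRING";
            }
        }
        return result == null ? "STRING" : result;
    }

    public static void main(String[] args) {
        System.out.println(predictType("42"));
        System.out.println(predictType("12345678901"));
        System.out.println(columnType(Arrays.asList("1", "", "2")));
        System.out.println(columnType(Arrays.asList("1", "x")));
    }
}
```

Because INT is listed before LONG, a short digit string is typed as INT even though the LONG pattern would also match it. Reordering the entries changes the result, which is why the order of the patterns list matters.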
-
-
Method Details
-
predictType
-
mergeTypes
-
discoverForRead
public TextRecord discoverForRead(List<List<String>> rows, boolean containsHeader, TextConversionDefaults defaults)
Description copied from interface: TextRecordDiscoverer
Constructs a schema for reading based on the given file analysis. The analysis may or may not include a header row, as indicated.
- Specified by:
discoverForRead in interface TextRecordDiscoverer
- Parameters:
rows - the analyzed rows from a text file
containsHeader - indicates whether the first row of the analyzed file is a header
defaults - defaults for the discovered schema
- Returns:
- a schema appropriate for reading the file.
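How the containsHeader flag changes the treatment of the analyzed rows can be sketched as follows. This is a standalone illustration, not the library's code: the class HeaderSketch, its methods, and the generated names "field0", "field1", ... are assumptions made for the example, and the names the library actually generates for headerless files may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone illustration of header handling; not the library's implementation.
class HeaderSketch {
    // Field names come from the first row when it is a header; otherwise
    // hypothetical positional names are generated.
    static List<String> fieldNames(List<List<String>> rows, boolean containsHeader) {
        List<String> names = new ArrayList<>();
        if (rows.isEmpty()) {
            return names;
        }
        List<String> first = rows.get(0);
        for (int i = 0; i < first.size(); i++) {
            names.add(containsHeader ? first.get(i) : "field" + i);
        }
        return names;
    }

    // Rows that should take part in type discovery: the header row, if any, is excluded.
    static List<List<String>> dataRows(List<List<String>> rows, boolean containsHeader) {
        if (containsHeader && !rows.isEmpty()) {
            return rows.subList(1, rows.size());
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("id", "name"),
                Arrays.asList("1", "alice"));
        System.out.println(fieldNames(rows, true));
        System.out.println(dataRows(rows, true));
    }
}
```

Excluding the header row from the data matters for typing: if "id" were treated as a value, it would conflict with the numeric values in the same column.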
-
generateForWrite
Description copied from interface: TextRecordDiscoverer
Constructs a schema for writing based on the given record type.
- Specified by:
generateForWrite in interface TextRecordDiscoverer
- Parameters:
type - the record type of the data
defaults - defaults for the discovered schema
- Returns:
- a schema appropriate for writing data of the given type
-