- java.lang.Object
-
- com.pervasive.datarush.schema.PatternBasedDiscovery
-
- All Implemented Interfaces:
TextRecordDiscoverer
- Direct Known Subclasses:
JsonPatternBasedDiscovery
public class PatternBasedDiscovery extends Object implements TextRecordDiscoverer
Discovers a schema for delimited text by checking values against a mapping of regular expressions to data types.
-
-
Field Summary
static List<TypePattern> DEFAULT_PATTERNS
The default patterns used in discovery when none are specified, in order:
TypePattern.INT_PATTERN
TypePattern.LONG_PATTERN
TypePattern.DOUBLE_PATTERN
TypePattern.MONEY_PATTERN
TypePattern.DATE_PATTERN
TypePattern.TIME_PATTERN
TypePattern.TIMESTAMP_PATTERN
TypePattern.BOOLEAN_PATTERN
TypePattern.IP4ADDRESS_PATTERN
TypePattern.IP6ADDRESS_PATTERN
TypePattern.DURATION_PATTERN
TypePattern.PERIOD_PATTERN
-
Constructor Summary
PatternBasedDiscovery()
Creates a default mapping.
PatternBasedDiscovery(List<TypePattern> patterns)
Creates a mapping using the specified patterns.
-
Method Summary
TextRecord discoverForRead(List<List<String>> rows, boolean containsHeader, TextConversionDefaults defaults)
Constructs a schema for reading based on the given file analysis.
TextRecord generateForWrite(RecordTokenType type, TextConversionDefaults defaults)
Constructs a schema for writing based on the given record type.
protected TextDataType mergeTypes(TextDataType l, TextDataType r)
protected TextDataType predictType(String value)
-
-
-
Field Detail
-
DEFAULT_PATTERNS
public static List<TypePattern> DEFAULT_PATTERNS
The default patterns used in discovery when none are specified, in order:
TypePattern.INT_PATTERN
TypePattern.LONG_PATTERN
TypePattern.DOUBLE_PATTERN
TypePattern.MONEY_PATTERN
TypePattern.DATE_PATTERN
TypePattern.TIME_PATTERN
TypePattern.TIMESTAMP_PATTERN
TypePattern.BOOLEAN_PATTERN
TypePattern.IP4ADDRESS_PATTERN
TypePattern.IP6ADDRESS_PATTERN
TypePattern.DURATION_PATTERN
TypePattern.PERIOD_PATTERN
-
-
Constructor Detail
-
PatternBasedDiscovery
public PatternBasedDiscovery()
Creates a default mapping. The default mapping detects the types listed in DEFAULT_PATTERNS.
- See Also:
DEFAULT_PATTERNS
-
PatternBasedDiscovery
public PatternBasedDiscovery(List<TypePattern> patterns)
Creates a mapping using the specified patterns. Patterns are tested in the order listed, and the first matching pattern determines the type. Conflicts between records on a field's type are resolved to TextTypes.STRING in most cases. If a field value matches the default null indicator, it is ignored for typing purposes.
- Parameters:
patterns - the patterns against which to compare values. The list is in precedence order; the first pattern to match determines the type.
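The precedence rule described above can be illustrated with a small, self-contained sketch. It does not use the DataRush classes: `FirstMatchSketch`, `Rule`, the regular expressions, and the string type names are all hypothetical stand-ins for `TypePattern` and `TextDataType`, and the merge step always falls back to STRING on disagreement, which is a simplification of the "in most cases" behavior noted above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of first-match type discovery; not the DataRush implementation. */
final class FirstMatchSketch {

    /** Pairs a regular expression with the type name it implies (stand-in for TypePattern). */
    static final class Rule {
        final Pattern regex;
        final String type;

        Rule(String regex, String type) {
            this.regex = Pattern.compile(regex);
            this.type = type;
        }
    }

    // Illustrative expressions in precedence order; the real patterns may differ.
    static final List<Rule> RULES = Arrays.asList(
            new Rule("[+-]?\\d{1,9}", "INT"),
            new Rule("[+-]?\\d+", "LONG"),
            new Rule("[+-]?\\d*\\.\\d+", "DOUBLE"),
            new Rule("(?i)true|false", "BOOLEAN"));

    /** Returns the type of the first rule matching the whole value, or null for a null indicator. */
    static String predict(String value, String nullIndicator) {
        if (value.equals(nullIndicator)) {
            return null; // null indicators carry no type information
        }
        for (Rule rule : RULES) {
            if (rule.regex.matcher(value).matches()) {
                return rule.type;
            }
        }
        return "STRING"; // nothing matched
    }

    /** Merges two predictions; any disagreement falls back to STRING (a simplification). */
    static String merge(String left, String right) {
        if (left == null) return right;
        if (right == null) return left;
        return left.equals(right) ? left : "STRING";
    }

    /** Folds the predictions for every value of one field into a single type. */
    static String discoverField(List<String> values, String nullIndicator) {
        String type = null;
        for (String value : values) {
            type = merge(type, predict(value, nullIndicator));
        }
        return type == null ? "STRING" : type;
    }
}
```

Because the first match wins, order matters in this sketch: if the LONG rule were listed before the INT rule, every integer value would be typed as LONG and the INT rule would never fire.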
-
-
Method Detail
-
predictType
protected TextDataType predictType(String value)
-
mergeTypes
protected TextDataType mergeTypes(TextDataType l, TextDataType r)
-
discoverForRead
public TextRecord discoverForRead(List<List<String>> rows, boolean containsHeader, TextConversionDefaults defaults)
Description copied from interface: TextRecordDiscoverer
Constructs a schema for reading based on the given file analysis. The analysis may or may not include a header row, as indicated.
- Specified by:
discoverForRead in interface TextRecordDiscoverer
- Parameters:
rows - the analyzed rows from a text file
containsHeader - indicates whether the first row of the analyzed file is a header
defaults - defaults for the discovered schema
- Returns:
a schema appropriate for reading the file
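To show how the `rows` and `containsHeader` arguments interact, here is a hedged, self-contained sketch of column-wise discovery over sampled rows. It is not the library's code: `ReadDiscoverySketch`, the two-rule `predict` helper, the `fieldN` fallback names, and the treatment of empty strings as nulls are assumptions made for illustration, and the result is a plain name-to-type map rather than a `TextRecord`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of schema discovery over analyzed rows; not the DataRush implementation. */
final class ReadDiscoverySketch {

    /** Minimal stand-in predictor: whole numbers, decimals, otherwise STRING. */
    static String predict(String value) {
        if (value.matches("[+-]?\\d+")) return "LONG";
        if (value.matches("[+-]?\\d*\\.\\d+")) return "DOUBLE";
        return "STRING";
    }

    /** Maps each field name to a discovered type, in column order. */
    static Map<String, String> discover(List<List<String>> rows, boolean containsHeader) {
        Map<String, String> schema = new LinkedHashMap<>();
        if (rows.isEmpty()) {
            return schema;
        }
        int width = rows.get(0).size();
        int firstDataRow = containsHeader ? 1 : 0; // skip the header when typing values
        for (int c = 0; c < width; c++) {
            // Assumed naming: header text if present, otherwise a generated name.
            String name = containsHeader ? rows.get(0).get(c) : "field" + c;
            String type = null;
            for (int r = firstDataRow; r < rows.size(); r++) {
                List<String> row = rows.get(r);
                if (c >= row.size() || row.get(c).isEmpty()) {
                    continue; // short rows and empty values are ignored for typing
                }
                String predicted = predict(row.get(c));
                // Disagreement between rows falls back to STRING (a simplification).
                type = (type == null || type.equals(predicted)) ? predicted : "STRING";
            }
            schema.put(name, type == null ? "STRING" : type);
        }
        return schema;
    }
}
```

In this sketch the header row only supplies field names and never takes part in type prediction, so a textual header above a numeric column does not force that column to STRING.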
-
generateForWrite
public TextRecord generateForWrite(RecordTokenType type, TextConversionDefaults defaults)
Description copied from interface: TextRecordDiscoverer
Constructs a schema for writing based on the given record type.
- Specified by:
generateForWrite in interface TextRecordDiscoverer
- Parameters:
type - the record type of the data
defaults - defaults for the discovered schema
- Returns:
a schema appropriate for writing data of the given type
-
-