Module datarush.library
Class DelimitedTextAnalyzer
- java.lang.Object
  - com.pervasive.datarush.operators.io.textfile.DelimitedTextAnalyzer
public class DelimitedTextAnalyzer extends Object
An analyzer for files containing delimited text. An analysis performs a basic parsing of the file, permitting validation of the delimiter configuration. The following information is provided as a result of analyzing a file:
- The values of fields for analyzed records.
- The record separator. If the properties specify auto-detection of newline style, the analyzer will determine whether the file uses Windows-style CRLF or UNIX-style LF.
- The field separator. If the properties specify auto-detection of the field separator, the analyzer will attempt to determine the appropriate separator from a known set: comma (','), tab ('\t'), semicolon (';'), pipe ('|'), and space (' ').
- The field delimiter. If the properties specify auto-detection of the field delimiter, the analyzer will attempt to determine the appropriate delimiter from a known set: single quote (') or double quote ("). If one cannot be determined, the text is assumed to be undelimited.
- The comment marker. If the properties specify auto-detection of the comment marker, the analyzer will attempt to determine the appropriate comment marker from a known set: #, %, and //. If one cannot be determined, it is assumed there is no comment marker.
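The source does not describe how a separator is chosen from the known set. As a rough illustration only, the sketch below shows one common heuristic: pick the candidate that occurs the same non-zero number of times on every non-empty line. The class `SeparatorSniffer` and its methods are hypothetical and are not part of the DataRush library; the sketch also ignores quoting, which a real analyzer would have to handle.

```java
public class SeparatorSniffer {
    // Candidate set taken from the class description: comma, tab, semicolon, pipe, space.
    private static final char[] CANDIDATES = {',', '\t', ';', '|', ' '};

    /**
     * Hypothetical heuristic, not the library's algorithm: returns the candidate
     * with the highest consistent per-line count, or '\0' if no candidate
     * appears consistently. Ties go to the earlier candidate in the list.
     */
    public static char detect(String sample) {
        String[] lines = sample.split("\r?\n");
        char best = '\0';
        int bestCount = 0;
        for (char candidate : CANDIDATES) {
            int perLine = consistentCount(lines, candidate);
            if (perLine > bestCount) {
                best = candidate;
                bestCount = perLine;
            }
        }
        return best;
    }

    // Returns the number of occurrences per line if it is identical on every
    // non-empty line; returns 0 if the counts differ or there are no such lines.
    private static int consistentCount(String[] lines, char c) {
        int expected = -1;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            int n = 0;
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == c) {
                    n++;
                }
            }
            if (expected < 0) {
                expected = n;
            } else if (n != expected) {
                return 0;
            }
        }
        return Math.max(expected, 0);
    }
}
```

Under this heuristic, `SeparatorSniffer.detect("a;b\r\n1;2\r\n")` yields `';'`, and text with no consistent candidate yields `'\0'`, loosely mirroring the "cannot be determined" cases above.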
Nested Class Summary
- static class DelimitedTextAnalyzer.Analysis: Contains the results of an analysis of a delimited text file.
Constructor Summary
- DelimitedTextAnalyzer(FieldDelimiterSpecifier delimiters): Creates a new analyzer which uses the given delimiter information.
Method Summary
- DelimitedTextAnalyzer.Analysis analyze(Path file, CharsetEncoding charsetSpec): Analyzes the specified file based on current configuration.
- DelimitedTextAnalyzer.Analysis analyze(Path file, CharsetEncoding charsetSpec, FileClient client): Analyzes the specified file based on current configuration.
- DelimitedTextAnalyzer.Analysis analyze(Reader input): Analyzes the specified text stream based on current configuration.
- DelimitedTextAnalyzer.Analysis analyze(String file, CharsetEncoding charsetSpec): Analyzes the specified file based on current configuration.
- void setAnalysisSize(int count): Sets the maximum number of characters to use in analysis.
- void setHeaderSkipCount(int count): Sets the number of lines to skip at the beginning of the file.
- void setLineComment(String lineComment): Sets the value of the indicator that a line is commented and should be ignored.
Constructor Detail
DelimitedTextAnalyzer
public DelimitedTextAnalyzer(FieldDelimiterSpecifier delimiters)
Creates a new analyzer which uses the given delimiter information. Initially, the analyzer is configured to allow unlimited record length and to parse only the first row.
Parameters:
- delimiters - field structure information from which to initialize settings
Method Detail
setAnalysisSize
public void setAnalysisSize(int count)
Sets the maximum number of characters to use in analysis. This value should be large enough to contain at least one record. By default, 1MB is analyzed.
Parameters:
- count - the number of characters to analyze
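To make the idea of a character-bounded analysis concrete, the following sketch reads at most a fixed number of characters from a `Reader` before any parsing takes place. It is an illustration under assumptions, not the library's code: `BoundedSample` and `readSample` are hypothetical names, and wrapping `IOException` in `UncheckedIOException` is a convenience of the sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

public class BoundedSample {
    /**
     * Hypothetical helper: reads up to maxChars characters from the input and
     * returns them as a string. Stops early if the stream ends first.
     */
    public static String readSample(Reader input, int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be non-negative");
        }
        char[] buffer = new char[maxChars];
        int filled = 0;
        try {
            // A single read() may return fewer characters than requested,
            // so keep reading until the buffer is full or the stream ends.
            while (filled < maxChars) {
                int n = input.read(buffer, filled, maxChars - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer, 0, filled);
    }
}
```

If the limit is smaller than the first record, the sample ends mid-record, which is why the value should be large enough to contain at least one complete record.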
setLineComment
public void setLineComment(String lineComment)
Sets the value of the indicator that a line is commented and should be ignored. This line comment indicator must be found at the beginning of a line to be considered a comment.
Parameters:
- lineComment - the string value indicating a line is commented out
setHeaderSkipCount
public void setHeaderSkipCount(int count)
Sets the number of lines to skip at the beginning of the file. Skipped lines are only analyzed for newline discovery; they are ignored in the remainder of the analysis. By default, no lines are skipped.
Parameters:
- count - the number of lines at the start of the file to skip
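The combined effect of the header skip count and the line comment indicator can be sketched in plain Java as follows. This is a hypothetical illustration, not the library's implementation: `LineFilter` and `dataLines` are invented names, and the sketch assumes that comment lines inside the skipped header region count toward the skip count, which the source does not state.

```java
import java.util.ArrayList;
import java.util.List;

public class LineFilter {
    /**
     * Hypothetical helper: returns the lines that would remain for analysis
     * after skipping the first headerSkipCount lines and dropping lines that
     * begin with the comment indicator. A null or empty indicator disables
     * comment filtering.
     */
    public static List<String> dataLines(String text, int headerSkipCount, String lineComment) {
        String[] lines = text.split("\r?\n");
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            if (i < headerSkipCount) {
                continue; // header line: ignored for field analysis
            }
            boolean commented = lineComment != null
                    && !lineComment.isEmpty()
                    && lines[i].startsWith(lineComment);
            if (commented) {
                continue; // indicator at the beginning of the line: whole line ignored
            }
            kept.add(lines[i]);
        }
        return kept;
    }
}
```

Note that a marker appearing later in a line, as in `1,2 # trailing`, does not make the line a comment, because the indicator must be found at the beginning of the line.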
analyze
public DelimitedTextAnalyzer.Analysis analyze(String file, CharsetEncoding charsetSpec) throws IOException
Analyzes the specified file based on current configuration. The file will be processed assuming the delimiters with which the analyzer was constructed. The analysis will also indicate the delimiters used in the file. This will be the set of delimiters provided initially to the analyzer plus any discovered delimiters.
Parameters:
- file - path to the delimited text file to analyze
- charsetSpec - description of the file's character set encoding
Returns:
- an analysis of the delimited text file
Throws:
- IOException - if an error occurs while reading the file
- RowTooLongException - if the first row exceeds the configured length
analyze
public DelimitedTextAnalyzer.Analysis analyze(Path file, CharsetEncoding charsetSpec) throws IOException
Analyzes the specified file based on current configuration. The file will be processed assuming the delimiters with which the analyzer was constructed. The analysis will also indicate the delimiters used in the file. This will be the set of delimiters provided initially to the analyzer plus any discovered delimiters.
Parameters:
- file - path to the delimited text file to analyze
- charsetSpec - description of the file's character set encoding
Returns:
- an analysis of the delimited text file
Throws:
- IOException - if an error occurs while reading the file
- RowTooLongException - if the first row exceeds the configured length
analyze
public DelimitedTextAnalyzer.Analysis analyze(Path file, CharsetEncoding charsetSpec, FileClient client) throws IOException
Analyzes the specified file based on current configuration. The file will be processed assuming the delimiters with which the analyzer was constructed. The analysis will also indicate the delimiters used in the file. This will be the set of delimiters provided initially to the analyzer plus any discovered delimiters.
Parameters:
- file - path to the delimited text file to analyze
- charsetSpec - description of the file's character set encoding
- client - the authorization context to use for accessing the file
Returns:
- an analysis of the delimited text file
Throws:
- IOException - if an error occurs while reading the file
- RowTooLongException - if the first row exceeds the configured length
analyze
public DelimitedTextAnalyzer.Analysis analyze(Reader input) throws IOException, RowTooLongException
Analyzes the specified text stream based on current configuration. The input will be processed assuming the delimiters with which the analyzer was constructed. The analysis will also indicate the delimiters used in the input. This will be the set of delimiters provided initially to the analyzer plus any discovered delimiters.
Parameters:
- input - the text data to analyze
Returns:
- an analysis of the delimited text
Throws:
- IOException - if an error occurs while reading the input
- RowTooLongException - if the first row exceeds the configured length