- java.lang.Object
  - java.io.Reader
    - com.pervasive.datarush.io.SplitReader
- All Implemented Interfaces:
Closeable, AutoCloseable, Readable
public class SplitReader extends Reader
A character-based reader for splits. A SplitReader provides a text view of split data, handling the conversion from bytes to characters. Because splits may fall at arbitrary points within a file, consumers may need to perform additional processing to position themselves at a valid starting point.
SplitReader supports this case by accepting a character sequence to use as a synchronization marker at the beginning and end of splits. Normally, reading begins at the beginning of the split, but the reader can be configured to start at the first character after the synchronization marker. Take care when reading from the beginning of a split, as split boundaries may occur in the middle of an encoded character sequence.
Similarly, reads can continue beyond the end of the split. Consumers can either manage this themselves or have the reader terminate automatically after the first complete synchronization marker beyond the end of the split.
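The split-boundary rules above can be illustrated with a small, self-contained sketch. It is not SplitReader itself: it works on an in-memory String with character offsets (so it sidesteps the byte-to-character issues the class handles), and the class and method names are hypothetical. It assumes a non-empty marker that cannot overlap itself, such as "\n", and that the caller disables the initial sync for the first split. Under those assumptions, each split skips past the first marker that ends after its start, then keeps reading until it has consumed the first complete marker that ends beyond its end, so every record is read by exactly one split.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of sync-marker split semantics; not the DataRush API. */
public class SplitSyncSketch {

    /**
     * Returns the records read for the character split [start, end) of text.
     * Assumes a non-empty marker that cannot overlap itself (for example "\n").
     * doInitialSync mirrors the constructor flag; pass false for the first split.
     */
    static List<String> recordsForSplit(String text, int start, int end,
                                        String marker, boolean doInitialSync) {
        int k = marker.length();
        int n = text.length();
        int pos = start;
        if (doInitialSync) {
            // Skip to the first character after the first marker that ends beyond start.
            int m = text.indexOf(marker, Math.max(0, start - k + 1));
            pos = (m < 0) ? n : m + k;
        }
        List<String> records = new ArrayList<>();
        // A record is read only if it starts at or before 'end'; this stops reading
        // right after the first complete marker that ends beyond the split.
        while (pos < n && pos <= end) {
            int m = text.indexOf(marker, pos);
            if (m < 0) {                 // final record without a trailing marker
                records.add(text.substring(pos));
                pos = n;
            } else {
                records.add(text.substring(pos, m));
                pos = m + k;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "aa\nbbb\nc\ndd\n";
        int[][] splits = {{0, 5}, {5, 10}, {10, 12}};
        for (int[] s : splits) {
            // Prints [aa, bbb], then [c, dd], then []
            System.out.println(recordsForSplit(text, s[0], s[1], "\n", s[0] > 0));
        }
    }
}
```

Note how the first split reads "bbb" even though it ends past offset 5, and the second split skips that same record during its initial sync.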
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| static int | DEFAULT_BUFFER | Default buffer size for input to the character decoder, in bytes |
-
Constructor Summary
| Constructor | Description |
|---|---|
| SplitReader(SplitInputStream in, CharsetEncoding charset) | Creates a new reader on the specified stream, using the given encoding properties. |
| SplitReader(SplitInputStream in, CharsetEncoding charset, int buffer) | Creates a new reader on the specified stream, using the given encoding properties. |
| SplitReader(SplitInputStream in, CharsetEncoding charset, int buffer, String syncMarker, boolean doInitialSync) | Creates a new reader on the specified stream, using the given encoding properties and synchronization marker. |
| SplitReader(SplitInputStream in, CharsetEncoding charset, String syncMarker, boolean doInitialSync) | Creates a new reader on the specified stream, using the given encoding properties and synchronization marker. |
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| long | charsRead() | Gets the character offset into the underlying split. |
| void | close() | |
| boolean | hasOverrun() | Indicates whether the reader has passed the end of the underlying split. |
| void | mark(int readAheadLimit) | |
| boolean | markSupported() | |
| int | read() | |
| int | read(char[] cbuf) | |
| int | read(char[] cbuf, int off, int len) | |
| int | read(CharBuffer target) | |
| boolean | readIfPresent(char[] chars) | Conditionally reads the input to see if the specified characters are present. |
| boolean | readLine(Appendable lineBuffer) | Reads a line of text into the specified buffer. |
| boolean | ready() | |
| void | reset() | |
| long | skip(long n) | |
| boolean | skipTo(char[] bytes) | Advances the position of the stream to the first character after the specified pattern. |
Methods inherited from class java.io.Reader
nullReader, transferTo
Field Detail
-
DEFAULT_BUFFER
public static final int DEFAULT_BUFFER
Default buffer size for input to the character decoder, in bytes.
- See Also:
  Constant Field Values
Constructor Detail
-
SplitReader
public SplitReader(SplitInputStream in, CharsetEncoding charset)
Creates a new reader on the specified stream, using the given encoding properties. The default decoding buffer size is used.
- Parameters:
  in - the input stream on the split being read
  charset - character set encoding properties
-
SplitReader
public SplitReader(SplitInputStream in, CharsetEncoding charset, int buffer)
Creates a new reader on the specified stream, using the given encoding properties. The decoding buffer is the requested size.
- Parameters:
  in - the input stream on the split being read
  charset - character set encoding properties
  buffer - the size of the decoding input buffer, in bytes
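The buffer argument sizes the byte buffer that feeds the character decoder. The sketch below shows one common way such a buffer is driven, using only standard java.nio classes; it is an assumption about the general technique, not SplitReader's actual code, and the class name is hypothetical. It also assumes UTF-8 and a buffer of at least 4 bytes, so one complete character always fits. A multi-byte character cut off at the end of one buffer fill stays in the buffer (via compact()) and is completed by the next read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: decode a byte stream through a fixed-size input buffer (UTF-8 assumed). */
public class BufferedDecodeSketch {

    static String decode(InputStream in, int bufferSize) throws IOException {
        if (bufferSize < 4) {
            // A 4-byte UTF-8 sequence must fit in the buffer, or decoding cannot progress.
            throw new IllegalArgumentException("bufferSize must be at least 4");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(bufferSize);
        // UTF-8 never yields more chars than input bytes, so this buffer cannot overflow.
        CharBuffer chars = CharBuffer.allocate(bufferSize);
        StringBuilder out = new StringBuilder();
        boolean eof = false;
        while (!eof) {
            // Append new bytes after any partial character left by the previous pass.
            int n = in.read(bytes.array(), bytes.arrayOffset() + bytes.position(), bytes.remaining());
            if (n < 0) {
                eof = true;
            } else {
                bytes.position(bytes.position() + n);
            }
            bytes.flip();
            // With endOfInput == false, an incomplete trailing sequence is left unconsumed.
            CoderResult result = decoder.decode(bytes, chars, eof);
            if (result.isError()) {
                result.throwException();
            }
            bytes.compact();  // keep the unconsumed bytes for the next pass
            chars.flip();
            out.append(chars);
            chars.clear();
        }
        decoder.flush(chars);
        chars.flip();
        out.append(chars);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo \u20ac \uD834\uDD1E";  // 2-, 3- and 4-byte UTF-8 characters
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(new ByteArrayInputStream(encoded), 4).equals(text));  // true
    }
}
```

A smaller buffer means more refills; a larger one reduces per-read overhead at the cost of memory.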
-
SplitReader
public SplitReader(SplitInputStream in, CharsetEncoding charset, String syncMarker, boolean doInitialSync) throws IOException
Creates a new reader on the specified stream, using the given encoding properties and synchronization marker. The default decoding buffer size is used. The reader stops providing data after the first complete synchronization marker that appears after the end of the split.
- Parameters:
  in - the input stream on the split being read
  charset - character set encoding properties
  syncMarker - the character sequence used to synchronize the read position
  doInitialSync - indicates whether to synchronize the read position before the first read
- Throws:
  IOException - if an I/O error occurs during initial position synchronization
-
SplitReader
public SplitReader(SplitInputStream in, CharsetEncoding charset, int buffer, String syncMarker, boolean doInitialSync) throws IOException
Creates a new reader on the specified stream, using the given encoding properties and synchronization marker. The decoding buffer is the requested size. The reader stops providing data after the first complete synchronization marker that appears after the end of the split.
- Parameters:
  in - the input stream on the split being read
  charset - character set encoding properties
  buffer - the size of the decoding input buffer, in bytes
  syncMarker - the character sequence used to synchronize the read position
  doInitialSync - indicates whether to synchronize the read position before the first read
- Throws:
  IOException - if an I/O error occurs during initial position synchronization
Method Detail
-
read
public int read() throws IOException
- Overrides:
  read in class Reader
- Throws:
  IOException
-
read
public int read(char[] cbuf) throws IOException
- Overrides:
  read in class Reader
- Throws:
  IOException
-
read
public int read(char[] cbuf, int off, int len) throws IOException
- Specified by:
  read in class Reader
- Throws:
  IOException
-
read
public int read(CharBuffer target) throws IOException
- Specified by:
  read in interface Readable
- Overrides:
  read in class Reader
- Throws:
  IOException
-
skip
public long skip(long n) throws IOException
- Overrides:
  skip in class Reader
- Throws:
  IOException
-
markSupported
public boolean markSupported()
- Overrides:
  markSupported in class Reader
-
mark
public void mark(int readAheadLimit) throws IOException
- Overrides:
  mark in class Reader
- Throws:
  IOException
-
reset
public void reset() throws IOException
- Overrides:
  reset in class Reader
- Throws:
  IOException
-
close
public void close()
-
hasOverrun
public boolean hasOverrun()
Indicates whether the reader has passed the end of the underlying split. Because a character's encoding may span multiple bytes, the end of the split may fall in the middle of a character. Overrun is flagged starting with the first character whose encoding includes a byte beyond the end of the split.
- Returns:
  true if the current read position is beyond the end of the split, false otherwise
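The overrun rule can be made concrete for UTF-8, where the lead byte of each character gives its encoded length. The sketch below is a hypothetical illustration, not SplitReader's implementation: it assumes UTF-8 data whose first byte is a character boundary, does not validate the input fully, and reports the index (in code points) of the first character that has at least one byte at or past the split end.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the overrun rule for UTF-8 input; not SplitReader's code. */
public class OverrunSketch {

    /** Length in bytes of the UTF-8 sequence that starts with the given lead byte (simplified). */
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;            // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
        throw new IllegalArgumentException("not a UTF-8 lead byte: 0x" + Integer.toHexString(b));
    }

    /**
     * Scans characters from byte 0 (assumed to be a character boundary) and returns the
     * index, in code points, of the first character whose encoding has a byte at or past
     * splitEnd; returns -1 if every character ends inside the split.
     */
    static int firstOverrunIndex(byte[] data, int splitEnd) {
        int pos = 0;
        int index = 0;
        while (pos < data.length) {
            int len = sequenceLength(data[pos]);
            if (pos + len > splitEnd) {
                return index;  // this character's last byte lies beyond the split
            }
            pos += len;
            index++;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "a\u20acb".getBytes(StandardCharsets.UTF_8);  // 61 E2 82 AC 62
        // The split covers bytes 0-2 and cuts the 3-byte euro sign (bytes 1-3), so it overruns.
        System.out.println(firstOverrunIndex(data, 3));  // 1
    }
}
```

The euro sign starts inside the split yet still counts as the overrunning character, which matches the "first character whose encoding has a byte beyond the end" wording.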
-
readIfPresent
public boolean readIfPresent(char[] chars) throws IOException
Conditionally reads the input to see if the specified characters are present. If they are, the read position is advanced to the first character following the sequence. Otherwise, the read position is unchanged.
- Parameters:
  chars - the character sequence to check for
- Returns:
  true if the sequence was found, false otherwise
- Throws:
  IOException - if an I/O error occurs during the read
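The "advance only on a match" behavior can be sketched with the standard java.io.PushbackReader: read ahead, and push everything back on a mismatch. This is an assumed, illustrative approach with a hypothetical class name, not how SplitReader is necessarily implemented; it requires a pushback buffer at least as long as the sequence being tested.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

/** Hypothetical sketch of readIfPresent-style lookahead using java.io.PushbackReader. */
public class ReadIfPresentSketch {

    /**
     * Consumes 'chars' if the input continues with exactly that sequence; otherwise
     * pushes back everything it read, leaving the position unchanged.
     * Assumes the reader's pushback buffer holds at least chars.length characters.
     */
    static boolean readIfPresent(PushbackReader in, char[] chars) throws IOException {
        char[] seen = new char[chars.length];
        int count = 0;
        while (count < chars.length) {
            int c = in.read();
            if (c < 0 || c != chars[count]) {
                if (c >= 0) {
                    seen[count++] = (char) c;  // the mismatching char must be pushed back too
                }
                in.unread(seen, 0, count);    // next read() returns seen[0] again
                return false;
            }
            seen[count++] = (char) c;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("<!--x"), 8);
        System.out.println(readIfPresent(in, "<?".toCharArray()));   // false, nothing consumed
        System.out.println(readIfPresent(in, "<!--".toCharArray())); // true
        System.out.println((char) in.read());                        // x
    }
}
```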
-
charsRead
public long charsRead()
Gets the character offset into the underlying split. The offset is measured from the first full character within the split, in case the split begins in the middle of a character's encoding.
- Returns:
  the number of characters read so far in the split
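For UTF-8, the "first full character" of a split can be found by skipping continuation bytes (bit pattern 10xxxxxx) at the start, and character offsets then count from that point. The sketch below assumes UTF-8 and uses a hypothetical class name; other encodings need different boundary logic, and SplitReader's own handling is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: locate the first full UTF-8 character in a split's bytes. */
public class CharOffsetSketch {

    /** Offset of the first byte that is not a UTF-8 continuation byte (10xxxxxx). */
    static int firstFullCharacter(byte[] split) {
        int off = 0;
        while (off < split.length && (split[off] & 0xC0) == 0x80) {
            off++;
        }
        return off;
    }

    public static void main(String[] args) {
        byte[] file = "\u20acab".getBytes(StandardCharsets.UTF_8);  // E2 82 AC 61 62
        // A split that starts one byte into the 3-byte euro sign.
        byte[] split = Arrays.copyOfRange(file, 1, file.length);
        int start = firstFullCharacter(split);
        String text = new String(split, start, split.length - start, StandardCharsets.UTF_8);
        System.out.println(start + " " + text + " " + text.length());  // 2 ab 2
    }
}
```

Here the two orphaned bytes of the euro sign are skipped, so character offset 0 refers to "a".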
-
skipTo
public boolean skipTo(char[] bytes) throws IOException
Advances the position of the stream to the first character after the specified pattern.
- Parameters:
  bytes - the character pattern to find in the stream (despite its name, this parameter is a char[])
- Returns:
  false if the end of data is reached, true otherwise
- Throws:
  IOException - if an I/O error occurs while reading the stream
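A forward-only scan like skipTo cannot simply restart after a partial match: looking for "aab" in "aaab", a naive matcher that discards the partial match on the third 'a' would miss the occurrence without re-reading input. The hypothetical sketch below uses Knuth-Morris-Pratt matching over a java.io.Reader to avoid that; the choice of KMP is an assumption for illustration, not a statement about SplitReader's internals.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical sketch of a skipTo-style scan over a Reader; not the SplitReader code. */
public class SkipToSketch {

    /**
     * Consumes input up to and including the first occurrence of pattern.
     * Returns false if the data ends before the pattern is found.
     * Uses Knuth-Morris-Pratt matching so no input needs to be re-read.
     */
    static boolean skipTo(Reader in, char[] pattern) throws IOException {
        if (pattern.length == 0) {
            return true;
        }
        // failure[i] = length of the longest proper prefix of pattern[0..i] that is also its suffix
        int[] failure = new int[pattern.length];
        for (int i = 1, k = 0; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) k = failure[k - 1];
            if (pattern[i] == pattern[k]) k++;
            failure[i] = k;
        }
        int matched = 0;
        int c;
        while ((c = in.read()) >= 0) {
            while (matched > 0 && c != pattern[matched]) matched = failure[matched - 1];
            if (c == pattern[matched]) matched++;
            if (matched == pattern.length) {
                return true;  // positioned at the first character after the pattern
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("aaab|rest");
        System.out.println(skipTo(in, "aab".toCharArray()));  // true
        System.out.println((char) in.read());                 // |
    }
}
```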
-
readLine
public boolean readLine(Appendable lineBuffer) throws IOException
Reads a line of text into the specified buffer. A line is considered to be terminated by a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a line feed.
- Parameters:
  lineBuffer - the buffer to which line data is appended
- Returns:
  false if the end of file was reached during the read
- Throws:
  IOException
- See Also:
  BufferedReader#readLine()
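The three accepted terminators can be handled with one character of lookahead after a carriage return. The hypothetical sketch below uses java.io.PushbackReader for that lookahead. Its return-value convention is an assumption based on the description above: it returns false when end of input is reached before a terminator, in which case the buffer may still hold a final unterminated line.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

/** Hypothetical readLine-style sketch; the return-value semantics are an assumption. */
public class ReadLineSketch {

    /**
     * Appends characters up to the next "\n", "\r" or "\r\n" to lineBuffer, consuming the
     * terminator. Returns false if end of input was reached before any terminator.
     */
    static boolean readLine(PushbackReader in, Appendable lineBuffer) throws IOException {
        int c;
        while ((c = in.read()) >= 0) {
            if (c == '\n') {
                return true;
            }
            if (c == '\r') {
                int next = in.read();
                if (next >= 0 && next != '\n') {
                    in.unread(next);  // lone CR: the next char belongs to the following line
                }
                return true;
            }
            lineBuffer.append((char) c);
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("one\r\ntwo\rthree\nfour"));
        StringBuilder line = new StringBuilder();
        boolean more;
        do {
            line.setLength(0);
            more = readLine(in, line);
            System.out.println(more + " [" + line + "]");
        } while (more);
        // Prints: true [one], true [two], true [three], false [four] (one per line)
    }
}
```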