- java.lang.Object
-
- java.io.Reader
-
- com.pervasive.datarush.io.SplitReader
-
- All Implemented Interfaces: Closeable, AutoCloseable, Readable
public class SplitReader extends Reader
A character-based reader for splits. A SplitReader provides a text view of split data, handling conversion from bytes to characters.
Because splits may fall at arbitrary points within a file, consumers may need to perform additional processing to place themselves in a valid position. SplitReader supports this case by allowing a character sequence to be provided for use as a synchronization marker at the beginning and end of splits.
Normally, reading begins at the beginning of the split, but the reader can be configured to start at the first character after the synchronization marker. Care should be taken when reading at the split beginning, as split boundaries may occur in the middle of an encoded character sequence.
Similarly, reads can continue beyond the end of the split. Readers can either manage this themselves or auto-terminate after the first complete synchronization marker beyond the end of the split.
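The synchronization scheme described above can be illustrated without the DataRush classes. The following is a minimal sketch, not the SplitReader implementation: it works on the characters of an in-memory String (so byte-to-character decoding is ignored), and the class SplitSyncSketch, its methods, and the rule that a split's reader stops after the first marker starting at or beyond the split end are all assumptions made for illustration.

```java
final class SplitSyncSketch {
    /**
     * Returns the text a reader for the split [start, end) would produce,
     * assuming a non-empty marker. Hypothetical helper, not part of SplitReader.
     */
    static String readSplit(String data, int start, int end, String marker) {
        int from = 0;
        if (start > 0) {
            // Initial sync: begin at the first character after the first marker
            // found at or beyond the split start.
            int first = data.indexOf(marker, start);
            if (first < 0) {
                return "";
            }
            from = first + marker.length();
        }
        // Auto-terminate: stop after the first complete marker that starts at or
        // beyond the split end, or at end of data if there is none.
        int last = data.indexOf(marker, end);
        int to = (last < 0) ? data.length() : last + marker.length();
        return from < to ? data.substring(from, to) : "";
    }

    /** Reads every fixed-size split in order and concatenates the results. */
    static String readAll(String data, int splitSize, String marker) {
        StringBuilder out = new StringBuilder();
        for (int start = 0; start < data.length(); start += splitSize) {
            int end = Math.min(start + splitSize, data.length());
            out.append(readSplit(data, start, end, marker));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String data = "alpha\nbeta\ngamma\n";
        System.out.print(readSplit(data, 0, 7, "\n"));  // alpha and beta
        System.out.print(readSplit(data, 7, 14, "\n")); // gamma
    }
}
```

Under these assumptions, each split begins where the previous one stopped, so concatenating the output of consecutive splits reproduces the input and every record is read exactly once, whatever the split size.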
-
-
Field Summary
Modifier and Type | Field | Description
static int | DEFAULT_BUFFER | Default buffer size for input to the character decoder, in bytes
-
Constructor Summary
Constructor | Description
SplitReader(SplitInputStream in, CharsetEncoding charset) | Creates a new reader on the specified stream, using the given encoding properties.
SplitReader(SplitInputStream in, CharsetEncoding charset, int buffer) | Creates a new reader on the specified stream, using the given encoding properties.
SplitReader(SplitInputStream in, CharsetEncoding charset, int buffer, String syncMarker, boolean doInitialSync) | Creates a new reader on the specified stream, using the given encoding properties and synchronization marker.
SplitReader(SplitInputStream in, CharsetEncoding charset, String syncMarker, boolean doInitialSync) | Creates a new reader on the specified stream, using the given encoding properties and synchronization marker.
-
Method Summary
Modifier and Type | Method | Description
long | charsRead() | Gets the character offset into the underlying split.
void | close() |
boolean | hasOverrun() | Indicates whether the reader has passed the end of the underlying split.
void | mark(int readAheadLimit) |
boolean | markSupported() |
int | read() |
int | read(char[] cbuf) |
int | read(char[] cbuf, int off, int len) |
int | read(CharBuffer target) |
boolean | readIfPresent(char[] chars) | Conditionally reads the input to see if the specified characters are present.
boolean | readLine(Appendable lineBuffer) | Reads a line of text into the specified buffer.
boolean | ready() |
void | reset() |
long | skip(long n) |
boolean | skipTo(char[] bytes) | Advances the position of the stream to the first character after the specified pattern.
-
Methods inherited from class java.io.Reader
nullReader, transferTo
-
-
-
-
Field Detail
-
DEFAULT_BUFFER
public static final int DEFAULT_BUFFER
Default buffer size for input to the character decoder, in bytes.
- See Also: Constant Field Values
-
-
Constructor Detail
-
SplitReader
public SplitReader(SplitInputStream in, CharsetEncoding charset)
Creates a new reader on the specified stream, using the given encoding properties. The default decoding buffer size is used.
- Parameters:
in - the input stream on the split being read
charset - character set encoding properties
-
SplitReader
public SplitReader(SplitInputStream in, CharsetEncoding charset, int buffer)
Creates a new reader on the specified stream, using the given encoding properties. The decoding buffer is the requested size.
- Parameters:
in - the input stream on the split being read
charset - character set encoding properties
buffer - the size of the decoding input buffer, in bytes
-
SplitReader
public SplitReader(SplitInputStream in, CharsetEncoding charset, String syncMarker, boolean doInitialSync) throws IOException
Creates a new reader on the specified stream, using the given encoding properties and synchronization marker. The default decoding buffer size is used.
The reader will stop providing data after the first complete synchronization marker appearing after the end of the split.
- Parameters:
in - the input stream on the split being read
charset - character set encoding properties
syncMarker - the character sequence to use to synchronize the read positions
doInitialSync - indicates whether to synchronize the read position before the first read
- Throws:
IOException - if an I/O error occurs while performing initial position synchronization
-
SplitReader
public SplitReader(SplitInputStream in, CharsetEncoding charset, int buffer, String syncMarker, boolean doInitialSync) throws IOException
Creates a new reader on the specified stream, using the given encoding properties and synchronization marker. The decoding buffer is the requested size.
The reader will stop providing data after the first complete synchronization marker appearing after the end of the split.
- Parameters:
in - the input stream on the split being read
charset - character set encoding properties
buffer - the size of the decoding input buffer, in bytes
syncMarker - the character sequence to use to synchronize the read positions
doInitialSync - indicates whether to synchronize the read position before the first read
- Throws:
IOException - if an I/O error occurs while performing initial position synchronization
-
-
Method Detail
-
read
public int read() throws IOException
- Overrides: read in class Reader
- Throws: IOException
-
read
public int read(char[] cbuf) throws IOException
- Overrides: read in class Reader
- Throws: IOException
-
read
public int read(char[] cbuf, int off, int len) throws IOException
- Specified by: read in class Reader
- Throws: IOException
-
read
public int read(CharBuffer target) throws IOException
- Specified by: read in interface Readable
- Overrides: read in class Reader
- Throws: IOException
-
skip
public long skip(long n) throws IOException
- Overrides: skip in class Reader
- Throws: IOException
-
markSupported
public boolean markSupported()
- Overrides: markSupported in class Reader
-
mark
public void mark(int readAheadLimit) throws IOException
- Overrides: mark in class Reader
- Throws: IOException
-
reset
public void reset() throws IOException
- Overrides: reset in class Reader
- Throws: IOException
-
close
public void close()
-
hasOverrun
public boolean hasOverrun()
Indicates whether the reader has passed the end of the underlying split.
Because a character's encoding may span multiple bytes, the split boundary may fall in the middle of a character. Overrun is flagged with the first character whose encoding has a byte beyond the end of the split.
- Returns: true if the current read position is beyond the end of the split, false otherwise
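The overrun rule can be made concrete with a small sketch that uses only JDK classes. It assumes UTF-8 text, a split that starts at byte 0, and an exclusive split end. The class OverrunSketch and its method are hypothetical and only mirror the rule stated above, not how SplitReader tracks its position internally.

```java
import java.nio.charset.StandardCharsets;

final class OverrunSketch {
    /**
     * Returns the index (in chars) of the first character whose UTF-8 encoding
     * has a byte at or beyond splitEnd, or -1 if the whole text fits in the split.
     */
    static int firstOverrunChar(String text, int splitEnd) {
        int byteOffset = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int encodedLength = new String(Character.toChars(codePoint))
                    .getBytes(StandardCharsets.UTF_8).length;
            // The character occupies bytes [byteOffset, byteOffset + encodedLength).
            if (byteOffset + encodedLength > splitEnd) {
                return i;
            }
            byteOffset += encodedLength;
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    public static void main(String[] args) {
        // 'e' with acute accent takes two bytes (offsets 3 and 4) in UTF-8.
        String text = "caf\u00e9!";
        System.out.println(firstOverrunChar(text, 4)); // 3: the split cuts the accented letter in half
        System.out.println(firstOverrunChar(text, 5)); // 4: the accented letter fits, '!' does not
        System.out.println(firstOverrunChar(text, 6)); // -1: everything fits
    }
}
```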
-
readIfPresent
public boolean readIfPresent(char[] chars) throws IOException
Conditionally reads the input to see if the specified characters are present. If they are, the read position is advanced to the first character following the sequence. Otherwise, the read position is unchanged.
- Parameters:
chars - the character sequence which is to be checked
- Returns: true if the sequence was found, false otherwise
- Throws:
IOException - if an I/O error occurs during the read
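The "advance on match, otherwise leave the position unchanged" contract can be sketched on any java.io.Reader that supports mark and reset. The class ReadIfPresentSketch below is a hypothetical stand-in built on StringReader. It illustrates the documented behavior and makes no claim about how SplitReader implements it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReadIfPresentSketch {
    /** Consumes chars if they come next in the stream; otherwise restores the position. */
    static boolean readIfPresent(Reader in, char[] chars) throws IOException {
        in.mark(chars.length); // requires a Reader for which markSupported() is true
        for (char expected : chars) {
            if (in.read() != expected) {
                in.reset(); // mismatch or end of data: rewind to the marked position
                return false;
            }
        }
        return true;
    }

    /** Runs the helper twice on a small input and reports what happened. */
    static String demo() {
        try (Reader in = new StringReader("##rest")) {
            boolean first = readIfPresent(in, "##".toCharArray());  // matches, consumes "##"
            boolean second = readIfPresent(in, "rx".toCharArray()); // 'r' matches, 'x' does not
            return first + " " + second + " " + (char) in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true false r
    }
}
```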
-
charsRead
public long charsRead()
Gets the character offset into the underlying split. This is measured from the first full character within the split, in case the split begins in the middle of a character encoding.
- Returns: the number of characters read so far in the split
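To show what "the first full character within the split" can mean in practice, here is a sketch that assumes UTF-8, where continuation bytes always have the bit pattern 10xxxxxx. The class FirstFullCharSketch and its methods are hypothetical. Other encodings need different logic, and SplitReader may locate the first full character differently.

```java
import java.nio.charset.StandardCharsets;

final class FirstFullCharSketch {
    /** Returns the offset of the first byte at or after splitStart that begins a character. */
    static int firstFullCharOffset(byte[] data, int splitStart) {
        int offset = splitStart;
        // Skip UTF-8 continuation bytes (10xxxxxx) left over from a character
        // that started before the split.
        while (offset < data.length && (data[offset] & 0xC0) == 0x80) {
            offset++;
        }
        return offset;
    }

    /** Decodes everything from the first full character to the end of the data. */
    static String decodeFrom(byte[] data, int splitStart) {
        int offset = firstFullCharOffset(data, splitStart);
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two-byte, three-byte, then one-byte character: byte offsets 0-1, 2-4, 5.
        byte[] data = "\u00e9\u20aca".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstFullCharOffset(data, 1)); // 2
        System.out.println(decodeFrom(data, 3));          // a
    }
}
```

A character count that starts at this offset, as in decodeFrom above, never includes the partial character at the front of the split.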
-
skipTo
public boolean skipTo(char[] bytes) throws IOException
Advances the position of the stream to the first character after the specified pattern.
- Parameters:
bytes - the pattern to find in the stream
- Returns: false if end of data is reached, true otherwise
- Throws:
IOException - if an I/O error occurs while reading the stream
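Scanning a stream for a pattern needs some care when the pattern can overlap itself (for example "aab" in "aaab"). The sketch below uses a simple sliding window over a plain java.io.Reader. SkipToSketch is a hypothetical class, and the window approach is chosen for clarity, not because SplitReader is known to work this way.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class SkipToSketch {
    /** Reads until the pattern has just been consumed; returns false if data ends first. */
    static boolean skipTo(Reader in, char[] pattern) throws IOException {
        if (pattern.length == 0) {
            return true;
        }
        char[] window = new char[pattern.length]; // the most recent pattern.length characters
        int filled = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (filled < window.length) {
                window[filled++] = (char) c;
            } else {
                // Slide the window left by one and append the new character.
                System.arraycopy(window, 1, window, 0, window.length - 1);
                window[window.length - 1] = (char) c;
            }
            if (filled == window.length && Arrays.equals(window, pattern)) {
                return true;
            }
        }
        return false;
    }

    /** Finds an overlapping pattern, reads the next character, then searches for a missing one. */
    static String demo() {
        try (Reader in = new StringReader("aaab!")) {
            boolean found = skipTo(in, "aab".toCharArray());
            char next = (char) in.read();
            boolean foundAgain = skipTo(in, "zz".toCharArray());
            return found + " " + next + " " + foundAgain;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true ! false
    }
}
```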
-
readLine
public boolean readLine(Appendable lineBuffer) throws IOException
Reads a line of text into the specified buffer. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a line feed.
- Parameters:
lineBuffer - the buffer to which to append line data
- Returns: false if end of file has been reached during the read
- Throws: IOException
- See Also: BufferedReader#readLine()
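The line-termination rule can be sketched with a java.io.PushbackReader, which allows one character of lookahead after a carriage return. ReadLineSketch is a hypothetical class. In particular, returning false when end of data is hit before a terminator is one reading of the contract above, not confirmed SplitReader behavior.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ReadLineSketch {
    /** Appends one line (without its terminator) to lineBuffer; false if data ended first. */
    static boolean readLine(PushbackReader in, Appendable lineBuffer) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return true;
            }
            if (c == '\r') {
                // Treat "\r\n" as a single terminator; push back anything else.
                int next = in.read();
                if (next != '\n' && next != -1) {
                    in.unread(next);
                }
                return true;
            }
            lineBuffer.append((char) c);
        }
        return false;
    }

    /** Splits a string with mixed terminators into lines, keeping a trailing partial line. */
    static String demo() {
        List<String> lines = new ArrayList<>();
        try (PushbackReader in = new PushbackReader(new StringReader("a\nb\r\nc\rd"))) {
            StringBuilder line = new StringBuilder();
            boolean terminated;
            do {
                line.setLength(0);
                terminated = readLine(in, line);
                if (terminated || line.length() > 0) {
                    lines.add(line.toString());
                }
            } while (terminated);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, b, c, d]
    }
}
```

Because the buffer is an Appendable, a caller can reuse one StringBuilder across lines, as demo does, instead of allocating a new String per call.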
-
-