Class SplitReader

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Readable

    public class SplitReader
    extends Reader
    A character-based reader for splits. A SplitReader provides a text view of split data and handles the conversion from bytes to characters.

    Because splits may fall at arbitrary points within a file, consumers may need to perform additional processing to place themselves in a valid position. SplitReader supports this case by allowing a character sequence to be provided for use as a synchronization marker at the beginning and end of splits.

    By default, reading starts at the beginning of the split, but the reader can be configured to start at the first character after the synchronization marker. Take care when reading from the beginning of a split, as a split boundary may fall in the middle of an encoded character sequence.

    Similarly, reads can continue beyond the end of the split. Consumers can either manage this themselves or have the reader auto-terminate after the first complete synchronization marker beyond the end of the split.
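
    The synchronization behavior described above can be illustrated with a minimal sketch that models a file as a String and a split as a character range within it. This does not use SplitReader itself: the class SplitSyncSketch and the method viewOfSplit are hypothetical names, and the sketch assumes that "beyond the end of the split" means a marker that starts at or after the split's end offset.

```java
// Hypothetical model of split synchronization; not part of the SplitReader API.
class SplitSyncSketch {
    /**
     * Returns the text a consumer would see for the split [start, end) of data.
     * Assumption: a marker "beyond the end" is one starting at or after end.
     */
    static String viewOfSplit(String data, int start, int end,
                              String marker, boolean doInitialSync) {
        int from = start;
        if (doInitialSync) {
            // Initial sync: begin at the first character after the first marker.
            int first = data.indexOf(marker, start);
            if (first < 0 || first >= end) {
                return ""; // no marker starts inside this split
            }
            from = first + marker.length();
        }
        // Auto-terminate after the first complete marker beyond the split end.
        int last = data.indexOf(marker, Math.max(from, end));
        int to = (last < 0) ? data.length() : last + marker.length();
        return data.substring(from, to);
    }

    public static void main(String[] args) {
        String data = "aa\nbbb\ncc\ndddd\n";
        int[] bounds = {0, 5, 10, 15};
        for (int i = 0; i + 1 < bounds.length; i++) {
            // Only the first split starts reading at its very beginning.
            String view = viewOfSplit(data, bounds[i], bounds[i + 1], "\n", i > 0);
            System.out.println("split " + i + ": " + view.replace("\n", "\\n"));
        }
    }
}
```

    Under these assumptions, the views of adjacent splits concatenate back to the original data, so each marker-delimited record is seen exactly once even though the split boundaries fall mid-record.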

    • Field Detail

      • DEFAULT_BUFFER

        public static final int DEFAULT_BUFFER
        Default buffer size for input to the character decoder, in bytes
        See Also:
        Constant Field Values
    • Constructor Detail

      • SplitReader

        public SplitReader(SplitInputStream in,
                           CharsetEncoding charset)
        Creates a new reader on the specified stream, using the given encoding properties. The default decoding buffer size is used.
        Parameters:
        in - the input stream on the split being read
        charset - character set encoding properties
      • SplitReader

        public SplitReader(SplitInputStream in,
                           CharsetEncoding charset,
                           int buffer)
        Creates a new reader on the specified stream, using the given encoding properties. The decoding buffer is the requested size.
        Parameters:
        in - the input stream on the split being read
        charset - character set encoding properties
        buffer - the size of the decoding input buffer, in bytes
      • SplitReader

        public SplitReader(SplitInputStream in,
                           CharsetEncoding charset,
                           String syncMarker,
                           boolean doInitialSync)
                    throws IOException
        Creates a new reader on the specified stream, using the given encoding properties and synchronization marker. The default decoding buffer size is used.

        The reader will stop providing data after the first complete synchronization marker appearing after the end of the split.

        Parameters:
        in - the input stream on the split being read
        charset - character set encoding properties
        syncMarker - the character sequence to use to synchronize the read positions
        doInitialSync - indicates whether to synchronize the read position before the first read
        Throws:
        IOException - if an I/O error occurs while performing initial position synchronization
      • SplitReader

        public SplitReader(SplitInputStream in,
                           CharsetEncoding charset,
                           int buffer,
                           String syncMarker,
                           boolean doInitialSync)
                    throws IOException
        Creates a new reader on the specified stream, using the given encoding properties and synchronization marker. The decoding buffer is the requested size.

        The reader will stop providing data after the first complete synchronization marker appearing after the end of the split.

        Parameters:
        in - the input stream on the split being read
        charset - character set encoding properties
        buffer - the size of the decoding input buffer, in bytes
        syncMarker - the character sequence to use to synchronize the read positions
        doInitialSync - indicates whether to synchronize the read position before the first read
        Throws:
        IOException - if an I/O error occurs while performing initial position synchronization
    • Method Detail

      • ready

        public boolean ready()
        Overrides:
        ready in class Reader
      • hasOverrun

        public boolean hasOverrun()
        Indicates whether the reader has passed the end of the underlying split.

        Because a character's encoding may span multiple bytes, the split boundary may fall in the middle of a character. Overrun is flagged at the first character whose encoding has a byte beyond the end of the split.

        Returns:
        true if the current read position is beyond the end of the split, false otherwise
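
        To make the overrun rule concrete, the following sketch walks a string's UTF-8 encoding and reports the first character with a byte at or beyond a given split end. It is an illustration only, not the reader's implementation: OverrunSketch and firstOverrunCodePoint are hypothetical names, and the split end is assumed to be an exclusive byte offset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the overrun rule; not part of the SplitReader API.
class OverrunSketch {
    /**
     * Returns the index (in code points) of the first character whose UTF-8
     * encoding has a byte at or beyond splitEnd, or -1 if every character
     * lies entirely inside the split. splitEnd is an exclusive byte offset.
     */
    static int firstOverrunCodePoint(String text, int splitEnd) {
        int byteOffset = 0; // offset of the current character's first byte
        int index = 0;      // code point index of the current character
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8).length;
            if (byteOffset + len > splitEnd) {
                return index; // this character straddles or follows the boundary
            }
            byteOffset += len;
            i += Character.charCount(cp);
            index++;
        }
        return -1;
    }

    public static void main(String[] args) {
        // 'a' is 1 byte, U+00E9 is 2 bytes, U+20AC is 3 bytes, 'b' is 1 byte.
        String text = "a\u00e9\u20acb";
        System.out.println(firstOverrunCodePoint(text, 2));
    }
}
```

        With a split end of 2, the boundary falls between the two bytes of U+00E9, so that character (index 1) is the first one flagged.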
      • readIfPresent

        public boolean readIfPresent(char[] chars)
                              throws IOException
        Conditionally reads the input to see if the specified characters are present. If they are, the read position is advanced to the first character following the sequence. Otherwise, the read position is unchanged.
        Parameters:
        chars - the character sequence which is to be checked
        Returns:
        true if the sequence was found, false otherwise
        Throws:
        IOException - if an I/O error occurs during the read
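
        The "advance on match, otherwise leave the position unchanged" contract can be sketched on top of the standard java.io.PushbackReader. This is a stand-in for illustration, not how SplitReader is implemented: ReadIfPresentSketch and demo are hypothetical names, and the pushback buffer is assumed to be at least as large as the sequence being checked.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the conditional-read contract; not the SplitReader API.
class ReadIfPresentSketch {
    /** Consumes chars if they come next in the input; otherwise restores the position. */
    static boolean readIfPresent(PushbackReader in, char[] chars) throws IOException {
        char[] seen = new char[chars.length];
        int n = 0;
        boolean match = true;
        while (match && n < chars.length) {
            int c = in.read();
            if (c < 0) {          // input ended before the sequence was complete
                match = false;
                break;
            }
            seen[n] = (char) c;
            match = (seen[n] == chars[n]);
            n++;
        }
        if (!match) {
            in.unread(seen, 0, n); // push back everything consumed so far
        }
        return match;
    }

    /** Small demonstration that reports results and the next character after each call. */
    static String demo() {
        try {
            PushbackReader in = new PushbackReader(new StringReader("--rest"), 8);
            boolean first = readIfPresent(in, "--".toCharArray());
            char afterFirst = (char) in.read();
            boolean second = readIfPresent(in, "ex".toCharArray());
            char afterSecond = (char) in.read();
            return first + " " + afterFirst + " " + second + " " + afterSecond;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

        In the demo, the second call consumes "es" while comparing against "ex", fails on the second character, and pushes both characters back, so the following read still returns 'e'.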
      • charsRead

        public long charsRead()
        Gets the character offset into the underlying split. This is measured from the first full character within the split, in case the split begins in the middle of a character encoding.
        Returns:
        the number of characters read so far in the split
      • skipTo

        public boolean skipTo(char[] bytes)
                       throws IOException
        Advances the position of the stream to the first character after the specified pattern.
        Parameters:
        bytes - the pattern to find in the stream
        Returns:
        false if end of data is reached, true otherwise.
        Throws:
        IOException - if an I/O error occurs while reading the stream
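
        A minimal sketch of the same skip-to-pattern behavior over a plain java.io.Reader follows. It uses a naive sliding window rather than whatever matching strategy SplitReader actually uses, and the names SkipToSketch and demo are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sliding-window version of skipTo; not the SplitReader implementation.
class SkipToSketch {
    /** Consumes input up to and including pattern; returns false if input ends first. */
    static boolean skipTo(Reader in, char[] pattern) throws IOException {
        if (pattern.length == 0) {
            return true; // an empty pattern matches at the current position
        }
        char[] window = new char[pattern.length]; // the most recent characters read
        int filled = 0;
        int c;
        while ((c = in.read()) >= 0) {
            if (filled < window.length) {
                window[filled++] = (char) c;
            } else {
                // Drop the oldest character and append the newest.
                System.arraycopy(window, 1, window, 0, window.length - 1);
                window[window.length - 1] = (char) c;
            }
            if (filled == window.length && Arrays.equals(window, pattern)) {
                return true; // next read yields the first character after the pattern
            }
        }
        return false;
    }

    /** Small demonstration: skip to a pattern, read one character, then skip to a missing one. */
    static String demo() {
        try {
            Reader in = new StringReader("xx<<a<<>rest");
            boolean found = skipTo(in, "<<>".toCharArray());
            char next = (char) in.read();
            boolean again = skipTo(in, "zz".toCharArray());
            return found + " " + next + " " + again;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

        The window comparison costs time proportional to the pattern length per character, which is acceptable for short markers; a longer pattern would justify a proper string-matching algorithm.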
      • readLine

        public boolean readLine(Appendable lineBuffer)
                         throws IOException
        Reads a line of text into the specified buffer. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a line feed.
        Parameters:
        lineBuffer - the buffer to which to append line data.
        Returns:
        false if end of file has been reached during the read
        Throws:
        IOException - if an I/O error occurs during the read
        See Also:
        BufferedReader.readLine()
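
        The line-termination rules above can be sketched with a java.io.PushbackReader and an Appendable target. This is an illustration under stated assumptions, not the SplitReader implementation: ReadLineSketch and demo are hypothetical names, the terminator is assumed not to be appended to the buffer (as with BufferedReader.readLine()), and a final unterminated line is assumed to be appended before false is returned.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical model of the readLine contract; not the SplitReader implementation.
class ReadLineSketch {
    /** Appends one line, without its terminator, to lineBuffer; false once input ends. */
    static boolean readLine(PushbackReader in, Appendable lineBuffer) throws IOException {
        int c;
        while ((c = in.read()) >= 0) {
            if (c == '\n') {
                return true;
            }
            if (c == '\r') {
                // Treat "\r\n" as one terminator; push back anything else.
                int next = in.read();
                if (next >= 0 && next != '\n') {
                    in.unread(next);
                }
                return true;
            }
            lineBuffer.append((char) c);
        }
        return false; // end of input reached during this read
    }

    /** Reads every line of a mixed-terminator input and brackets each one. */
    static String demo() {
        try {
            PushbackReader in = new PushbackReader(new StringReader("one\r\ntwo\rthree\nfour"));
            StringBuilder out = new StringBuilder();
            StringBuilder line = new StringBuilder();
            boolean more;
            do {
                line.setLength(0);
                more = readLine(in, line);
                out.append('[').append(line).append(']');
            } while (more);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

        Because the return value signals end of input rather than "a line was read", a caller following this pattern should process the buffer contents before checking the flag, as the do-while loop in demo does.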