Interface SplitInputStream

  • All Known Implementing Classes:
    SplitInputStreamImpl

    public interface SplitInputStream
    Interface defining an input data stream that works within the boundaries of a defined split. A split has beginning and ending offsets that define its boundaries.

    The contract for spanning the end of the split follows:

    • A read that encounters a split boundary will return only the data within the split. Data from the current split and the next split are not mixed within the same buffer.
    • Before the read operation that encounters the split boundary returns, an indicator that the boundary has been crossed should be set. This indicator is returned by the hasOverrun() method.
    • The availableInSplit() method should return the number of bytes left in the split (or an estimate) or zero if the split boundary has been crossed.
    • Reading past the split boundary is allowed. This is needed for formats that may need to search for the end of a record that spans splits.
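The contract above can be illustrated with a small in-memory stream. This is only a sketch under stated assumptions, not the behavior of SplitInputStreamImpl: the interface is redeclared locally from the signatures on this page so the example compiles on its own, the class name ByteArraySplitInputStream is hypothetical, and a read that reaches the last byte of the split is treated as having encountered the boundary.

```java
import java.io.IOException;

public class SplitContractSketch {

    /** Local copy of the interface, reproduced from the signatures on this page. */
    interface SplitInputStream {
        long availableInSplit();
        void close();
        boolean hasOverrun();
        int read(byte[] buffer, int offset, int length) throws IOException;
    }

    /** Hypothetical in-memory implementation; not the library's SplitInputStreamImpl. */
    static final class ByteArraySplitInputStream implements SplitInputStream {
        private final byte[] data;
        private final int splitEnd;   // exclusive end offset of the split
        private int position;         // next byte to hand out
        private boolean overrun;      // set once the boundary has been reached

        ByteArraySplitInputStream(byte[] data, int splitStart, int splitEnd) {
            this.data = data;
            this.position = splitStart;
            this.splitEnd = Math.min(splitEnd, data.length);
        }

        @Override
        public long availableInSplit() {
            // Zero once the boundary has been crossed, as the contract requires.
            return overrun ? 0 : Math.max(0, splitEnd - position);
        }

        @Override
        public boolean hasOverrun() {
            return overrun;
        }

        // An in-memory read cannot fail, so this override drops "throws IOException".
        @Override
        public int read(byte[] buffer, int offset, int length) {
            if (position >= data.length) {
                overrun = true;
                return -1; // end of data
            }
            // Inside the split, never hand out bytes from the next split in the
            // same buffer; past the boundary, reads continue to the end of data.
            int limit = position < splitEnd ? splitEnd : data.length;
            int n = Math.min(length, limit - position);
            System.arraycopy(data, position, buffer, offset, n);
            position += n;
            if (position >= splitEnd) {
                overrun = true; // set before the read returns
            }
            return n;
        }

        @Override
        public void close() {
            // Nothing to release for an in-memory buffer.
        }
    }
}
```

With ten bytes of data and a split covering offsets 2 to 6, a request for eight bytes returns only the four bytes inside the split and leaves hasOverrun() true. The next read then returns the bytes that follow the split.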
    • Method Summary

      Modifier and Type Method Description
      long availableInSplit()
      Gets the number of bytes remaining in the split.
      void close()
      Close the underlying raw data stream.
      boolean hasOverrun()
      Indicates whether the reader has read past the end of the split.
      int read(byte[] buffer, int offset, int length)
      Read data from the split input stream.
    • Method Detail

      • availableInSplit

        long availableInSplit()
        Gets the number of bytes remaining in the split.
        Returns:
        the number of bytes left to read in the split. If the split has been completely read, this is 0.
      • hasOverrun

        boolean hasOverrun()
        Indicates whether the reader has read past the end of the split.
        Returns:
        true if all bytes in the split have been read, false if any remain.
      • read

        int read(byte[] buffer,
                 int offset,
                 int length)
          throws IOException
        Read data from the split input stream. By convention, a read that starts before the end of a split returns only data contained within that split. Once the boundary has been reached, a subsequent read returns data from beyond the split.
        Parameters:
        buffer - input buffer
        offset - offset to start filling in buffer
        length - amount of data to read
        Returns:
        number of bytes read, or -1 if the end of data (EOD) has been reached
        Throws:
        IOException - thrown if an I/O error occurs
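One way a caller might use read together with hasOverrun() is to finish a record that spans the split boundary. The sketch below is illustrative only. It assumes newline-terminated ASCII records and a split that starts at the beginning of a record. The interface is redeclared locally so the example compiles on its own, and readRecords, inMemory, and recordsOf are hypothetical helper names that are not part of this API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitRecordReaderSketch {

    /** Local copy of the interface, reproduced from the signatures on this page. */
    interface SplitInputStream {
        long availableInSplit();
        void close();
        boolean hasOverrun();
        int read(byte[] buffer, int offset, int length) throws IOException;
    }

    /** Returns every record that starts inside the split, finishing the last one past the boundary. */
    static List<String> readRecords(SplitInputStream in) throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        byte[] buffer = new byte[4]; // deliberately tiny so reads hit the boundary
        while (true) {
            // If the boundary was crossed before this read is issued, the data
            // returned next comes from beyond the split.
            boolean pastBoundary = in.hasOverrun();
            if (pastBoundary && current.length() == 0) {
                break; // no partial record pending, nothing left to finish
            }
            int n = in.read(buffer, 0, buffer.length);
            if (n < 0) {
                if (current.length() > 0) {
                    records.add(current.toString()); // unterminated final record
                }
                break;
            }
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') {
                    records.add(current.toString());
                    current.setLength(0);
                    if (pastBoundary) {
                        return records; // the spanning record is complete
                    }
                } else {
                    current.append((char) buffer[i]); // ASCII assumed
                }
            }
        }
        return records;
    }

    /** Hypothetical in-memory stand-in, used only to exercise readRecords. */
    static SplitInputStream inMemory(byte[] data, int splitStart, int splitEnd) {
        return new SplitInputStream() {
            int position = splitStart;
            boolean overrun;
            public long availableInSplit() { return overrun ? 0 : splitEnd - position; }
            public void close() { }
            public boolean hasOverrun() { return overrun; }
            public int read(byte[] buffer, int offset, int length) {
                if (position >= data.length) { overrun = true; return -1; }
                int limit = position < splitEnd ? splitEnd : data.length;
                int n = Math.min(length, limit - position);
                System.arraycopy(data, position, buffer, offset, n);
                position += n;
                if (position >= splitEnd) { overrun = true; }
                return n;
            }
        };
    }

    /** Convenience wrapper: reads the records of one split of an ASCII string. */
    static List<String> recordsOf(String text, int splitStart, int splitEnd) {
        SplitInputStream in = inMemory(text.getBytes(StandardCharsets.US_ASCII), splitStart, splitEnd);
        try {
            return readRecords(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            in.close();
        }
    }
}
```

For the text "aa\nbbbb\ncc\n" with a split covering offsets 0 to 5, recordsOf returns the records "aa" and "bbbb". The second record starts inside the split and is completed by a read past the boundary, while "cc" is left for the reader of the next split.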
      • close

        void close()
        Close the underlying raw data stream.