Class FileSplit

java.lang.Object
com.pervasive.datarush.io.FileSplit
All Implemented Interfaces:
DataSplit, Serializable
Direct Known Subclasses:
AzureFileSplit, CompressedFileSplit, SplittableCompressedFileSplit

public class FileSplit extends Object implements DataSplit
Describes a range of bytes from a file. Ranges are identified by a start offset and a length. DataSplit objects are used to describe how files can be divided into pieces which can then be parsed in parallel.
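As a rough illustration of dividing a file into byte ranges, the sketch below computes contiguous (start, length) pairs of at most a given size. Each pair is the kind of value one might pass to the FileSplit(Path, long, long) constructor. ByteRange and SplitPlanner are hypothetical names used only for this sketch, not part of the DataRush API, and the fixed-maximum-size policy is an assumption, not a description of how DataRush chooses split sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the (start, length) pair that a FileSplit carries.
record ByteRange(long start, long length) {
    // Exclusive end, matching the meaning of FileSplit.getEndOffset().
    long endOffset() {
        return start + length;
    }
}

// Hypothetical planner: covers the whole file with contiguous, non-overlapping ranges.
final class SplitPlanner {
    private SplitPlanner() {}

    static List<ByteRange> plan(long fileSize, long maxSplitSize) {
        if (fileSize < 0 || maxSplitSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and maxSplitSize > 0");
        }
        List<ByteRange> ranges = new ArrayList<>();
        for (long start = 0; start < fileSize; start += maxSplitSize) {
            // The last range may be shorter than maxSplitSize.
            ranges.add(new ByteRange(start, Math.min(maxSplitSize, fileSize - start)));
        }
        return ranges;
    }
}
```

For a 10-byte file and a maximum split size of 4, plan(10, 4) yields ranges starting at 0, 4 and 8 with lengths 4, 4 and 2, so every byte is covered exactly once.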
  • Constructor Details

    • FileSplit

      public FileSplit(String path)
      Creates a split encompassing the entire file named by the path.
      Parameters:
      path - the file on which the split is defined
    • FileSplit

      public FileSplit(String path, long start, long length)
      Creates a split of the file named by the path.
      Parameters:
      path - the file on which the split is defined
      start - the byte offset in the named file at which the split begins
      length - the length of the split, in bytes
    • FileSplit

      public FileSplit(Path path)
      Creates a split encompassing the entire file named by the path.
      Parameters:
      path - the file on which the split is defined
    • FileSplit

      public FileSplit(Path path, long start, long length)
      Creates a split of the file named by the path.
      Parameters:
      path - the file on which the split is defined
      start - the byte offset in the named file at which the split begins
      length - the length of the split, in bytes
    • FileSplit

      protected FileSplit(Path path, long start, long length, FileClient client)
  • Method Details

    • getPath

      public Path getPath()
      Description copied from interface: DataSplit
      Gets the path to the file on which the split is defined.

      Some splits may not represent a file; in this case, null is returned.

      Specified by:
      getPath in interface DataSplit
      Returns:
      the path to the underlying source file.
    • getStartOffset

      public long getStartOffset()
      Description copied from interface: DataSplit
      Gets the byte offset of the beginning of the split.
      Specified by:
      getStartOffset in interface DataSplit
      Returns:
      the position of the first byte of the split
    • getLength

      public long getLength()
      Description copied from interface: DataSplit
      Gets the length of the split, in bytes.
      Specified by:
      getLength in interface DataSplit
      Returns:
      the size of the split
    • getEndOffset

      public long getEndOffset()
      Gets the end index of the byte range represented by this split. This offset is exclusive; that is, this represents the offset of the first byte past the end of the split.
      Returns:
      position of the end of the split
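      A small sketch of the exclusive-end arithmetic described above. SplitOffsets is a hypothetical helper, not part of the DataRush API; it only restates that a byte at offset p belongs to a split when getStartOffset() <= p < getEndOffset(), so adjacent splits never share a byte.

```java
// Hypothetical helpers mirroring FileSplit's offset arithmetic.
final class SplitOffsets {
    private SplitOffsets() {}

    // Exclusive end: the offset of the first byte past the split.
    static long endOffset(long start, long length) {
        return start + length;
    }

    // True if the byte at 'offset' lies in the half-open range [start, start + length).
    static boolean contains(long start, long length, long offset) {
        return offset >= start && offset < endOffset(start, length);
    }
}
```

      With a split starting at 100 of length 50, the end offset is 150: byte 149 is inside the split, while byte 150 is the first byte of whatever split follows.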
    • getFileClient

      public FileClient getFileClient()
      Returns the file client associated with this split. May return null, depending on the state of the split.
      Specified by:
      getFileClient in interface DataSplit
      Returns:
      the FileClient associated with this split, or null
    • openSource

      public InputStream openSource() throws IOException
      Description copied from interface: DataSplit
      Opens the underlying source for access. Initially, the stream is positioned at the first byte of the source. Unlike DataSplit.openSplit(int), the caller is responsible for making sure accesses are aligned to split boundaries. The stream is also unbuffered.

      This method may be required for dealing with formats which store metadata at the beginning of the file.

      Specified by:
      openSource in interface DataSplit
      Returns:
      a reader of the data in the underlying source
      Throws:
      IOException - if an I/O error occurs opening the underlying source
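      A minimal sketch of reading file-level metadata through the stream returned by openSource(), assuming a hypothetical format whose files begin with a 4-byte big-endian magic number followed by a 4-byte record count. HeaderReader and Header are illustrative names, not DataRush classes. Because the source stream is unbuffered, the sketch wraps it in a BufferedInputStream before reading.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical reader for an assumed 8-byte header at the very start of a file.
final class HeaderReader {
    private HeaderReader() {}

    record Header(int magic, int recordCount) {}

    // 'source' is expected to be positioned at byte 0, as openSource() guarantees.
    static Header readHeader(InputStream source) throws IOException {
        // Buffer the unbuffered source; DataInputStream reads big-endian ints and
        // throws EOFException if the file is shorter than the header.
        DataInputStream in = new DataInputStream(new BufferedInputStream(source));
        int magic = in.readInt();
        int recordCount = in.readInt();
        return new Header(magic, recordCount);
    }
}
```

      In use, the caller owns the stream: try (InputStream in = split.openSource()) { HeaderReader.Header h = HeaderReader.readHeader(in); ... }. The BufferedInputStream may read ahead past the header, so any further reading should go through the same wrapper rather than the raw stream.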
    • openSplit

      public SplitInputStream openSplit(int buffer) throws IOException
      Description copied from interface: DataSplit
      Opens the split for reading using the specified size for the read buffer. The reader will initially be positioned at the first byte of the split. The reader will indicate when the last byte of the split has been read via SplitInputStreamImpl.hasOverrun().
      Specified by:
      openSplit in interface DataSplit
      Parameters:
      buffer - the size of the buffer to use for reads, in bytes
      Returns:
      a reader of the data in the split
      Throws:
      IOException - if an I/O error occurs opening the underlying source
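      To show why a split reader needs to know when it has read past its last byte, here is a sketch of one common convention for parallel line-oriented parsing, which is not necessarily the rule DataRush's parsers use: a record belongs to the split in which its first byte lies. A reader therefore skips a partial leading record and keeps reading past its end offset to finish the last record it owns. LineSplitReader is a hypothetical class operating on an in-memory byte array rather than a SplitInputStream.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: returns the '\n'-terminated records owned by [start, start + length).
final class LineSplitReader {
    private LineSplitReader() {}

    static List<String> readOwnedLines(byte[] file, int start, int length) {
        int end = start + length; // exclusive, like FileSplit.getEndOffset()
        int pos = start;
        List<String> lines = new ArrayList<>();

        // If the split does not begin at a record boundary, the record in progress
        // belongs to the previous split: skip through the next '\n'.
        if (start > 0 && file[start - 1] != '\n') {
            while (pos < file.length && file[pos] != '\n') {
                pos++;
            }
            pos++; // step over the '\n' itself
        }

        // Read every record that starts before the end offset. The last one may
        // extend past 'end', which is the overrun a real split reader reports.
        while (pos < end && pos < file.length) {
            int lineStart = pos;
            while (pos < file.length && file[pos] != '\n') {
                pos++;
            }
            lines.add(new String(file, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // skip the '\n' (or move past end of file)
        }
        return lines;
    }
}
```

      Splitting the 9-byte text "a\nbb\nccc\n" into [0, 4) and [4, 9) gives ["a", "bb"] and ["ccc"]: the first reader finishes "bb" although it ends past offset 4, and the second skips the tail of "bb". Every record is produced exactly once.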
    • authorize

      public FileSplit authorize(FileClient client)
      Description copied from interface: DataSplit
      Creates an identical split which will use the specified authorization context for access.

      This method is used by clients of the IO APIs which want to provide an alternative to the OS-level authorization inherited from the JVM's execution environment. Data access methods for the split will use the supplied context.

      The authorization context is not a serializable attribute of a data split, as it represents the environment in which the data is accessed, not a property of the data itself. The context is associated with the split as a matter of convenience.

      Specified by:
      authorize in interface DataSplit
      Parameters:
      client - the authorization context to use for access
      Returns:
      a split using the provided authorization context
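      The contract above (an identical split carrying a different, non-serialized context) can be modeled with a transient field. The sketch below is a hypothetical model, not FileSplit's implementation: AuthorizedRange is an illustrative class, and a String stands in for a real FileClient. It shows that authorize() leaves the original unchanged and that the context does not survive Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical model: the byte range is serialized, the access context is not.
final class AuthorizedRange implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String path;
    private final long start;
    private final long length;
    // Transient: describes the environment of access, not the data. Null after deserialization.
    private final transient String client;

    AuthorizedRange(String path, long start, long length, String client) {
        this.path = path;
        this.start = start;
        this.length = length;
        this.client = client;
    }

    // Returns an otherwise identical range using the given context; 'this' is unchanged.
    AuthorizedRange authorize(String newClient) {
        return new AuthorizedRange(path, start, length, newClient);
    }

    String path() { return path; }
    long start() { return start; }
    long length() { return length; }
    String client() { return client; }

    // Writes the object to bytes and reads it back with standard Java serialization.
    static AuthorizedRange roundTrip(AuthorizedRange r) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (AuthorizedRange) in.readObject();
        }
    }
}
```

      Returning a new object instead of mutating the split keeps splits safe to share, and marking the context transient means a deserialized copy must be authorized again in the environment where it is used.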
    • toString

      public String toString()
      Overrides:
      toString in class Object