Class CompressedFileSplit

  • All Implemented Interfaces:
    DataSplit, Serializable

    public class CompressedFileSplit
    extends FileSplit
    Describes a range of bytes from a compressed file. Because most, if not all, compression schemes cannot start decompression at an arbitrary point in the file, only splits that cover the entire file are supported.
    See Also:
    Serialized Form
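
    The whole-file restriction described above can be illustrated with the JDK's own gzip classes. This is a minimal sketch, not part of this API: GzipOffsetDemo and its methods are hypothetical names, and gzip is used only as one example of a compression format. It compresses some text, then tries to decompress once from offset 0 and once from a later offset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; it does not use CompressedFileSplit itself.
final class GzipOffsetDemo {

    /** Compresses the given text with gzip and returns the compressed bytes. */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Tries to decompress starting at the given byte offset.
     * Returns the decoded text, or null if the decoder rejects the data.
     */
    static String gunzipFrom(byte[] compressed, int offset) {
        ByteArrayInputStream in =
                new ByteArrayInputStream(compressed, offset, compressed.length - offset);
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Typically a ZipException: the bytes at this offset are not a gzip header.
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("hello, split world ".repeat(50));
        System.out.println("decoded from start:  " + (gunzipFrom(compressed, 0) != null));
        System.out.println("decoded from middle: "
                + (gunzipFrom(compressed, compressed.length / 2) != null));
    }
}
```

    Decoding from offset 0 recovers the text. Decoding from a later offset is expected to fail because the decoder needs the stream header and the preceding compressed state, which is why a byte range inside a compressed file is not a usable split on its own.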
    • Constructor Detail

      • CompressedFileSplit

        public CompressedFileSplit(Path path,
                                   CompressionFormat format)
        Creates a split encompassing the entire file named by the path.
        Parameters:
        path - the file on which the split is defined
        format - the compression format of the file
    • Method Detail

      • openSource

        public InputStream openSource()
                               throws IOException
        Description copied from interface: DataSplit
        Opens the underlying source for access. Initially, the stream is positioned at the first byte of the source. Unlike with DataSplit.openSplit(int), the caller is responsible for ensuring that accesses are aligned to split boundaries. The returned stream is unbuffered.

        This method may be needed for formats that store metadata at the beginning of the file.

        Specified by:
        openSource in interface DataSplit
        Overrides:
        openSource in class FileSplit
        Returns:
        a reader of the data in the underlying source
        Throws:
        IOException - if an I/O error occurs opening the underlying source
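
        Because the returned stream is unbuffered and starts at the first byte, a caller reading leading metadata would typically wrap it in a buffer first. The sketch below assumes a made-up file layout (a 4-byte magic value "DEMO" followed by a big-endian record count) and uses a ByteArrayInputStream as a stand-in for the stream returned by openSource(); SourceHeaderDemo and all of its methods are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the header layout is invented for illustration.
final class SourceHeaderDemo {

    private static final byte[] MAGIC = "DEMO".getBytes(StandardCharsets.US_ASCII);

    /** Builds sample file content: magic, record count, then some payload bytes. */
    static byte[] sample(int recordCount) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.write(MAGIC);
                out.writeInt(recordCount);
                out.write("payload".getBytes(StandardCharsets.US_ASCII));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in for the unbuffered stream that openSource() would return. */
    static InputStream openSourceStandIn(byte[] fileBytes) {
        return new ByteArrayInputStream(fileBytes);
    }

    /** Reads the header from the start of the source and returns the record count. */
    static int readRecordCount(InputStream source) {
        // Add buffering ourselves, since the source stream is assumed to be unbuffered.
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(source))) {
            byte[] magic = new byte[MAGIC.length];
            in.readFully(magic);
            if (!Arrays.equals(magic, MAGIC)) {
                throw new IllegalStateException("unexpected magic value");
            }
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int count = readRecordCount(openSourceStandIn(sample(3)));
        System.out.println("record count from header: " + count);
    }
}
```

        With the real API, the argument to readRecordCount would be the result of openSource() on a split; everything after the header is the caller's responsibility to interpret.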
      • authorize

        public FileSplit authorize(FileClient client)
        Description copied from interface: DataSplit
        Creates an identical split which will use the specified authorization context for access.

        This method is used by clients of the IO APIs that want to provide an alternative to the OS-level authorization inherited from the JVM's execution environment. Data access methods for the split will use the supplied context.

        The authorization context is not a serializable attribute of a data split, as it represents the environment in which the data is accessed, not a property of the data itself. The context is associated with the split as a matter of convenience.

        Specified by:
        authorize in interface DataSplit
        Overrides:
        authorize in class FileSplit
        Parameters:
        client - the authorization context to use for access
        Returns:
        a split using the provided authorization context
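
        The behavior described above (an otherwise identical copy that carries a context, where the context does not survive serialization) can be sketched with plain JDK serialization. This is only an illustration under assumptions: Split and Context below are hypothetical stand-ins, not FileSplit or FileClient, and the use of a transient field is one possible way to get this behavior, not necessarily how the library does it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; none of these types belong to the documented API.
final class AuthorizedSplitDemo {

    /** Stand-in for an authorization context; deliberately not Serializable. */
    static final class Context {
        final String user;
        Context(String user) { this.user = user; }
    }

    /** Stand-in for a split: the path is data, the context is environment. */
    static final class Split implements Serializable {
        private static final long serialVersionUID = 1L;
        final String path;
        final transient Context context; // skipped by serialization

        Split(String path, Context context) {
            this.path = path;
            this.context = context;
        }

        /** Returns a copy of this split that uses the given context. */
        Split authorize(Context ctx) {
            return new Split(path, ctx);
        }
    }

    /** Serializes and deserializes a split using an in-memory buffer. */
    static Split roundTrip(Split split) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(split);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Split) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Split authorized = new Split("data.gz", null).authorize(new Context("alice"));
        Split copy = roundTrip(authorized);
        System.out.println("path kept: " + copy.path.equals(authorized.path));
        System.out.println("context kept: " + (copy.context != null));
    }
}
```

        In this sketch the path survives the round trip while the context comes back as null, so a receiver of a deserialized split would need to call authorize again with a context that is valid in its own environment.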