-
- All Superinterfaces:
Serializable
- All Known Implementing Classes:
AzureFileSplit, CompressedFileSplit, FileSplit, SplittableCompressedFileSplit
public interface DataSplit extends Serializable
Describes a range of bytes from a data source. Ranges are identified by a start offset and a length. DataSplit objects describe how a data source can be divided into pieces that can then be parsed in parallel.
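As an illustration of identifying ranges by start offset and length, the following is a minimal sketch of how a source of known size might be divided into consecutive ranges. It does not use this library: SplitPlanner, Range, and plan are hypothetical names introduced only for the example, and the fixed maximum split length is an assumption, not a documented policy of any DataSplit implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPlanner {
    /** Hypothetical value type: a byte range given by a start offset and a length. */
    public static final class Range {
        public final long startOffset;
        public final long length;

        public Range(long startOffset, long length) {
            this.startOffset = startOffset;
            this.length = length;
        }
    }

    /**
     * Divides the byte range [0, totalLength) into consecutive ranges of at
     * most maxSplitLength bytes each. The last range may be shorter.
     */
    public static List<Range> plan(long totalLength, long maxSplitLength) {
        if (totalLength < 0 || maxSplitLength <= 0) {
            throw new IllegalArgumentException("invalid lengths");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < totalLength; start += maxSplitLength) {
            long length = Math.min(maxSplitLength, totalLength - start);
            ranges.add(new Range(start, length));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each printed line is "startOffset length" for one range.
        for (Range r : plan(10, 4)) {
            System.out.println(r.startOffset + " " + r.length);
        }
    }
}
```

Because the ranges do not overlap and together cover the whole source, each one could be handed to a separate worker.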
-
-
Method Summary
Modifier and Type, Method, Description
DataSplit authorize(FileClient client)
Creates an identical split which will use the specified authorization context for access.
FileClient getFileClient()
Returns the file client associated with this split.
long getLength()
Gets the length of the split, in bytes.
Path getPath()
Gets the path to the file on which the split is defined.
long getStartOffset()
Gets the byte offset of the beginning of the split.
InputStream openSource()
Opens the underlying source for access.
SplitInputStream openSplit(int buffer)
Opens the split for reading using the specified size for the read buffer.
-
-
-
Method Detail
-
getPath
Path getPath()
Gets the path to the file on which the split is defined. Some splits may not represent a file; in this case, null is returned.
- Returns:
- the path to the underlying source file, or null if the split does not represent a file
-
getStartOffset
long getStartOffset()
Gets the byte offset of the beginning of the split.
- Returns:
- the position of the first byte of the split
-
getLength
long getLength()
Gets the length of the split, in bytes.
- Returns:
- the size of the split, in bytes
-
openSplit
SplitInputStream openSplit(int buffer) throws IOException
Opens the split for reading using the specified size for the read buffer. The reader is initially positioned at the first byte of the split. The reader indicates when the last byte of the split has been read via SplitInputStreamImpl.hasOverrun().
- Parameters:
buffer
- the size of the buffer to use for reads, in bytes
- Returns:
- a reader of the data in the split
- Throws:
IOException
- if an I/O error occurs opening the underlying source
-
openSource
InputStream openSource() throws IOException
Opens the underlying source for access. Initially, the stream is positioned at the first byte of the source. Unlike openSplit(int), the caller is responsible for making sure accesses are aligned to split boundaries. The stream is also unbuffered.
This method may be required for dealing with formats which store metadata at the beginning of the file.
- Returns:
- a reader of the data in the underlying source
- Throws:
IOException
- if an I/O error occurs opening the underlying source
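Since the caller of openSource() is responsible for staying within split boundaries, the following sketch shows one way to read only the bytes of a given range from a raw stream. It uses only java.io, with a ByteArrayInputStream standing in for the stream that openSource() would return. RangeReader, readRange, and demo are hypothetical names, and the sample data is invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RangeReader {
    /**
     * Reads up to length bytes starting at startOffset from a stream that is
     * positioned at the first byte of the source. Returns fewer bytes only if
     * the source ends early.
     */
    public static byte[] readRange(InputStream source, long startOffset, int length)
            throws IOException {
        long toSkip = startOffset;
        while (toSkip > 0) {
            long skipped = source.skip(toSkip);
            if (skipped <= 0) {
                // skip() may legally return 0; read one byte to make progress or detect EOF.
                if (source.read() < 0) {
                    throw new EOFException("source ended before start offset");
                }
                skipped = 1;
            }
            toSkip -= skipped;
        }
        byte[] out = new byte[length];
        int filled = 0;
        while (filled < length) {
            int n = source.read(out, filled, length - filled);
            if (n < 0) {
                break; // source ended inside the range
            }
            filled += n;
        }
        return filled == length ? out : Arrays.copyOf(out, filled);
    }

    /** Reads the 7-byte range at offset 7 of an in-memory sample source. */
    public static String demo() {
        byte[] data = "header|payload|rest".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteArrayInputStream(data)) {
            return new String(readRange(in, 7, 7), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a real split, the offset and length would presumably come from getStartOffset() and getLength(). Because the stream is unbuffered, wrapping it in a BufferedInputStream may be worthwhile for small reads.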
-
authorize
DataSplit authorize(FileClient client)
Creates an identical split which will use the specified authorization context for access. This method is used by clients of the IO APIs which want to provide an alternative to the OS-level authorization inherited from the JVM's execution environment. Data access methods for the split will use the supplied context.
The authorization context is not a serializable attribute of a data split, as it represents the environment in which the data is accessed, not a property of the data itself. The context is associated with the split as a matter of convenience.
- Parameters:
client
- the authorization context to use for access
- Returns:
- a split using the provided authorization context
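The note that the authorization context is not serialized can be illustrated with standard Java serialization. The sketch below is not this library's implementation: SimpleSplit and AuthContext are hypothetical stand-ins for a DataSplit implementation and a FileClient, and marking the field transient is an assumption about how such behavior could be achieved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientContextDemo {
    /** Hypothetical authorization context; deliberately not Serializable. */
    public static final class AuthContext {
        public final String user;

        public AuthContext(String user) {
            this.user = user;
        }
    }

    /** Hypothetical split whose context is excluded from its serialized form. */
    public static final class SimpleSplit implements Serializable {
        private static final long serialVersionUID = 1L;

        public final long startOffset;
        public final long length;
        // transient: the context describes the access environment, not the data.
        public final transient AuthContext context;

        public SimpleSplit(long startOffset, long length, AuthContext context) {
            this.startOffset = startOffset;
            this.length = length;
            this.context = context;
        }

        /** Returns an identical split that uses the given context. */
        public SimpleSplit authorize(AuthContext newContext) {
            return new SimpleSplit(startOffset, length, newContext);
        }
    }

    /** Serializes and deserializes a split in memory. */
    public static SimpleSplit roundTrip(SimpleSplit split) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(split);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SimpleSplit) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SimpleSplit authorized = new SimpleSplit(0, 16, null).authorize(new AuthContext("alice"));
        SimpleSplit copy = roundTrip(authorized);
        System.out.println(copy.length + " " + (copy.context == null));
    }
}
```

Under this assumption, a split that is serialized and sent to another process arrives without a context, so the receiver would need to call authorize again with a context valid in its own environment.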
-
getFileClient
FileClient getFileClient()
Returns the file client associated with this split. The file client can be used to provide filesystem-specific context. May return null depending on the state.
- Returns:
- the FileClient associated with this split; may be null
-
-