- All Superinterfaces:
InputStreamSupplier
- All Known Implementing Classes:
BasicByteSource, ConcatenatedByteSource, GlobbingByteSource
An abstract source of bytes.
ByteSource objects
represent entities existing outside of a logical graph,
such as files and sockets, which can be read as a stream
of bytes. These can then be used in conjunction with
DataFormat objects to produce records that
flow through the dataflow graph; the most common use
is loading persisted data from disk.
Generally, it is not necessary to implement or even
directly use ByteSource objects. Most read operators
provide a more convenient interface which hides
the object; see AbstractReader as an example.
By default, sources use OS-level authorization inherited from the execution environment, but they can be configured to use more complex authentication mechanisms to provide an authorization context.
-
Method Summary
Method - Description
authorize(FileClient client) - Creates a new source with the same properties, but using the specified authorization.
generateSplits(SplitOptions options) - Gets an iterator producing a set of DataSplit objects covering the source.
open() - Opens the source for reading.
validate() - Performs validation of the source configuration.
-
Method Details
-
authorize
Creates a new source with the same properties, but using the specified authorization. If a source is supposed to be used with a specific authorization context, this method should be called to produce a new source to use.
- Parameters:
client - the authorization context to use for access
- Returns:
- a source using the provided authorization context
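The "new source with the same properties" contract described for authorize can be illustrated with a minimal sketch. The types below (AuthContext, Source) are hypothetical stand-ins written for this example; they are not the library's FileClient or ByteSource, whose constructors and fields are not shown on this page. The sketch assumes the source is immutable, so authorize returns a copy and leaves the original untouched.

```java
public class AuthorizeSketch {
    /** Hypothetical authorization context; a stand-in for FileClient. */
    static final class AuthContext {
        /** Models the default OS-level authorization inherited from the environment. */
        static final AuthContext INHERITED = new AuthContext("<inherited>");

        final String principal;

        AuthContext(String principal) {
            this.principal = principal;
        }
    }

    /** Hypothetical immutable source; only the authorization handling is modeled. */
    static final class Source {
        final String location;
        final AuthContext auth;

        Source(String location, AuthContext auth) {
            this.location = location;
            this.auth = auth;
        }

        /** Returns a copy with the same location but the given authorization. */
        Source authorize(AuthContext client) {
            return new Source(location, client);
        }
    }

    public static void main(String[] args) {
        Source inherited = new Source("data/input.csv", AuthContext.INHERITED);
        Source authorized = inherited.authorize(new AuthContext("alice"));

        // The original keeps its default context; only the copy carries the new one.
        System.out.println(inherited.auth.principal);
        System.out.println(authorized.auth.principal);
    }
}
```

Because the method returns a new object, the caller has to use the returned source; calling authorize and discarding the result has no effect in this sketch.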
-
validate
Performs validation of the source configuration. This checks things such as the existence and accessibility of the source. It may also optionally rewrite the source to an equivalent one, doing file glob and directory expansion.
- Returns:
- a valid source equivalent to this one
- Throws:
IOException - if an I/O error occurs while validating the source
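As a rough illustration of the validate contract (check existence and accessibility, then return an equivalent source), here is a sketch built on java.nio.file. PathSource is a hypothetical class invented for this example, not a library type; glob and directory expansion are left out, and the "equivalent source" is simply the same file under a normalized absolute path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ValidateSketch {
    /** Hypothetical file-backed source; mirrors only validate() and open(). */
    static final class PathSource {
        final Path path;

        PathSource(Path path) {
            this.path = path;
        }

        /** Checks existence and readability, then returns an equivalent source. */
        PathSource validate() throws IOException {
            if (!Files.exists(path)) {
                throw new NoSuchFileException(path.toString());
            }
            if (!Files.isReadable(path)) {
                throw new IOException("Source is not readable: " + path);
            }
            // A fuller implementation could expand globs or directories here;
            // this sketch only normalizes the path.
            return new PathSource(path.toAbsolutePath().normalize());
        }

        InputStream open() throws IOException {
            return Files.newInputStream(path);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("source", ".txt");
        Files.writeString(tmp, "hello");

        // Use the source returned by validate(), not the one it was called on.
        PathSource checked = new PathSource(tmp).validate();
        try (InputStream in = checked.open()) {
            System.out.println(in.readAllBytes().length + " bytes");
        }
        Files.delete(tmp);
    }
}
```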
-
open
Opens the source for reading. The caller is responsible for closing the returned InputStream.
- Specified by:
open in interface InputStreamSupplier
- Returns:
- a reader of the bytes from the source
- Throws:
IOException - if an I/O error occurs while opening the source
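Since the caller owns the returned stream, a try-with-resources block is a natural fit. The sketch below assumes only what this page states, namely that open() returns an InputStream and may throw IOException. StreamSource is a hypothetical one-method interface standing in for that part of the contract, and the in-memory source in main is purely illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class OpenSketch {
    /** Hypothetical stand-in for the open() part of the contract; not the library type. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    /** Reads every byte from the source and closes the stream afterwards. */
    static byte[] readFully(StreamSource source) throws IOException {
        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = source.open()) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // Illustrative in-memory source; a real source would be a file or socket.
        StreamSource source =
            () -> new ByteArrayInputStream("a,b\n1,2\n".getBytes(StandardCharsets.UTF_8));

        byte[] data = readFully(source);
        System.out.println(data.length + " bytes read");
    }
}
```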
-
generateSplits
Gets an iterator producing a set of DataSplit objects covering the source. The source is split as requested in the specified options, within the source's ability to meet the requirements.
- Parameters:
options - configurable options to use in generating the splits
- Returns:
- an iterator over valid splits of the source
- Throws:
IOException - if an I/O error occurs while generating splits
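To make "splits covering the source" concrete, the following sketch divides a byte range into contiguous, non-overlapping pieces. It is an assumption-laden illustration, not the library's algorithm: Split is a hypothetical class standing in for DataSplit, and a single target size in bytes stands in for SplitOptions, whose actual fields are not documented on this page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SplitSketch {
    /** Hypothetical byte-range split; a stand-in for DataSplit. */
    static final class Split {
        final long offset;
        final long length;

        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Produces contiguous splits that together cover [0, sourceLength).
     * The last split may be shorter than the target size.
     */
    static Iterator<Split> generateSplits(long sourceLength, long targetSplitSize) {
        if (targetSplitSize <= 0) {
            throw new IllegalArgumentException("targetSplitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < sourceLength; offset += targetSplitSize) {
            long length = Math.min(targetSplitSize, sourceLength - offset);
            splits.add(new Split(offset, length));
        }
        return splits.iterator();
    }

    public static void main(String[] args) {
        Iterator<Split> it = generateSplits(10, 4);
        while (it.hasNext()) {
            Split s = it.next();
            System.out.println("offset=" + s.offset + " length=" + s.length);
        }
    }
}
```

A real source may not be able to honor the requested size exactly (for example, a compressed file that cannot be read from an arbitrary offset), which is what "within the source's ability to meet the requirements" allows for; this sketch ignores such constraints.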
-