Interface ByteSource

All Superinterfaces:
InputStreamSupplier
All Known Implementing Classes:
BasicByteSource, ConcatenatedByteSource, GlobbingByteSource

public interface ByteSource extends InputStreamSupplier
An abstract source of bytes. A ByteSource represents an entity that exists outside of a logical graph, such as a file or a socket, and that can be read as a stream of bytes. A source is used together with a DataFormat object to produce records, which then flow through the dataflow graph; the most common use is loading persisted data from disk.

Generally, it is not necessary to implement or even directly use ByteSource objects. Most read operators provide a more convenient interface which hides the object; see AbstractReader as an example.

By default, sources use OS-level authorization inherited from the execution environment, but they can be configured to use more complex authentication mechanisms to provide an authorization context.

  • Method Details

    • authorize

      ByteSource authorize(FileClient client)
      Creates a new source with the same properties, but using the specified authorization.

      If a source must be used with a specific authorization context, call this method and use the source it returns.

      Parameters:
      client - the authorization context to use for access
      Returns:
      a source using the provided authorization context
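
      Because authorize returns a new source, the object it is called on is presumably left unchanged. The sketch below illustrates that calling pattern with a hypothetical Source class and a plain String standing in for FileClient; neither is part of this API, and the real implementation may differ.

```java
public class AuthorizeSketch {
    /** Hypothetical immutable source; a String stands in for FileClient. */
    static final class Source {
        private final String path;
        private final String credentials;

        Source(String path, String credentials) {
            this.path = path;
            this.credentials = credentials;
        }

        /** Returns a copy with the same path but different credentials. */
        Source authorize(String newCredentials) {
            return new Source(path, newCredentials);
        }

        String path() { return path; }
        String credentials() { return credentials; }
    }

    public static void main(String[] args) {
        Source original = new Source("/data/input.csv", "os-default");
        // The returned object carries the new context; 'original' does not.
        Source authorized = original.authorize("service-account");
        System.out.println(original.credentials());
        System.out.println(authorized.credentials());
    }
}
```

      The point of the pattern is that the return value must be kept: discarding it and continuing to use the original object would leave the original authorization in effect.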
    • validate

      ByteSource validate() throws IOException
      Performs validation of the source configuration. This checks things such as the existence and accessibility of the source. It may also rewrite the source to an equivalent one, for example by expanding file globs and directories.
      Returns:
      a valid source equivalent to this one
      Throws:
      IOException - if an I/O error occurs while validating the source
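
      As a rough illustration of what glob expansion combined with existence and accessibility checks can look like, the sketch below uses only java.nio.file. The ValidateSketch class and its expand method are hypothetical and do not show how any ByteSource implementation performs validation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ValidateSketch {
    /**
     * Expands a glob within one directory into the readable entries it
     * matches. Fails if the directory is missing or nothing matches.
     */
    static List<Path> expand(Path dir, String glob) throws IOException {
        if (!Files.isDirectory(dir)) {
            throw new NoSuchFileException(dir.toString());
        }
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path candidate : stream) {
                // Keep only entries the current process can actually read.
                if (Files.isReadable(candidate)) {
                    matches.add(candidate);
                }
            }
        }
        if (matches.isEmpty()) {
            throw new NoSuchFileException(dir.resolve(glob).toString());
        }
        Collections.sort(matches); // directory iteration order is unspecified
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("validate-sketch");
        Files.createFile(dir.resolve("part-0.csv"));
        Files.createFile(dir.resolve("part-1.csv"));
        Files.createFile(dir.resolve("notes.txt"));
        System.out.println(expand(dir, "*.csv"));
    }
}
```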
    • open

      InputStream open() throws IOException
      Opens the source for reading. The caller is responsible for closing the returned InputStream.
      Specified by:
      open in interface InputStreamSupplier
      Returns:
      an input stream over the bytes of the source
      Throws:
      IOException - if an I/O error occurs while opening the source
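
      Since the caller owns the returned stream, try-with-resources is a natural way to guarantee that it is closed. The sketch below uses a hypothetical StreamOpener interface that mirrors only the open() signature shown above, backed by an in-memory byte array rather than a real source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class OpenSketch {
    /** Hypothetical stand-in exposing only the open() signature. */
    interface StreamOpener {
        InputStream open() throws IOException;
    }

    /** Counts the bytes in a source, closing the stream even on failure. */
    static long countBytes(StreamOpener source) throws IOException {
        try (InputStream in = source.open()) {
            long total = 0;
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, bytes".getBytes(StandardCharsets.UTF_8);
        StreamOpener inMemory = () -> new ByteArrayInputStream(data);
        System.out.println(countBytes(inMemory)); // prints 12
    }
}
```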
    • generateSplits

      SplitIterator generateSplits(SplitOptions options) throws IOException
      Gets an iterator producing a set of DataSplit objects covering the source. The source is split as requested in the specified options, within the source's ability to meet the requirements.
      Parameters:
      options - configurable options to use in generating the splits
      Returns:
      an iterator over valid splits of the source
      Throws:
      IOException - if an I/O error occurs while generating splits
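
      To give a feel for what "covering the source" means, the sketch below cuts a byte length into contiguous, non-overlapping ranges of a requested size. The Range class and splitRanges method are hypothetical; they are not DataSplit or SplitOptions, whose actual fields and behavior are not described here.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    /** Hypothetical byte range; not the library's DataSplit type. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "Range(offset=" + offset + ", length=" + length + ")";
        }
    }

    /**
     * Covers [0, totalLength) with consecutive ranges of at most
     * targetSize bytes; only the last range may be shorter.
     */
    static List<Range> splitRanges(long totalLength, long targetSize) {
        if (totalLength < 0 || targetSize <= 0) {
            throw new IllegalArgumentException("invalid length or split size");
        }
        List<Range> ranges = new ArrayList<>();
        for (long offset = 0; offset < totalLength; offset += targetSize) {
            long length = Math.min(targetSize, totalLength - offset);
            ranges.add(new Range(offset, length));
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : splitRanges(10, 4)) {
            System.out.println(r);
        }
    }
}
```

      A real source may not be able to honor a requested size exactly, which is what the qualifier about the source's ability to meet the requirements allows for.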