Interface ByteSource

  • All Superinterfaces:
    InputStreamSupplier
    All Known Implementing Classes:
    BasicByteSource, ConcatenatedByteSource, GlobbingByteSource

    public interface ByteSource
    extends InputStreamSupplier
    An abstract source of bytes. ByteSource objects represent entities that exist outside of a logical graph, such as files and sockets, and that can be read as a stream of bytes. Combined with DataFormat objects, they produce records that flow through the dataflow graph, most commonly to load persisted data from disk.

    Generally, it is not necessary to implement or even directly use ByteSource objects. Most read operators provide a more convenient interface that hides the object; see AbstractReader for an example.

    By default, sources use OS-level authorization inherited from the execution environment, but they can be configured to use more complex authentication mechanisms to provide an authorization context.

    • Method Detail

      • authorize

        ByteSource authorize(FileClient client)
        Creates a new source with the same properties, but using the specified authorization.

        If a source is supposed to be used with a specific authorization context, this method should be called to produce a new source to use.

        Parameters:
        client - the authorization context to use for access
        Returns:
        a source using the provided authorization context
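        The "new source, same properties" contract described above can be illustrated with a minimal sketch. The Source and Credentials classes below are hypothetical stand-ins, not the library's ByteSource or FileClient, and the sketch assumes the original source is left unmodified.

```java
public class AuthorizeSketch {
    /** Hypothetical stand-in for an authorization context (the role FileClient plays). */
    public static final class Credentials {
        public final String user;
        public Credentials(String user) { this.user = user; }
    }

    /** Hypothetical stand-in for a source; immutable, so authorize returns a copy. */
    public static final class Source {
        public final String path;
        public final Credentials credentials; // null means "inherit from the environment"

        public Source(String path, Credentials credentials) {
            this.path = path;
            this.credentials = credentials;
        }

        /** Returns a new source with the same path but the given authorization. */
        public Source authorize(Credentials client) {
            return new Source(path, client);
        }
    }

    public static void main(String[] args) {
        Source original = new Source("/data/input.txt", null);
        Source authorized = original.authorize(new Credentials("etl-user"));
        // The caller continues with the returned source, not the original.
        System.out.println(authorized.path + " as " + authorized.credentials.user);
    }
}
```

        The point of the sketch is the calling pattern: the result of authorize is the object to use from then on.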
      • validate

        ByteSource validate()
                     throws IOException
        Performs validation of the source configuration. The checks include the existence and accessibility of the source. Validation may also rewrite the source to an equivalent one, for example by expanding file globs and directories.
        Returns:
        a valid source equivalent to this one
        Throws:
        IOException - if an I/O error occurs while validating the source
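        As a rough illustration of what "existence check plus glob expansion" can look like, here is a sketch built only on java.nio.file. It is not the library's implementation; the expand method and its behavior on an empty match are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ValidateSketch {
    /**
     * Hypothetical helper: expands a glob inside a directory into the
     * concrete, readable files it matches, failing if nothing matches.
     */
    public static List<Path> expand(Path dir, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        // newDirectoryStream throws an IOException if dir cannot be opened.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && Files.isReadable(p)) {
                    matches.add(p);
                }
            }
        }
        if (matches.isEmpty()) {
            // Assumption: an empty expansion is treated as a missing source.
            throw new NoSuchFileException(dir + "/" + glob);
        }
        Collections.sort(matches); // deterministic order for callers
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("validate-sketch");
        Files.write(dir.resolve("a.csv"), new byte[] {1});
        Files.write(dir.resolve("b.csv"), new byte[] {2});
        Files.write(dir.resolve("notes.txt"), new byte[] {3});
        System.out.println(expand(dir, "*.csv").size()); // prints 2
    }
}
```

        As with validate, the caller works with the returned, expanded result instead of the original pattern.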
      • open

        InputStream open()
                  throws IOException
        Opens the source for reading. The caller is responsible for closing the returned InputStream.
        Specified by:
        open in interface InputStreamSupplier
        Returns:
        a reader of the bytes from the source
        Throws:
        IOException - if an I/O error occurs while opening the source
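        Because the caller owns the returned stream, try-with-resources is the natural way to consume it. The sketch below uses a hypothetical Openable interface that mirrors only the documented open() signature, so it runs without the library on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class OpenSketch {
    /** Hypothetical stand-in exposing only the documented open() signature. */
    public interface Openable {
        InputStream open() throws IOException;
    }

    /** Reads every byte from the source and always closes the stream. */
    public static byte[] readAll(Openable source) throws IOException {
        // try-with-resources closes the stream even if read() throws.
        try (InputStream in = source.open()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        Openable source = () -> new ByteArrayInputStream(new byte[] {10, 20, 30});
        System.out.println(readAll(source).length); // prints 3
    }
}
```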
      • generateSplits

        SplitIterator generateSplits(SplitOptions options)
                              throws IOException
        Gets an iterator producing a set of DataSplit objects covering the source. The source is split as requested in the specified options, within the source's ability to meet the requirements.
        Parameters:
        options - configurable options to use in generating the splits
        Returns:
        an iterator over valid splits of the source
        Throws:
        IOException - if an I/O error occurs while generating splits
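        To make "a set of splits covering the source" concrete, the sketch below divides a source of known length into contiguous byte ranges of a requested size. This is an assumption-laden illustration: the Range class and cover method are hypothetical, and the actual contents of DataSplit, SplitOptions, and the library's splitting strategy are not described here.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    /** Hypothetical byte range; not the library's DataSplit. */
    public static final class Range {
        public final long offset;
        public final long length;

        public Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Covers [0, sourceLength) with contiguous, non-overlapping ranges of
     * at most targetSplitSize bytes; the final range may be shorter.
     */
    public static List<Range> cover(long sourceLength, long targetSplitSize) {
        if (targetSplitSize <= 0) {
            throw new IllegalArgumentException("targetSplitSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long offset = 0; offset < sourceLength; offset += targetSplitSize) {
            long length = Math.min(targetSplitSize, sourceLength - offset);
            ranges.add(new Range(offset, length));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 10-byte source with a 4-byte target yields ranges 0+4, 4+4, 8+2.
        for (Range r : cover(10, 4)) {
            System.out.println(r.offset + "+" + r.length);
        }
    }
}
```

        The shorter final range shows one way a request can be met only "within the source's ability": the requested size is a target, not a guarantee.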