Module datarush.commons
Package com.pervasive.datarush.io
Provides classes and interfaces for performing file-like I/O operations.
DataRush uses a model for file access based on the
concept of paths, which identify generic "file system" locations.
Paths are represented in a URI-like fashion, with a scheme
and a scheme-specific component. This abstraction provides
a consistent mechanism for interacting with data, and it can
be extended to new types of paths so that existing operators
automatically support many different sources of data.
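To make the split between scheme and scheme-specific component concrete, the minimal sketch below parses a few path strings with the standard `java.net.URI` class. It does not use DataRush's `Path` or `Paths` types. The class `PathSyntaxSketch`, its `SchemeParts` holder, and the rule that a path written without a scheme defaults to a local `file` scheme are all assumptions made for illustration.

```java
import java.net.URI;

/**
 * Sketch: separating a path string into its scheme and scheme-specific part.
 * java.net.URI stands in for DataRush's own path parsing; PathSyntaxSketch,
 * SchemeParts and parse(...) are hypothetical names used only for illustration.
 */
public final class PathSyntaxSketch {

    /** The two components of a URI-like path. */
    public static final class SchemeParts {
        public final String scheme;
        public final String schemeSpecificPart;

        SchemeParts(String scheme, String schemeSpecificPart) {
            this.scheme = scheme;
            this.schemeSpecificPart = schemeSpecificPart;
        }
    }

    /** Assumption: a path written without a scheme is treated as a local file path. */
    static final String DEFAULT_SCHEME = "file";

    public static SchemeParts parse(String path) {
        URI uri = URI.create(path); // throws IllegalArgumentException on malformed input
        String scheme = uri.getScheme() != null ? uri.getScheme() : DEFAULT_SCHEME;
        return new SchemeParts(scheme, uri.getSchemeSpecificPart());
    }

    public static void main(String[] args) {
        String[] examples = {
            "hdfs://namenode:8020/data/part-00000",
            "ftp://user@example.com/dir/file.txt",
            "/tmp/local.csv"
        };
        for (String p : examples) {
            SchemeParts parts = parse(p);
            System.out.println(parts.scheme + " | " + parts.schemeSpecificPart);
        }
    }
}
```

Parsing here never touches storage, so `/tmp/local.csv` yields a valid `file` path whether or not that file exists.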
There are three main components to this model:
- Paths, which, as mentioned above, refer to locations where data resides. Paths are merely syntactic entities; path objects do not expose access to the referenced data directly. A path may be syntactically valid without pointing to an existing location.
- File systems, which represent logical storage locations. Every path has an associated file system; a file system may have many paths associated with it.
- File system providers provide the means of accessing data located on a file system via a path. Every file system has an associated provider; a provider may have multiple file systems associated with it.
Of these items, users only need to be aware of paths and the utility classes surrounding them. File systems and file system providers are implementation-specific and only required when developing support for a new type of path.
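The relationship between the three components can be sketched as a small registry: a path string is resolved by its scheme to a provider, and by scheme plus authority to a file system served by that provider. This is a hypothetical model, not DataRush's `FileSystem` or `FileSystemProvider` API. Every name in it (`PathModelSketch`, `Provider`, `FileSys`, the in-memory `mem` scheme) is invented for the example, and keying file systems by scheme and authority is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Sketch of path -> file system -> provider resolution.
 * All names here are hypothetical; DataRush's own interfaces differ.
 */
public final class PathModelSketch {

    /** A provider supplies data access for every file system of the scheme it serves. */
    public interface Provider {
        InputStream open(FileSys fs, String pathPart) throws IOException;
    }

    /** A file system is one logical storage location, here keyed by scheme and authority. */
    public static final class FileSys {
        public final String scheme;
        public final String authority; // null when the path names no host
        public final Provider provider;

        FileSys(String scheme, String authority, Provider provider) {
            this.scheme = scheme;
            this.authority = authority;
            this.provider = provider;
        }
    }

    private final Map<String, Provider> providersByScheme = new HashMap<>();
    private final Map<String, FileSys> fileSystems = new HashMap<>();

    public void register(String scheme, Provider provider) {
        providersByScheme.put(scheme, provider);
    }

    /** Every path maps to exactly one file system; many paths can share the same one. */
    public FileSys fileSystemFor(String path) {
        URI uri = URI.create(path);
        Provider provider = providersByScheme.get(uri.getScheme());
        if (provider == null) {
            throw new IllegalArgumentException("no provider for scheme: " + uri.getScheme());
        }
        String key = uri.getScheme() + "://" + Objects.toString(uri.getAuthority(), "");
        return fileSystems.computeIfAbsent(key,
                k -> new FileSys(uri.getScheme(), uri.getAuthority(), provider));
    }

    /** Callers work only with path strings; file systems and providers stay internal. */
    public InputStream open(String path) throws IOException {
        FileSys fs = fileSystemFor(path);
        return fs.provider.open(fs, URI.create(path).getPath());
    }

    public static void main(String[] args) throws IOException {
        PathModelSketch model = new PathModelSketch();
        // A toy in-memory provider that returns "<authority>:<path>" as the file contents.
        model.register("mem", (fs, p) -> new ByteArrayInputStream(
                (fs.authority + ":" + p).getBytes(StandardCharsets.UTF_8)));
        try (InputStream in = model.open("mem://bucket1/a/b.txt")) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
        // Two paths on the same storage location resolve to the same file system object.
        System.out.println(model.fileSystemFor("mem://bucket1/x") == model.fileSystemFor("mem://bucket1/y"));
    }
}
```

Because callers only pass path strings to `open(...)`, supporting a new scheme means registering another provider; code that opens paths does not change.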
Additionally, the DataRush model has the concept of a file split, similar to the one found in Hadoop. Splits are used to parallelize the processing of files. File system providers should support splitting files whenever possible so that files can be read in parallel.
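One simple way to plan splits is to cut a file's byte range into fixed-size pieces that separate workers can read concurrently. The sketch below is illustrative only: `SplitPlanSketch`, `ByteRange` and `computeSplits(...)` are hypothetical and are not the DataRush `FileSplit` or `SplitOptions` API. It assumes, as Hadoop does, that a file in a non-splittable compression format is read as a single split.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: dividing a file into byte-range splits so that each range can be read
 * by a separate worker. All names here are hypothetical.
 */
public final class SplitPlanSketch {

    /** A half-open byte range [start, start + length) of a file. */
    public static final class ByteRange {
        public final long start;
        public final long length;

        public ByteRange(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + (start + length) + ")";
        }
    }

    /**
     * Cuts a file of fileLength bytes into ranges of at most targetSize bytes.
     * Assumption: a file that cannot be split (for example, one compressed with a
     * non-splittable codec) becomes a single range covering the whole file.
     */
    public static List<ByteRange> computeSplits(long fileLength, long targetSize, boolean splittable) {
        if (fileLength < 0 || targetSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and targetSize > 0");
        }
        List<ByteRange> splits = new ArrayList<>();
        if (!splittable) {
            splits.add(new ByteRange(0, fileLength));
            return splits;
        }
        for (long start = 0; start < fileLength; start += targetSize) {
            // The last range is shorter when fileLength is not a multiple of targetSize.
            splits.add(new ByteRange(start, Math.min(targetSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(computeSplits(250, 100, true));  // [[0, 100), [100, 200), [200, 250)]
        System.out.println(computeSplits(250, 100, false)); // [[0, 250)]
    }
}
```

A byte range says nothing about record boundaries, so a reader for line-oriented data must still decide which split owns a line that crosses a boundary.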
Interface Summary
- DataSplit: Describes a range of bytes from a data source.
- DirectoryFilter: A filter for selecting paths.
- FileSystem: Describes the file system identified by a path scheme.
- FileSystemProvider: Provides basic operations on paths for a specific path scheme or schemes.
- InputStreamSupplier: An abstract factory for input streams.
- IOChannelStatsCollector: Gathers statistics for an I/O channel.
- IOMonitoringContext: Provides a context for instrumenting I/O operations.
- Path: An abstract identifier for a resource.
- PathDetails: Describes a Path along with its metadata.
- PathGlob
- SplitInputStream: Interface defining an input data stream that works within the boundaries of a defined split.
- SplitIterator: A forward-only iterator over data splits with associated locality information.
Class Summary
- BasicPathDetails
- BinaryBuilder: A buffer for building variable-length binary-valued data.
- BinaryReader: Provides extended data access methods on binary data flows.
- BuiltinStreamProvider: Provides access to built-in data streams.
- CharsetEncoding: Describes the encoding format of character data.
- CompressedFileSplit: Describes a range of bytes from a compressed file.
- CompressionSplitIterator
- FileClient: Provides access to files and directories.
- FileSplit: Describes a range of bytes from a file.
- FTPFileSystemProvider: Provides access to FTP resources as a file system.
- FTPPath
- InputStreamSuppliers: Contains various factory methods and utilities for creating InputStreamSupplier instances.
- LocalFileSystemProvider: Provides access to the local file system.
- Paths: A factory for creating Path objects.
- PortRange
- SFTPFileSystemProvider: Provides access to SFTP resources as a file system.
- SingleSplitIterator: A split iterator containing a single split.
- SplitInputStreamImpl: A wrapper for input streams providing windowing behavior.
- SplitOptions: Settings which control the generation of splits on files.
- SplitReader: A character-based reader for splits.
- SplittableCompressedFileSplit: Represents a file split for a compression format that supports splitting.
- UnixStyleGlobbing: Provides UNIX-style globbing over paths.
- UnixStyleGlobbing.GlobDefinition: Provides information for performing globbing.
- URLFileSystemProvider: Provides generic access to URL resources.
Enum Summary
- BasicPathDetails.ObjectType
- FTPPath.FTPProtocol
- IOChannelOperation: Valid operations on an I/O byte channel, such as a file or network socket.
- WriteMode: Enumerates the possible file dispositions for writing.
Exception Summary
- EOFException: An exception indicating end-of-file has been unexpectedly reached on a stream.
- FileAlreadyExistsException: An I/O exception indicating the file in question already exists.
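Byte-range splits rarely line up with record boundaries, so a character-based split reader such as SplitReader needs a rule for lines that straddle two splits. The sketch below uses one common convention, similar in spirit to Hadoop's line reader: a line belongs to the split that contains its first byte, so a split skips a leading partial line and reads past its own end to finish its last line. Whether DataRush's SplitReader follows exactly this rule is an assumption. `LineSplitSketch` and `linesInSplit(...)` are hypothetical names, and the sketch assumes `'\n'` line endings and UTF-8 data.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: assigning whole lines to byte-range splits. A line belongs to the split
 * that contains its first byte. Hypothetical names; not DataRush's SplitReader.
 */
public final class LineSplitSketch {

    /** Returns the lines owned by the split [start, end) of data, assuming '\n' line endings. */
    public static List<String> linesInSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        // A split that begins mid-line skips that fragment: the line belongs to the previous split.
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step over the newline that ends the skipped fragment
        }
        // Read every line that starts before end, even if it finishes past end.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            // '\n' (0x0A) never occurs inside a multi-byte UTF-8 sequence, so whole lines decode safely.
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 8; // deliberately small so that splits cut through lines
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            List<String> owned = linesInSplit(data, start, end);
            System.out.println("[" + start + ", " + end + ") -> " + owned);
            all.addAll(owned);
        }
        System.out.println(all); // each line appears exactly once
    }
}
```

For any split size, concatenating the per-split results lists every line exactly once, which is the property that lets splits be processed independently and in parallel.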