Package com.pervasive.datarush.io

Provides classes and interfaces for performing file-like I/O operations. DataRush uses a model for file access based on the concept of paths, which are generic "file system" locations. Paths are represented in a URI-like fashion, having a scheme and a scheme-specific component. This abstraction provides a consistent mechanism for interacting with data, and it can be extended so that existing operators automatically support many different sources of data.
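
The scheme / scheme-specific split can be illustrated with the JDK's own java.net.URI class. This is only a sketch of the URI-like syntax described above; it does not use the DataRush path classes, and the location string (including the "hdfs" scheme and the host name) is a hypothetical example. Note that parsing succeeds even though nothing exists at that location, which mirrors the purely syntactic nature of paths.

```java
import java.net.URI;

public class PathSyntaxSketch {
    /** Returns the scheme of a URI-like location string, e.g. "hdfs". */
    public static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    /** Returns everything that follows the "scheme:" prefix. */
    public static String schemeSpecificPartOf(String location) {
        return URI.create(location).getSchemeSpecificPart();
    }

    public static void main(String[] args) {
        // Hypothetical location; it does not need to exist to be parsed.
        String location = "hdfs://namenode:8020/data/input.txt";
        System.out.println("scheme:          " + schemeOf(location));
        System.out.println("scheme-specific: " + schemeSpecificPartOf(location));
    }
}
```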

There are three main components to this model:

  • Paths, which, as mentioned above, refer to locations where data resides. Paths are merely syntactic entities; path objects do not expose access to the referenced data directly. A path may be syntactically valid without pointing to an existing location.
  • File systems, which represent logical storage locations. Every path has an associated file system; a file system may have many paths associated with it.
  • File system providers, which provide the means of accessing data located on a file system via a path. Every file system has an associated provider; a provider may have multiple file systems associated with it.

Of these items, users only need to be aware of paths and the utility classes surrounding them. File systems and file system providers are implementation-specific and only required when developing support for a new type of path.
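
The relationships among the three components (many paths per file system, many file systems per provider) can be sketched as follows. Every name here (FileModelSketch, SketchProvider, SketchFileSystem, fileSystemFor) is hypothetical and invented for illustration; none of them is part of the DataRush API. The sketch also assumes, purely for the example, that a provider is selected by the path's scheme and a file system by its scheme plus authority.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class FileModelSketch {
    /** Hypothetical provider: supplies access to all file systems of one scheme. */
    public static final class SketchProvider {
        public final String scheme;
        SketchProvider(String scheme) { this.scheme = scheme; }
    }

    /** Hypothetical file system: one logical storage location, owned by one provider. */
    public static final class SketchFileSystem {
        public final String root;
        public final SketchProvider provider;
        SketchFileSystem(String root, SketchProvider provider) {
            this.root = root;
            this.provider = provider;
        }
    }

    private final Map<String, SketchProvider> providers = new HashMap<>();
    private final Map<String, SketchFileSystem> fileSystems = new HashMap<>();

    /** Resolves a path to its file system, creating provider and file system on first use. */
    public SketchFileSystem fileSystemFor(URI path) {
        // Assumption: one provider per scheme.
        SketchProvider provider =
                providers.computeIfAbsent(path.getScheme(), SketchProvider::new);
        // Assumption: one file system per scheme + authority pair.
        String authority = path.getAuthority() == null ? "" : path.getAuthority();
        String root = path.getScheme() + "://" + authority;
        return fileSystems.computeIfAbsent(root, r -> new SketchFileSystem(r, provider));
    }
}
```

Under these assumptions, two paths on the same host resolve to the same file system object, while paths on different hosts with the same scheme resolve to different file systems that share one provider.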

Additionally, the DataRush model has the concept of a file split, similar to that found in Hadoop. Splits are used to parallelize the processing of files. File system providers should support splitting files whenever possible so that reads can be parallelized.
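
As a rough illustration of the idea behind splits, the sketch below divides a file of a given size into contiguous byte ranges that could each be read by a separate worker. The class and method names (FileSplitSketch, Split, computeSplits) are hypothetical and are not the DataRush split API; the sketch only computes byte ranges and ignores how records that straddle a range boundary would be handled.

```java
import java.util.ArrayList;
import java.util.List;

public class FileSplitSketch {
    /** Hypothetical split: a contiguous byte range within one file. */
    public static final class Split {
        public final long offset;
        public final long length;
        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Divides [0, fileSize) into ranges of at most splitSize bytes each. */
    public static List<Split> computeSplits(long fileSize, long splitSize) {
        if (fileSize < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and splitSize > 0");
        }
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += splitSize) {
            // The final split may be shorter than splitSize.
            splits.add(new Split(offset, Math.min(splitSize, fileSize - offset)));
        }
        return splits;
    }

    public static void main(String[] args) {
        for (Split s : computeSplits(250, 100)) {
            System.out.println("offset=" + s.offset + " length=" + s.length);
        }
    }
}
```

For a 250-byte file and a 100-byte split size, this yields three ranges: two of 100 bytes and a final one of 50 bytes.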