- All Known Implementing Classes:
BasicByteSink
ByteSink objects
represent entities existing outside of a logical graph,
such as files and sockets, which can be written to as a
stream of bytes. These can then be used in conjunction with
DataFormat objects to consume records flowing
through a dataflow graph, the most common reason being
to persist the data to disk.
Generally, it is not necessary to implement or even
directly use ByteSink objects. Most write operators
provide a more convenient interface which obscures
the object; see AbstractWriter as an example.
By default, sinks use OS-level authorization inherited from the execution environment, but can be configured to use more complex authentication mechanisms to provide an authorization context.
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| boolean | appendsExisting(WriteMode mode) | Indicates whether a write in the given mode represents an append to an existing sink. |
| | authorize(FileClient client) | Creates a new sink with the same properties, but using the specified authorization. |
| void | cleanForOverwrite(int partitionCount) | Erases contents of the sink which do not correspond to expected partition fragments. |
| | fragmentForPartition(int partitionId) | Creates a new subordinate sink. |
| boolean | isFragmentable() | Indicates whether the sink can be broken into subordinate sinks for parallel writes. |
| | open(mode) | Opens the sink for writing in the specified mode. |
| | openChannel() | Opens the sink for random access. |
| | openForRead() | Opens the sink for reading. |
| boolean | supportsRandomAccess() | Indicates whether the sink supports random access. |
| void | validate(mode, forFragments) | Performs validation of the sink configuration. |
-
Method Details
-
authorize
Creates a new sink with the same properties, but using the specified authorization. If a sink is supposed to be used with a specific authorization context, this method should be called to produce a new sink to use.
- Parameters:
client - the authorization context to use for access
- Returns:
a sink using the provided authorization context
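The "new sink, same properties" contract can be sketched with an immutable copy-with pattern. This is a minimal illustration, not the library's implementation: `DemoClient` and `DemoSink` are hypothetical stand-ins for `FileClient` and a concrete sink, and the `path` and `user` fields are assumed purely for the example.

```java
import java.util.Objects;

/** Hypothetical stand-in for an authorization context; not the library's FileClient. */
final class DemoClient {
    final String user;

    DemoClient(String user) {
        this.user = Objects.requireNonNull(user);
    }
}

/** Hypothetical immutable sink: authorize() never mutates the receiver. */
final class DemoSink {
    final String path;
    final DemoClient client; // null means "inherit authorization from the environment"

    DemoSink(String path, DemoClient client) {
        this.path = Objects.requireNonNull(path);
        this.client = client;
    }

    /** Returns a copy with the same path but the given authorization context. */
    DemoSink authorize(DemoClient newClient) {
        return new DemoSink(path, Objects.requireNonNull(newClient));
    }
}

final class AuthorizeDemo {
    public static void main(String[] args) {
        DemoSink original = new DemoSink("/data/out", null);
        DemoSink authorized = original.authorize(new DemoClient("etl"));
        // The original keeps its default authorization; only the copy carries the client.
        System.out.println("original has client: " + (original.client != null));
        System.out.println("authorized user: " + authorized.client.user);
    }
}
```

Because the original sink is left untouched, callers must keep and use the returned sink; discarding the return value would silently fall back to the default authorization.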
-
open
Opens the sink for writing in the specified mode. For some sinks, such as one representing the console, the mode may be irrelevant.
- Parameters:
mode - indicates how to handle an existing sink
- Returns:
a writer for sending bytes to the sink
- Throws:
IOException - if an I/O error occurs opening the sink
-
openForRead
Opens the sink for reading. This may be needed for appends where there is metadata which must be read. For some sinks, such as one representing standard output, this operation may not be valid. Calls to this method should be guarded with a check of appendsExisting(WriteMode).
- Returns:
a reader for reading the existing target sink
- Throws:
IOException - if an I/O error occurs opening the sink
-
supportsRandomAccess
boolean supportsRandomAccess()
Indicates whether the sink supports random access. In some cases, it may be desirable to perform random writes to the sink.
- Returns:
true if the sink supports random access, false otherwise.
-
openChannel
Opens the sink for random access. Not all sinks support random access; check by calling supportsRandomAccess() first.
- Returns:
a channel for random access to the sink
- Throws:
IOException - if an I/O error occurs opening the sink
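The guard described above might look like the following sketch. The `RandomAccessSink` interface and `FileBackedSink` class are hypothetical stand-ins, and the `SeekableByteChannel` return type is an assumption, since this page does not show what `openChannel()` actually returns.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical stand-in exposing only the two random-access methods. */
interface RandomAccessSink {
    boolean supportsRandomAccess();

    SeekableByteChannel openChannel() throws IOException; // return type is assumed
}

/** Hypothetical file-backed sink; opens the file without truncating it. */
final class FileBackedSink implements RandomAccessSink {
    private final Path path;

    FileBackedSink(Path path) {
        this.path = path;
    }

    @Override
    public boolean supportsRandomAccess() {
        return true;
    }

    @Override
    public SeekableByteChannel openChannel() throws IOException {
        return Files.newByteChannel(path, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }
}

final class RandomAccessDemo {
    /** Writes bytes at the given offset; returns false if the sink cannot do random access. */
    static boolean patch(RandomAccessSink sink, long offset, byte[] bytes) throws IOException {
        if (!sink.supportsRandomAccess()) {
            return false; // caller must fall back to a sequential write
        }
        try (SeekableByteChannel channel = sink.openChannel()) {
            channel.position(offset);
            ByteBuffer buffer = ByteBuffer.wrap(bytes);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sink", ".bin");
        Files.write(tmp, "0000000000".getBytes(StandardCharsets.US_ASCII));
        patch(new FileBackedSink(tmp), 4, "AB".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.US_ASCII));
        Files.delete(tmp);
    }
}
```

Returning a boolean instead of throwing keeps the unsupported case cheap to handle; a real caller may prefer to fail fast.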
-
isFragmentable
boolean isFragmentable()
Indicates whether the sink can be broken into subordinate sinks for parallel writes. Typically, this is only true if the sink represents a path in some hierarchical store, such as a file system.
- Returns:
true if the sink can be "fragmented", false otherwise.
-
fragmentForPartition
Creates a new subordinate sink. The new sink represents a portion (or fragment) of the complete data set which is written to the master. Fragments are used for parallel writes; each logical partition is written to a separate fragment created from this sink.
- Parameters:
partitionId - a partition identifier
- Returns:
a new sink for writing the identified partition.
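As a rough sketch of how a parallel writer might use fragments, the example below derives one subordinate sink per partition from a path-based master. `PathSink` is a hypothetical stand-in, and the `part-<id>` naming scheme is an assumption made for illustration; the library's actual fragment layout is not described on this page.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical path-based sink; only the fragmentation methods are modeled. */
final class PathSink {
    private final Path target;

    PathSink(Path target) {
        this.target = target;
    }

    Path target() {
        return target;
    }

    boolean isFragmentable() {
        return true; // a path in a hierarchical store can hold child entries
    }

    /** Returns a subordinate sink; the "part-<id>" name is illustrative only. */
    PathSink fragmentForPartition(int partitionId) {
        return new PathSink(target.resolve("part-" + partitionId));
    }
}

final class FragmentDemo {
    /** Plans one sink per partition, or just the master if it cannot be fragmented. */
    static List<PathSink> plan(PathSink master, int partitionCount) {
        List<PathSink> sinks = new ArrayList<>();
        if (!master.isFragmentable()) {
            sinks.add(master);
            return sinks;
        }
        for (int id = 0; id < partitionCount; id++) {
            sinks.add(master.fragmentForPartition(id));
        }
        return sinks;
    }

    public static void main(String[] args) {
        for (PathSink sink : plan(new PathSink(Paths.get("out")), 3)) {
            System.out.println(sink.target());
        }
    }
}
```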
-
cleanForOverwrite
Erases contents of the sink which do not correspond to expected partition fragments. This method will be called by one of the nodes during a parallel write in overwrite mode to ensure only the data from the current write is present on completion of the write.
- Parameters:
partitionCount - the number of partitions expected
- Throws:
IOException - if an error occurs while attempting to delete extraneous files from the target location
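One way to picture the cleanup is a directory sweep that keeps only the fragments the current write will produce. This is a sketch under stated assumptions, not the library's implementation: the target is assumed to be a local directory, fragments are assumed to be named `part-<id>`, and `OverwriteCleaner` is a hypothetical helper.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OverwriteCleaner {
    /** Deletes regular files in dir that are not expected fragments; returns the count deleted. */
    static int cleanForOverwrite(Path dir, int partitionCount) throws IOException {
        // Names the current write is expected to produce (assumed naming scheme).
        Set<String> expected = new HashSet<>();
        for (int id = 0; id < partitionCount; id++) {
            expected.add("part-" + id);
        }

        // Collect stale entries first, then delete, so the listing is not modified mid-iteration.
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isRegularFile(entry) && !expected.contains(name)) {
                    stale.add(entry);
                }
            }
        }
        for (Path entry : stale) {
            Files.delete(entry);
        }
        return stale.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sink");
        for (String name : new String[] {"part-0", "part-1", "part-2", "part-3"}) {
            Files.createFile(dir.resolve(name));
        }
        // A previous write used four partitions; the current one uses two.
        int removed = cleanForOverwrite(dir, 2);
        System.out.println("removed " + removed + " stale fragment(s)");
    }
}
```

Running the sweep on a single node, as the description states, avoids several nodes racing to delete the same files.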
-
appendsExisting
Indicates whether a write in the given mode represents an append to an existing sink. This is used to determine whether file metadata needs to be written or updated. It is not always the case that this can be determined solely from the mode. For instance, an append to a non-existent file needs metadata to be written, but an append to an existing file may need to update metadata. Another example is console output, which should always write metadata, even though the sink always exists.
- Parameters:
mode - the intended write mode
- Returns:
true if the write represents an append to an existing sink, false otherwise.
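The decision described in the method description, where the mode alone is not enough, can be sketched as a small predicate. The `DemoWriteMode` enum and its constants are hypothetical stand-ins for `WriteMode`, whose values are not listed on this page, and the `targetExists` and `isConsole` flags are assumptions used to model the file and console cases mentioned above.

```java
/** Hypothetical stand-in for WriteMode; the real constants are not shown on this page. */
enum DemoWriteMode { OVERWRITE, APPEND, ERROR_IF_EXISTS }

final class AppendCheck {
    /**
     * Returns true only when the write adds to data that is already present.
     * Follows the method description: the console always counts as a fresh write,
     * and an append to a missing file is treated as a creation.
     */
    static boolean appendsExisting(DemoWriteMode mode, boolean targetExists, boolean isConsole) {
        if (isConsole) {
            return false; // console output always writes metadata
        }
        return mode == DemoWriteMode.APPEND && targetExists;
    }

    public static void main(String[] args) {
        System.out.println(appendsExisting(DemoWriteMode.APPEND, true, false));
        System.out.println(appendsExisting(DemoWriteMode.APPEND, false, false));
        System.out.println(appendsExisting(DemoWriteMode.APPEND, true, true));
    }
}
```

A writer could use the result to choose between writing fresh metadata and reading the existing metadata through openForRead before updating it.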
-
validate
Performs validation of the sink configuration. This checks things such as the existence and accessibility of the sink. The caller should indicate how the write will be performed so that appropriate checks can be performed. For instance, if no overwriting is allowed, the target is checked to see if it already exists.
- Parameters:
mode - the intended write mode
forFragments - indicates whether the sink will be fragmented for the write
- Throws:
IOException - if an I/O error occurs while validating the sink
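A minimal sketch of mode-dependent validation for a local file target follows. Everything here is assumed for illustration: `SketchWriteMode` is a hypothetical stand-in for `WriteMode`, and the specific checks (existence when overwriting is disallowed, and a directory requirement for fragmented writes) are plausible examples rather than the library's documented rules.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for WriteMode; the real constants are not shown on this page. */
enum SketchWriteMode { OVERWRITE, APPEND, ERROR_IF_EXISTS }

final class SinkValidation {
    /** Throws IOException if the target cannot be written in the requested way. */
    static void validate(Path target, SketchWriteMode mode, boolean forFragments)
            throws IOException {
        boolean exists = Files.exists(target);

        // No overwriting allowed: the target must not already exist.
        if (mode == SketchWriteMode.ERROR_IF_EXISTS && exists) {
            throw new IOException("Target already exists: " + target);
        }
        // Assumed rule: a fragmented write needs a container, so a plain file is rejected.
        if (forFragments && exists && !Files.isDirectory(target)) {
            throw new IOException("Cannot write fragments into a regular file: " + target);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("target", ".dat");
        validate(file, SketchWriteMode.OVERWRITE, false); // passes
        try {
            validate(file, SketchWriteMode.ERROR_IF_EXISTS, false);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        Files.delete(file);
    }
}
```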
-