public final class WriteSink extends ExecutableOperator implements RecordSinkOperator
This operator is low-level, providing a generalized
model for reading files in a distributed fashion.
Typically, WriteSink
is not directly used in
a graph, instead being indirectly used though a composite
operator such as one derived from AbstractWriter
providing a more appropriate interface to the end user.
Parallelized writes are supported by creating multiple output sinks from the original target, called fragments, each representing the portion of data on one partition. In the case of a write to as file system, this would result in a directory of files. To write in parallel, the target must support the concept of fragmenting. If a target does not, the write is forced to be non-parallel.
A port is provided which signals completion of the writer. This can be used to express a dependency between operators based on the target having been successfully written.
The writer makes a best-effort attempt to validate the target before execution, but cannot always guarantee correctness, depending on the nature of the target. This is done to try to prevent misconfigured graphs from executing where the writer may not execute until a late phase when a failure may result in the loss of a significant amount of work being lost.
Constructor and Description |
---|
WriteSink()
Writes an empty target with default settings.
|
WriteSink(boolean includeDoneSignal)
Writes an empty target with default settings.
|
WriteSink(ByteSink target,
DataFormat format)
Reads the specified target using the given format.
|
Modifier and Type | Method and Description |
---|---|
protected ExecutableOperator |
cloneForExecution()
Performs a deep copy of the operator for execution.
|
protected void |
computeMetadata(StreamingMetadataContext ctx)
Implementations must adhere to the following contracts
|
protected void |
execute(ExecutionContext ctx)
Executes the operator.
|
DataFormat |
getFormat()
Gets the data format for the configured target.
|
FormattingOptions |
getFormatOptions()
Gets the formatting options used by the writer.
|
boolean |
getIncludeDoneSignal()
Gets the property value for the include done signal setting.
|
RecordPort |
getInput()
Gets the record port providing the records to write to the target sink.
|
WriteMode |
getMode()
Gets how existing files should be handled
by the writer.
|
ByteSink |
getTarget()
Gets the target sink for the reader.
|
boolean |
getWriteOnClient()
Indicates whether the writer should write a file on the client.
|
boolean |
getWriteSingleSink()
Indicates whether the writer produces a
single output file.
|
boolean |
isIgnoreSortOrder()
If set to true, the writer will write in any order.
|
void |
setFormat(DataFormat format)
Sets the data format for the configured target.
|
void |
setFormatOptions(FormattingOptions options)
Sets the formatting options used by the writer.
|
void |
setIgnoreSortOrder(boolean ignoreSortOrder)
If set to true, the writer will write in any order.
|
void |
setMode(WriteMode mode)
Sets how the writer should handle an existing
target.
|
void |
setTarget(ByteSink target)
Sets the target sink for the reader.
|
void |
setWriteOnClient(boolean writeOnClient)
Set whether the writer should write a file on the local client.
|
void |
setWriteSingleSink(boolean enabled)
Set whether the writer should produce a single output
file or multiple ones.
|
getNumInputCopies, getPortSettings, handleInactiveOutput
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
disableParallelism, getInputPorts, getOutputPorts
public WriteSink()
setTarget(ByteSink)
,
setFormat(DataFormat)
public WriteSink(boolean includeDoneSignal)
setTarget(ByteSink)
,
setFormat(DataFormat)
public WriteSink(ByteSink target, DataFormat format)
target
- the target to writeformat
- the target data formatpublic RecordPort getInput()
getInput
in interface RecordSinkOperator
getInput
in interface SinkOperator<RecordPort>
public boolean getIncludeDoneSignal()
public DataFormat getFormat()
public void setFormat(DataFormat format)
format
- the target data formatpublic ByteSink getTarget()
public void setTarget(ByteSink target)
target
- the target to writepublic FormattingOptions getFormatOptions()
public void setFormatOptions(FormattingOptions options)
options
- the format options to useDataFormat.DataFormatter
public WriteMode getMode()
public void setMode(WriteMode mode)
mode
- the behavior to use for existing
filespublic boolean getWriteSingleSink()
true
if the writer should write
a single output, false
otherwise.public void setWriteSingleSink(boolean enabled)
enabled
- indicates whether a single output file
should be writtenpublic boolean getWriteOnClient()
true
if the writer should a file on the client, false
otherwise.public void setWriteOnClient(boolean writeOnClient)
writeOnClient
- indicates whether the writer should write a file on the client.public boolean isIgnoreSortOrder()
public void setIgnoreSortOrder(boolean ignoreSortOrder)
ignoreSortOrder
- whether to ignore sort order.protected void computeMetadata(StreamingMetadataContext ctx)
StreamingOperator
StreamingMetadataContext.parallelize(ParallelismStrategy)
.
RecordPort#setRequiredDataOrdering
, otherwise data may arrive in any order.
RecordPort#setRequiredDataDistribution
, otherwise data will arrive in an unspecified partial distribution
.
RecordPort#getSourceDataDistribution
and RecordPort#getSourceDataOrdering
. These should be
viewed as a hints to help chose a more efficient algorithm. In such cases, though, operators must
still declare data ordering and data distribution requirements; otherwise there is no guarantee that
data will arrive sorted/distributed as required.
RecordPort#setType
.RecordPort#setOutputDataOrdering
RecordPort#setOutputDataDistribution
AbstractModelPort#setMergeHandler
.MergeModel
is a convenient, re-usable model reducer, parameterized with
a merge-handler.
SimpleModelPort
's have no associated metadata and therefore there is
never any output metadata to declare. PMMLPort
's, on the other hand,
do have associated metadata. For all PMMLPorts, implementations must declare
the following:
PMMLPort.setPMMLModelSpec
.
computeMetadata
in class StreamingOperator
ctx
- the contextprotected void execute(ExecutionContext ctx)
ExecutableOperator
execute
in class ExecutableOperator
ctx
- context in which to lookup physical ports bound to logical portsprotected ExecutableOperator cloneForExecution()
ExecutableOperator
cloneForExecution
in class ExecutableOperator
Copyright © 2020 Actian Corporation. All rights reserved.