public final class WriteDelimitedText extends AbstractTextWriter
Delimited text supports up to three distinct user-defined sequences within a record, used to identify field boundaries: a field separator, found between fields; a field start delimiter, marking the beginning of a field value; and a field end delimiter, marking the end of a field value.
The writer accepts a RecordTextSchema to provide formatting information for fields, as well as header row information, if requested. Providing a schema is not required, however, as one can be generated from the input data type using default formatting based on each field's datatype. The writer also supports a pluggable discovery mechanism for creating a schema based on the input type, should fine-grained dynamic control be required.
Any schema, supplied or discovered, must be compatible with the input to the writer. To be compatible, a schema must contain a field definition with an assignable type for each field named in the input. Fields present in the schema but not in the input are permitted, with the missing field assuming a null value.
Delimited text data may or may not have a header row. The header row is delimited as usual but contains the names of the fields in the data portion of the record. The writer will emit a header row if one is requested and the write is not appending to an existing file.
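For illustration, a minimal configuration might look like the following sketch. The output path and the WriteMode constant (OVERWRITE) are assumed example values, the relevant classes are assumed to be imported, and the configured operator still needs to be added to a dataflow graph and connected to a record source before anything is written.

```java
// Sketch only: the path and WriteMode.OVERWRITE are assumed example values.
WriteDelimitedText writer = new WriteDelimitedText("results.csv", WriteMode.OVERWRITE);
writer.setHeader(true);          // request a header row with the field names
writer.setFieldSeparator(",");   // sequence separating adjacent fields
writer.setFieldDelimiter("\"");  // optional marker around each field value
writer.setNullIndicator("");     // text emitted for null field values
// No schema is set here, so one is generated from the input type
// using default formatting for each field's datatype.
```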
Fields inherited from superclasses: encodingProps, input, options
| Constructor | Description |
|---|---|
| WriteDelimitedText() | Writes delimited text to an empty target with default settings. |
| WriteDelimitedText(boolean provideDoneSignal) | Writes delimited text to an empty target with default settings. |
| WriteDelimitedText(ByteSink target, WriteMode mode) | Writes delimited text to the specified target sink in the given mode. |
| WriteDelimitedText(Path path, WriteMode mode) | Writes delimited text to the specified path in the given mode, using default settings. |
| WriteDelimitedText(String path, WriteMode mode) | Writes delimited text to the specified path in the given mode, using default settings. |
| Modifier and Type | Method and Description |
|---|---|
| protected DataFormat | computeFormat(CompositionContext ctx): Determines the data format for the target. |
| FieldDelimiterSettings | getDelimiters(): Gets the field delimiter settings used by the writer. |
| String | getFieldEndDelimiter(): Gets the end of field delimiter. |
| String | getFieldSeparator(): Returns the delimiter used to distinguish field boundaries. |
| String | getFieldStartDelimiter(): Returns the start of field delimiter. |
| boolean | getHeader(): Indicates whether a header row should be written in the target. |
| String | getLineComment(): Gets the character sequence indicating a line comment. |
| String | getNullIndicator(): Gets the text value used to represent null values by default in generated schemas. |
| String | getRecordSeparator(): Gets the value used as a record separator. |
| RecordTextSchema<?> | getSchema(): Gets the record schema for the delimited text source. |
| TextRecordDiscoverer | getSchemaDiscovery(): Gets the schema discoverer to use for the delimited text. |
| void | setDelimiters(FieldDelimiterSettings settings): Sets the field delimiter settings for the writer. |
| void | setFieldDelimiter(String delimiter): Sets the delimiter used to denote the boundaries of a data field. |
| void | setFieldEndDelimiter(String delimiter): Sets the delimiter used to denote the end of a data field. |
| void | setFieldSeparator(String separator): Sets the delimiter used to define the boundary between data fields. |
| void | setFieldStartDelimiter(String delimiter): Sets the delimiter used to denote the beginning of a data field. |
| void | setHeader(boolean header): Configures whether to write a header row in the target. |
| void | setLineComment(String lineComment): Sets the character sequence indicating a line comment. |
| void | setNullIndicator(String value): Sets the text value used to represent null values by default in generated schemas. |
| void | setRecordSeparator(String separator): Sets the value to use as a record separator. |
| void | setSchema(RecordTextSchema<?> schema): Sets the record schema for the delimited text source. |
| void | setSchemaDiscovery(TextRecordDiscoverer discoverer): Sets the schema discoverer to use for the delimited text. |
Methods inherited from class AbstractTextWriter: getCharset, getCharsetName, getEncodeBuffer, getEncoding, getErrorAction, getReplacement, setCharset, setCharsetName, setEncodeBuffer, setEncoding, setErrorAction, setReplacement

Methods inherited from class AbstractWriter: compose, getFormatOptions, getInput, getMode, getSaveMetadata, getTarget, getWriteBuffer, getWriteOnClient, getWriteSingleSink, isIgnoreSortOrder, setFormatOptions, setIgnoreSortOrder, setMode, setSaveMetadata, setTarget, setTarget, setTarget, setWriteBuffer, setWriteOnClient, setWriteSingleSink

Methods inherited from other superclasses: disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from implemented interfaces: disableParallelism, getInputPorts, getOutputPorts
public WriteDelimitedText()
A default schema will be generated based on the input type, unless otherwise configured via setSchema(RecordTextSchema) or setSchemaDiscovery(TextRecordDiscoverer).
See Also:
AbstractWriter.setTarget(ByteSink)
public WriteDelimitedText(boolean provideDoneSignal)
A default schema will be generated based on the input type, unless otherwise configured via setSchema(RecordTextSchema) or setSchemaDiscovery(TextRecordDiscoverer).
Parameters:
provideDoneSignal - indicates whether a done signal port should be created
See Also:
AbstractWriter.setTarget(ByteSink)
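A brief sketch of using this constructor follows, assuming the target is supplied later through the inherited setTarget methods.

```java
// Sketch: an "empty target" constructor; a done-signal port is requested.
WriteDelimitedText writer = new WriteDelimitedText(true);
// The target is configured separately before execution, for example via
// the inherited AbstractWriter.setTarget(ByteSink).
```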
public WriteDelimitedText(String path, WriteMode mode)
If the writer is parallelized, this is interpreted as a directory in which each partition will write a fragment of the entire input stream. Otherwise, it is interpreted as the file to write.
A default schema will be generated based on the input type, unless otherwise configured via setSchema(RecordTextSchema) or setSchemaDiscovery(TextRecordDiscoverer).
Parameters:
path - the path to which to write
mode - how to handle existing files

public WriteDelimitedText(Path path, WriteMode mode)
If the writer is parallelized, this is interpreted as a directory in which each partition will write a fragment of the entire input stream. Otherwise, it is interpreted as the file to write.
A default schema will be generated based on the input type, unless otherwise configured via setSchema(RecordTextSchema) or setSchemaDiscovery(TextRecordDiscoverer).
Parameters:
path - the path to which to write
mode - how to handle existing files

public WriteDelimitedText(ByteSink target, WriteMode mode)
The writer can only be parallelized if the sink is fragmentable. In this case, each partition will be written as an independent sink. Otherwise, the writer will run non-parallel.
A default schema will be generated based on the input type, unless otherwise configured via setSchema(RecordTextSchema) or setSchemaDiscovery(TextRecordDiscoverer).
Parameters:
target - the sink to which to write
mode - how to handle an existing sink

public RecordTextSchema<?> getSchema()
Returns:
the configured record schema; null if one has not been explicitly set

public void setSchema(RecordTextSchema<?> schema)
Setting a schema overrides any previously configured schema discovery.
Parameters:
schema - the desired record schema for the target
See Also:
setSchemaDiscovery(TextRecordDiscoverer)
public TextRecordDiscoverer getSchemaDiscovery()
Returns:
the configured schema discoverer; null if one has not been explicitly set

public void setSchemaDiscovery(TextRecordDiscoverer discoverer)
By default, the schema will be discovered automatically. This schema preserves field order from the input and uses default formatting appropriate for each field's datatype. If a header row is written, it will use the field names from the input.
Setting schema discovery overrides any previously configured schema.
Parameters:
discoverer - the schema discoverer to use
See Also:
setSchema(RecordTextSchema)
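The override semantics of these two setters can be sketched as follows; how the RecordTextSchema or TextRecordDiscoverer instances are obtained is assumed to be handled elsewhere in the application.

```java
// Sketch: whichever schema configuration is applied last takes effect.
static void applyExplicitSchema(WriteDelimitedText writer, RecordTextSchema<?> schema) {
    writer.setSchema(schema);              // overrides any configured discoverer
}

static void applyDiscovery(WriteDelimitedText writer, TextRecordDiscoverer discoverer) {
    writer.setSchemaDiscovery(discoverer); // overrides any explicitly set schema
}
```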
public String getNullIndicator()
public void setNullIndicator(String value)
Parameters:
value - the string indicating a null value

public boolean getHeader()

public void setHeader(boolean header)
Parameters:
header - indicates whether to write a header row

public String getLineComment()

public void setLineComment(String lineComment)
Parameters:
lineComment - the character sequence marking the start of a line comment

public FieldDelimiterSettings getDelimiters()

public void setDelimiters(FieldDelimiterSettings settings)
Parameters:
settings - the field delimiter settings to use

public String getRecordSeparator()

public void setRecordSeparator(String separator)
By default, the record separator is set to the default record separator for the operating system of the execution environment.
Parameters:
separator - the value to use as a record separator
Throws:
com.pervasive.datarush.graphs.physical.InvalidPropertyValueException - if the separator is null or the empty string
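For example, the separator can be pinned to a specific value so the output does not depend on the operating system; the choice of "\n" below is only illustrative, and writer is assumed to be an existing WriteDelimitedText instance.

```java
// Sketch: pin the record separator so output is OS-independent.
static void useUnixLineEndings(WriteDelimitedText writer) {
    writer.setRecordSeparator("\n");    // default would be the OS line separator
    // writer.setRecordSeparator("");   // would throw InvalidPropertyValueException
    // writer.setRecordSeparator(null); // likewise rejected
}
```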
public String getFieldSeparator()
public void setFieldSeparator(String separator)
Parameters:
separator - string used to separate fields
Throws:
com.pervasive.datarush.graphs.physical.InvalidPropertyValueException - if the delimiter is null or the empty string

public void setFieldDelimiter(String delimiter)
This method is generally equivalent to calling setFieldStartDelimiter() and setFieldEndDelimiter() with the same parameter value. However, those methods do not allow the empty string as a parameter.
Parameters:
delimiter - string used to optionally mark the start and end of a field value. An empty string indicates field values are not delimited.
Throws:
com.pervasive.datarush.graphs.physical.InvalidPropertyValueException - if the delimiter is null
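The equivalence described above can be sketched as follows, assuming writer is an existing WriteDelimitedText instance.

```java
// Sketch: these two configurations are generally equivalent.
static void quoteFieldsCombined(WriteDelimitedText writer) {
    writer.setFieldDelimiter("\"");   // sets both the start and end delimiter
}

static void quoteFieldsIndividually(WriteDelimitedText writer) {
    writer.setFieldStartDelimiter("\"");
    writer.setFieldEndDelimiter("\"");
}

// Only setFieldDelimiter accepts the empty string, which indicates
// field values are not delimited; the individual setters reject "".
static void disableFieldDelimiting(WriteDelimitedText writer) {
    writer.setFieldDelimiter("");
}
```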
public String getFieldStartDelimiter()

public String getFieldEndDelimiter()

public void setFieldStartDelimiter(String delimiter)
Use setFieldDelimiter(String) instead to indicate no delimiters.
Parameters:
delimiter - string used to mark the start of a field value
Throws:
com.pervasive.datarush.graphs.physical.InvalidPropertyValueException - if the delimiter is null or the empty string

public void setFieldEndDelimiter(String delimiter)
Use setFieldDelimiter(String) instead to indicate no delimiters.
Parameters:
delimiter - string used to mark the end of a field value
Throws:
com.pervasive.datarush.graphs.physical.InvalidPropertyValueException - if the delimiter is null or the empty string

protected DataFormat computeFormat(CompositionContext ctx)
Determines the data format for the target. This format is used by the AbstractWriter when composing the underlying WriteSink operator. If an implementation supports schema discovery, it must be performed in this method.
Overrides:
computeFormat in class AbstractWriter
Parameters:
ctx - the composition context for the current invocation of AbstractWriter.compose(CompositionContext)
Copyright © 2024 Actian Corporation. All rights reserved.