- All Known Implementing Classes:
ARFFDataFormat, AvroFormat, BinaryFormat, DelimitedTextFormat, FixedTextFormat, JSONFormat, LogDataFormat, MDFFormat, ORCFormat, ParquetFormat
A DataFormat object provides the information needed to read and write external data, converting it to and from records in a dataflow graph. Many formats are predefined in the library; an implementation is needed only if a new format must be defined. Normally, it is not necessary to work directly with formats. Instead, operators are provided that hide the DataFormat object and present a view more appropriate to the specific format. Examples of this technique are the ReadDelimitedText and WriteDelimitedText operators.
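The division of labor described above can be sketched in a few lines of self-contained Java. This is a hypothetical illustration, not the library's API: `ToyFormat`, `ToyParser`, `ToyWriter`, and `ToyDelimitedFormat` are invented stand-ins for DataFormat and its nested parser and formatter interfaces, a record is modeled as a `List<String>`, and the record type is modeled as the list of field names. The format object describes the layout and hands out a parser and a writer that convert between external text and records.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT the library's API: stand-ins for DataFormat and its
// nested parser/formatter. A record is a List<String>; the record type is the
// list of field names.
class ToyFormatSketch {

    // Plays the role of DataFormat: describes the layout, hands out parser/writer.
    interface ToyFormat {
        List<String> getType();
        ToyParser createParser();
        ToyWriter createWriter();
    }

    // Converts one line of external text into a record.
    interface ToyParser {
        List<String> parse(String line);
    }

    // Converts a record back into one line of external text.
    interface ToyWriter {
        String format(List<String> record);
    }

    // A delimited-text format with no quoting or escaping support.
    static final class ToyDelimitedFormat implements ToyFormat {
        private final List<String> fieldNames;
        private final String delimiter;

        ToyDelimitedFormat(List<String> fieldNames, String delimiter) {
            this.fieldNames = List.copyOf(fieldNames);
            this.delimiter = delimiter;
        }

        @Override
        public List<String> getType() {
            return fieldNames;
        }

        @Override
        public ToyParser createParser() {
            return line -> {
                // limit -1 keeps trailing empty fields such as the second field of "8,".
                String[] parts = line.split(Pattern.quote(delimiter), -1);
                if (parts.length != fieldNames.size()) {
                    throw new IllegalArgumentException(
                            "expected " + fieldNames.size() + " fields but got " + parts.length);
                }
                return Arrays.asList(parts);
            };
        }

        @Override
        public ToyWriter createWriter() {
            return record -> String.join(delimiter, record);
        }
    }

    public static void main(String[] args) {
        ToyFormat format = new ToyDelimitedFormat(List.of("id", "name"), ",");
        List<String> record = format.createParser().parse("7,widget");
        System.out.println(record);                               // prints [7, widget]
        System.out.println(format.createWriter().format(record)); // prints 7,widget
    }
}
```

The parser and writer are obtained from the format rather than constructed directly, so each format supplies its own conversion logic behind a common shape. In the library, operators such as ReadDelimitedText hide this step from the user.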
- See Also:
-
Nested Class Summary
Nested Classes
Modifier and Type | Interface | Description
static interface | | A formatter for converting record data to binary or text format.
static interface | | A parser for record data in binary or text format.
-
Method Summary
Modifier and Type | Method | Description
 | createParser(ParsingOptions options) | Create a new parser for the format using the specified parsing options.
 | createWriter(FormattingOptions options) | Create a new writer for the format using the specified formatting options.
FileMetadata | getMetadata() | Gets the metadata associated with the format.
RecordTokenType | getType() | Gets the record type associated with the format.
boolean | isSplittable() | Indicates if the format supports parsing of subsections of a file.
 | readMetadata(FileClient fileClient, ByteSource source) | Reads the metadata associated with the format.
void | setMetadata(FileMetadata metadata) | Sets the metadata associated with the format.
void | writeMetadata(FileMetadata metadata, FileClient fileClient, ByteSink target) | Writes the provided metadata associated with the format.
-
Method Details
-
getType
RecordTokenType getType()
Gets the record type associated with the format. Records produced by the associated parser or consumed by the associated formatter will be of this type.
For many formats, this may be derived from a schema object describing the format layout.
- Returns:
- the format's record type
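As a hedged illustration of deriving a record type from a schema, the sketch below uses a hypothetical fixed-width layout. `Column`, `Field`, `FieldType`, and `deriveRecordType` are invented for this example (RecordTokenType is modeled as an ordered list of `Field` values), and it uses Java records, so it needs Java 16 or later. The schema carries layout details such as column widths that the parser needs, while the derived record type keeps only the field names and types that records carry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the library's API: the record type (modeled as an
// ordered list of Field values) is derived from a fixed-width layout schema.
class SchemaToTypeSketch {

    enum FieldType { STRING, INT }

    // A field of the record type: only what records carry.
    record Field(String name, FieldType type) { }

    // A column of the fixed-width layout: the field plus its width in characters,
    // which only the parser and formatter need.
    record Column(String name, FieldType type, int width) { }

    // Keeps names and types in schema order and drops layout-only details.
    static List<Field> deriveRecordType(List<Column> schema) {
        List<Field> fields = new ArrayList<>();
        for (Column column : schema) {
            fields.add(new Field(column.name(), column.type()));
        }
        return List.copyOf(fields);
    }

    public static void main(String[] args) {
        List<Column> schema = List.of(
                new Column("id", FieldType.INT, 6),
                new Column("name", FieldType.STRING, 20));
        System.out.println(deriveRecordType(schema));
        // prints [Field[name=id, type=INT], Field[name=name, type=STRING]]
    }
}
```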
-
getMetadata
FileMetadata getMetadata()
Gets the metadata associated with the format. Records produced by the associated parser or consumed by the associated formatter will use this metadata.
- Returns:
- the format's metadata
-
setMetadata
void setMetadata(FileMetadata metadata)
Sets the metadata associated with the format.
-
readMetadata
Reads the metadata associated with the format.
- Parameters:
fileClient - client used to read the file
source - location of the files
-
writeMetadata
Writes the provided metadata associated with the format.
- Parameters:
metadata - the metadata to write
fileClient - client used to write the file
target - location of the files
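A minimal sketch of how a format might persist its metadata, assuming a text format that stores it as a header at the start of the file. Everything here is hypothetical: metadata is modeled as an ordered `Map<String, String>` instead of FileMetadata, plain `InputStream`/`OutputStream` stand in for ByteSource/ByteSink, and the FileClient parameter is dropped because the caller opens the streams. The header is a run of `key=value` lines ended by a blank line.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, NOT the library's API. Metadata is an ordered String map
// stored as "key=value" header lines followed by a blank line; InputStream and
// OutputStream stand in for ByteSource and ByteSink.
class MetadataHeaderSketch {

    // Writes the header. The stream is flushed but not closed, so the caller
    // can write record data after the header.
    static void writeMetadata(Map<String, String> metadata, OutputStream target) throws IOException {
        Writer writer = new OutputStreamWriter(target, StandardCharsets.UTF_8);
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            writer.write(entry.getKey() + "=" + entry.getValue() + "\n");
        }
        writer.write("\n"); // blank line ends the header
        writer.flush();
    }

    // Reads header lines up to the first blank line (or end of input).
    // BufferedReader may read ahead past the header, so this method should be
    // given its own stream rather than one that will later be used for records.
    static Map<String, String> readMetadata(InputStream source) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(source, StandardCharsets.UTF_8));
        Map<String, String> metadata = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                throw new IOException("malformed header line: " + line);
            }
            // Split at the first '=' so values may themselves contain '='.
            metadata.put(line.substring(0, eq), line.substring(eq + 1));
        }
        return metadata;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("charset", "UTF-8");
        metadata.put("fields", "id,name");

        ByteArrayOutputStream file = new ByteArrayOutputStream();
        writeMetadata(metadata, file);
        file.write("7,widget\n".getBytes(StandardCharsets.UTF_8)); // records follow the header

        Map<String, String> readBack = readMetadata(new ByteArrayInputStream(file.toByteArray()));
        System.out.println(readBack); // prints {charset=UTF-8, fields=id,name}
    }
}
```

Keys containing '=' and keys or values containing newlines are not handled; a real format would need escaping or a binary encoding for its metadata.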
-
createParser
Create a new parser for the format using the specified parsing options.
- Parameters:
options - parsing options to use
- Returns:
- a new parser for reading external data
-
createWriter
Create a new writer for the format using the specified formatting options.
- Parameters:
options - formatting options to use
- Returns:
- a new formatter for writing external data
-
isSplittable
boolean isSplittable()
Indicates if the format supports parsing of subsections of a file.
A format should only return true if it can, at least in some situations, support this sort of parsing. If a format requires reading the entire file, it must return false.
If a format is not splittable, a file in the format cannot be parsed in parallel; however, individual files can still be parsed independently in parallel, as when reading the contents of a directory or using a file globbing pattern.
- Returns:
true if the format supports parsing only a portion of the file, false otherwise
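To illustrate what "parsing a subsection of a file" can mean, here is a hedged sketch for newline-delimited text, a typical splittable layout. `parseSplit` is hypothetical and is not how the library necessarily does it; the ownership rule (a split owns exactly the lines whose first byte falls inside its byte range, skipping a leading partial line and finishing the last line past its end) is similar in spirit to Hadoop-style line readers. It assumes no record contains an embedded newline. A delimited format whose quoted fields may contain newlines can only be split safely in some situations, which is the case the "at least in some situations" wording allows for.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splittable parsing for newline-delimited text.
// Rule: the split [start, end) owns exactly the lines whose first byte lies
// in [start, end), so every line is parsed by exactly one split.
class SplitParsingSketch {

    static List<String> parseSplit(byte[] data, int start, int end) {
        int pos = start;
        // Unless we are at the start of the file or just after a newline, we
        // are mid-line: that line belongs to the previous split, so skip it.
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> lines = new ArrayList<>();
        // Read every line that starts before 'end', even if it finishes after it.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            // '\n' never occurs inside a multi-byte UTF-8 sequence, so byte
            // boundaries found this way are safe to decode.
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 8;
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            System.out.println("[" + start + ", " + end + ") -> " + parseSplit(data, start, end));
        }
        // prints:
        // [0, 8) -> [alpha, beta]
        // [8, 16) -> [gamma]
        // [16, 23) -> [delta]
    }
}
```

Each split can be handed to a different worker, and because the ownership rule depends only on the bytes near the split boundaries, no worker needs to read the file from the beginning.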
-