Interface SplitParsingContext


  • public interface SplitParsingContext
    An object representing the context of a data split parsing operation. Parsers use the context to:
    • Get the split to be parsed
    • Publish parsed records
    • Handle parsing errors which may arise
    • Method Detail

      • getSplit

        DataSplit getSplit()
        Gets the split being parsed.
        Returns:
        the currently parsed split
      • startRecord

        void startRecord(long offsetInSplit)
        Establishes the context for the current record. Error messages and published records will be associated with this offset.
        Parameters:
        offsetInSplit - the offset within the split at which the record begins. This offset should be either in bytes or characters as appropriate for the format.
      • publishRecord

        void publishRecord()
        Signals that the current record is ready to be published. Field values are set in the buffers provided to DataParser#bindOutput(RecordSettable).
      • discardRecord

        void discardRecord()
        Signals that the current record should be ignored. Field values in the buffers provided to DataParser#bindOutput(RecordSettable) should be discarded.
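        The record lifecycle described by startRecord, publishRecord and discardRecord can be sketched as follows. This is a minimal illustration, not the library's implementation: RecordLifecycle, RecordingContext and RowParserSketch are hypothetical stand-ins that mirror only the three method signatures documented above, and the newline-delimited format with character offsets is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring only the three record-lifecycle methods
// documented above; it is not the real SplitParsingContext.
interface RecordLifecycle {
    void startRecord(long offsetInSplit);
    void publishRecord();
    void discardRecord();
}

// Hypothetical test double that logs each call so the sequence can be inspected.
class RecordingContext implements RecordLifecycle {
    final List<String> calls = new ArrayList<>();
    public void startRecord(long offsetInSplit) { calls.add("start@" + offsetInSplit); }
    public void publishRecord() { calls.add("publish"); }
    public void discardRecord() { calls.add("discard"); }
}

class RowParserSketch {
    // Walks newline-delimited text, using character offsets (an assumption).
    // Blank lines are discarded; every other line is published.
    static void parse(String splitText, RecordLifecycle ctx) {
        int offset = 0;
        while (offset < splitText.length()) {
            int end = splitText.indexOf('\n', offset);
            if (end < 0) {
                end = splitText.length();
            }
            String line = splitText.substring(offset, end);
            ctx.startRecord(offset);          // establish context before any other call
            if (line.trim().isEmpty()) {
                ctx.discardRecord();          // nothing useful in this record
            } else {
                // A real parser would set field values in its bound output buffers here.
                ctx.publishRecord();
            }
            offset = end + 1;                 // skip past the newline
        }
    }

    public static void main(String[] args) {
        RecordingContext ctx = new RecordingContext();
        parse("a,b\n\nc,d\n", ctx);
        System.out.println(ctx.calls);
    }
}
```

        In this sketch each record gets exactly one startRecord call followed by exactly one of publishRecord or discardRecord, which keeps the offset reported to the context in step with the record being processed.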
      • bulkPublish

        void bulkPublish(RecordTokenSequence records)
        Provides a set of records to publish.

        This method is intended only for column-oriented block formats, which assemble multiple records at once. For row-oriented formats, use publishRecord() instead.

        Parameters:
        records - the records to publish
      • handleFieldError

        void handleFieldError(String message)
        Reports a field parsing error.
        Parameters:
        message - additional information about the error. The message is interpreted within the context of the current split, so it need not identify the split itself.
      • handleMissingFields

        void handleMissingFields(String message)
        Reports that a record was found to have missing fields.
        Parameters:
        message - additional information about the error. The message is interpreted within the context of the current split, so it need not identify the split itself.
      • handleExtraField

        void handleExtraField(String message)
        Reports that an extra field was found in a record.
        Parameters:
        message - additional information about the error. The message is interpreted within the context of the current split, so it need not identify the split itself.
      • handleParseException

        void handleParseException(long offsetInSplit,
                                  Exception e)
        Reports an exception that occurred while parsing the split. A parser should invoke this when an exception occurs within DataParser#parseSplit(SplitParsingContext).
        Parameters:
        offsetInSplit - the current offset, in bytes or characters, within the split when the error occurred. The appropriate units for the format should be used.
        e - the exception that occurred
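        The four error-reporting methods might be used together roughly as shown below. This is a hedged sketch rather than the library's own code: ErrorReporting, ErrorLog and FieldCheckSketch are hypothetical names that mirror only the signatures documented above, and the two-field "name,integer" record layout is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the error-reporting methods documented above;
// it is not the real SplitParsingContext.
interface ErrorReporting {
    void startRecord(long offsetInSplit);
    void handleFieldError(String message);
    void handleMissingFields(String message);
    void handleExtraField(String message);
    void handleParseException(long offsetInSplit, Exception e);
}

// Hypothetical test double that collects every reported error as a string.
class ErrorLog implements ErrorReporting {
    final List<String> events = new ArrayList<>();
    public void startRecord(long offsetInSplit) { /* context only; nothing to log */ }
    public void handleFieldError(String message) { events.add("field: " + message); }
    public void handleMissingFields(String message) { events.add("missing: " + message); }
    public void handleExtraField(String message) { events.add("extra: " + message); }
    public void handleParseException(long offsetInSplit, Exception e) {
        events.add("exception@" + offsetInSplit);
    }
}

class FieldCheckSketch {
    // Assumed layout: exactly two comma-separated fields, a name and an integer.
    static void checkRecord(String line, long offset, ErrorReporting ctx) {
        try {
            ctx.startRecord(offset);
            String[] fields = line.split(",", -1);   // -1 keeps trailing empty fields
            if (fields.length < 2) {
                ctx.handleMissingFields("expected 2 fields, found " + fields.length);
                return;
            }
            if (fields.length > 2) {
                ctx.handleExtraField("expected 2 fields, found " + fields.length);
                return;
            }
            try {
                Integer.parseInt(fields[1].trim());
            } catch (NumberFormatException nfe) {
                // The message describes only the field; the context supplies the location.
                ctx.handleFieldError("field 2 is not an integer: " + fields[1]);
            }
        } catch (RuntimeException e) {
            // Anything unexpected is reported with the offset where it happened.
            ctx.handleParseException(offset, e);
        }
    }
}
```

        Field-level problems are reported through the message-only methods after startRecord has set the record offset, while handleParseException carries its own offset because it may be raised outside any well-formed record.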