Class ReadARFF

  • All Implemented Interfaces:
    LogicalOperator, RecordSourceOperator, SourceOperator<RecordPort>

    public class ReadARFF
    extends AbstractTextReader
    Reads files in the Attribute-Relation File Format (ARFF). ARFF files can be in either sparse or dense mode; this reader detects the mode and reads the data accordingly. ARFF files also contain schema information, which is parsed and used to determine how to parse data lines.
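
    As an illustration of the two modes: in ARFF, a sparse instance is written in braces as index/value pairs ({1 X, 3 Y}), while a dense instance lists every value separated by commas. The sketch below classifies a document by inspecting the first data line after the @data marker. The class ArffModeSketch and its method are hypothetical names used only for this example; how ReadARFF itself detects the mode is not documented here, so this should be read as an assumption, not the reader's actual logic.

```java
import java.util.List;

/** Hypothetical helper, not part of ReadARFF: guesses the mode of an ARFF document. */
final class ArffModeSketch {
    enum Mode { DENSE, SPARSE, UNKNOWN }

    /** Returns the mode implied by the first data line that follows the @data marker. */
    static Mode detectMode(List<String> lines) {
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and ARFF comments, which start with '%'.
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            if (!inData) {
                // Header keywords are case-insensitive, so compare ignoring case.
                inData = line.regionMatches(true, 0, "@data", 0, 5);
                continue;
            }
            // Sparse instances are wrapped in braces; anything else is treated as dense.
            return line.startsWith("{") ? Mode.SPARSE : Mode.DENSE;
        }
        // No data line was found, so the mode cannot be determined.
        return Mode.UNKNOWN;
    }
}
```

    For example, detectMode applied to a document whose first data line is {0 1, 3 2} yields SPARSE, and one whose first data line is 0,1 yields DENSE.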

    ARFF can be parsed in parallel under "optimistic" assumptions: namely, that no parse split falls in the middle of a delimited field value at a point before an escaped record separator. This is assumed by default; it can be disabled, at the cost of reduced scalability and performance.
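
    To see why the assumption matters, consider a worker that starts at an arbitrary split offset and skips ahead to the next line break to find its first record. If the split lands inside a delimited value that contains a line break, that break is mistaken for a record boundary. The sketch below demonstrates this on a plain string. SplitAssumptionSketch and its methods are hypothetical and say nothing about how ReadARFF actually splits its input; the sketch also ignores escaped delimiter characters inside values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the "optimistic" split assumption; not part of ReadARFF. */
final class SplitAssumptionSketch {

    /** Record start a worker would pick by skipping to just past the next newline, or -1. */
    static int naiveRecordStart(String text, int splitOffset) {
        int newline = text.indexOf('\n', splitOffset);
        return newline < 0 ? -1 : newline + 1;
    }

    /** Record starts found by a full sequential scan that tracks delimited values. */
    static List<Integer> trueRecordStarts(String text, char fieldDelimiter) {
        List<Integer> starts = new ArrayList<>();
        if (!text.isEmpty()) {
            starts.add(0);
        }
        boolean inField = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == fieldDelimiter) {
                // Simplification: every delimiter toggles the state; escapes are ignored.
                inField = !inField;
            } else if (c == '\n' && !inField && i + 1 < text.length()) {
                starts.add(i + 1);
            }
        }
        return starts;
    }

    /** True when the naive choice for this split agrees with the sequential scan. */
    static boolean splitIsSafe(String text, int splitOffset, char fieldDelimiter) {
        int start = naiveRecordStart(text, splitOffset);
        return start < 0
                || start == text.length()
                || trueRecordStarts(text, fieldDelimiter).contains(start);
    }
}
```

    With the input "a,'x\ny',b\nc,'z',d\n" and a single-quote delimiter, a split at offset 3 (inside the quoted value, before its embedded line break) is unsafe, while a split at offset 8 is safe. Disabling the optimistic assumption avoids the unsafe case, which the description above says comes with reduced scalability and performance.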

    • Constructor Detail

      • ReadARFF

        public ReadARFF()
        Reads an empty source with default settings. The source must be set before execution or an error will be raised.
        See Also:
        AbstractReader.setSource(ByteSource)
      • ReadARFF

        public ReadARFF(String pattern)
        Reads all paths matching the specified pattern as ARFF data using default options. Any matching path which is a directory is replaced with all files in the directory; this expansion is not applied recursively.
        Parameters:
        pattern - a path-matching pattern
        See Also:
        FileClient.matchPaths(String)
      • ReadARFF

        public ReadARFF(Path path)
        Reads the file specified by the path as ARFF data using default options. If the path refers to a directory, all files in the directory are read; this expansion is not applied recursively.
        Parameters:
        path - the path to read
      • ReadARFF

        public ReadARFF(ByteSource source)
        Reads the specified data source using default options.
        Parameters:
        source - the data source to read
    • Method Detail

      • setFieldDelimiter

        public void setFieldDelimiter(char fieldDelimiter)
        Set the field delimiter to use when reading the file contents. A single quote is used by default. The only supported values are a single quote and a double quote.
        Parameters:
        fieldDelimiter - character value to use as the field delimiter
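
        The role of the field delimiter can be sketched with a small standalone splitter for a dense data line: commas separate values unless they appear between delimiter characters. FieldDelimiterSketch is a hypothetical class for illustration only. Rejecting unsupported delimiters with an IllegalArgumentException is an assumption (the behavior of ReadARFF for unsupported values is not stated here), and escaped delimiters inside values are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of field delimiters in a dense ARFF line; not part of ReadARFF. */
final class FieldDelimiterSketch {

    /** Splits one dense data line on commas that lie outside delimited values. */
    static List<String> splitDenseLine(String line, char fieldDelimiter) {
        // Mirror the documented restriction: only ' and " are accepted.
        if (fieldDelimiter != '\'' && fieldDelimiter != '"') {
            throw new IllegalArgumentException("unsupported field delimiter: " + fieldDelimiter);
        }
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inField = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == fieldDelimiter) {
                // The delimiter toggles the state and is not kept in the value.
                inField = !inField;
            } else if (c == ',' && !inField) {
                // Simplification: surrounding whitespace is trimmed from every value.
                values.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        values.add(current.toString().trim());
        return values;
    }
}
```

        With a single-quote delimiter, the line 'hello, world',1 splits into the two values hello, world and 1. If the file quotes values with double quotes instead, the same line content is split correctly only when the delimiter is set to a double quote.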
      • getFieldDelimiter

        public char getFieldDelimiter()
        Get the configured field delimiter property value.
        Returns:
        configured field delimiter
      • discoverMetadata

        public ARFFAnalyzer.Analysis discoverMetadata(FileClient ctx)
        Gets the metadata for the currently configured data source.
        Parameters:
        ctx - the authorization context to use for accessing the file
        Returns:
        the metadata of the source