Interface SplitIterator

All Known Implementing Classes:
CompressionSplitIterator, SingleSplitIterator

public interface SplitIterator
A forward-only iterator over data splits with associated locality information. The provided locality information can be used to help decide how to assign splits to cluster nodes for processing.
  • Method Summary

    Modifier and Type    Method               Description
    List<String>         getLocalityInfo()    Get the list of machines for which access to the current split is local.
    DataSplit            getSplit()           Get the current split in the iterated set.
    boolean              next()               Advance to the next data split in the iterated set.
  • Method Details

    • next

      boolean next() throws IOException
      Advance to the next data split in the iterated set.
      Returns:
      true if there is another split, otherwise false.
      Throws:
      IOException
    • getSplit

      DataSplit getSplit()
      Get the current split in the iterated set.
      Returns:
      the split currently selected
    • getLocalityInfo

      List<String> getLocalityInfo()
      Get the list of machines for which access to the current split is local. An empty list means there is no locality preference: the split is local either to no node or to all nodes.
      Returns:
      the machines where the currently selected split is local. If the split is local to no machine (or to all), this list is empty.
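
Example (illustrative sketch only)

The following is a minimal sketch of how a caller might walk a SplitIterator and use the locality information to assign splits to nodes. It is not taken from the library. The SplitIterator interface is re-declared here only so the example compiles on its own. The DataSplit stand-in and its name() method, the ListSplitIterator class, the assign() and demoPlan() helpers, and the node names are all hypothetical. The sketch also assumes the iterator starts positioned before the first split, so next() is called before getSplit(); the documentation above does not state this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SplitAssignmentSketch {

    // Hypothetical in-memory iterator, used only to drive the example.
    static final class ListSplitIterator implements SplitIterator {
        private final List<String> names;
        private final List<List<String>> hosts;
        private int index = -1; // assumption: positioned before the first split

        ListSplitIterator(List<String> names, List<List<String>> hosts) {
            this.names = names;
            this.hosts = hosts;
        }

        @Override
        public boolean next() {
            if (index + 1 >= names.size()) {
                return false;
            }
            index++;
            return true;
        }

        @Override
        public DataSplit getSplit() {
            String name = names.get(index);
            return () -> name;
        }

        @Override
        public List<String> getLocalityInfo() {
            return hosts.get(index);
        }
    }

    // Prefer a known node that holds the split locally; otherwise (empty list or
    // only unknown hosts) fall back to round-robin over the known nodes.
    static Map<String, List<String>> assign(SplitIterator splits, List<String> nodes)
            throws IOException {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String node : nodes) {
            plan.put(node, new ArrayList<>());
        }
        int roundRobin = 0;
        while (splits.next()) {
            DataSplit split = splits.getSplit();
            String target = null;
            for (String host : splits.getLocalityInfo()) {
                if (plan.containsKey(host)) {
                    target = host;
                    break;
                }
            }
            if (target == null) {
                target = nodes.get(roundRobin % nodes.size());
                roundRobin++;
            }
            plan.get(target).add(split.name());
        }
        return plan;
    }

    // Builds a small hypothetical data set and returns the resulting plan.
    static Map<String, List<String>> demoPlan() {
        SplitIterator splits = new ListSplitIterator(
                List.of("s0", "s1", "s2", "s3"),
                List.of(List.of("node-b"), List.of(), List.of("node-c", "node-a"), List.of()));
        try {
            return assign(splits, List.of("node-a", "node-b"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoPlan()); // prints {node-a=[s1, s2], node-b=[s0, s3]}
    }
}

// Hypothetical stand-in: the real DataSplit type is not described on this page.
interface DataSplit {
    String name();
}

// Re-declaration of the three methods documented above.
interface SplitIterator {
    boolean next() throws IOException;
    DataSplit getSplit();
    List<String> getLocalityInfo();
}
```

The round-robin fallback is one possible policy for splits with an empty locality list. Because an empty list covers both "local to no node" and "local to all nodes", any node is an equally reasonable choice in that case.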