Class PartitionHint

  • All Implemented Interfaces:
    LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

    public final class PartitionHint
    extends AbstractDeferredRecordOperator
    Forces the input data to be partitioned into parallel streams of data for subsequent parallel operations. If requested, the output will also be sorted. It is also possible to force data to be staged to disk, breaking physical streaming; consumers of the output will run in a physical graph executed after the physical graph executing this operator. Even if not forced, the (re)partitioning may cause staging to occur.

    Normally these actions happens automatically in a logical graph as required, but this operator provides a mechanism for explicit control.

    • Constructor Detail

      • PartitionHint

        public PartitionHint()
        Forces a partitioning into parallel streams of data. The resulting streams will be partitioned in an unspecified fashion and be unordered.
      • PartitionHint

        public PartitionHint​(DataDistribution partitioning)
        Forces a partitioning into parallel streams of data guaranteeing the specified distribution. The resulting streams will be unordered.
        Parameters:
        partitioning - the requested data distribution
    • Method Detail

      • getPartitioning

        public DataDistribution getPartitioning()
        Gets the data distribution of the output.
        Returns:
        the requested data distribution
      • setPartitioning

        public void setPartitioning​(DataDistribution partitioning)
        Sets the data distribution of the output. The output is guaranteed to be distributed in the specified manner.
        Parameters:
        partitioning - the requested data distribution
      • getDataOrdering

        public DataOrdering getDataOrdering()
        Gets the data ordering of the output.
        Returns:
        the requested data ordering
      • setDataOrdering

        public void setDataOrdering​(DataOrdering dataOrdering)
        Sets the data ordering of the output. The output is guaranteed to be ordered in the specified manner.
        Parameters:
        dataOrdering - the requested data ordering
      • isForceStaging

        public boolean isForceStaging()
        Indicates whether data will be forcibly staged to disk.
        Returns:
        whether staging is required
      • setForceStaging

        public void setForceStaging​(boolean forceStaging)
        Sets whether data must be staged to disk.
        Parameters:
        forceStaging - indicates whether the data must be staged
      • getPreserveSortOrder

        public boolean getPreserveSortOrder()
        If true, data order will be preserved ( provided that the data order is known ). Note that this can be a very expensive operation ( when running distributed it requires a re-sort following redistribution), so in-general this option should be avoided. If it is known that the downstream operations require data to be sorted, though, it may be advantageous to specify this option because, in the pseudo-distributed case, it is cheaper than performing a re-sort. By default, we do not preserve sort order.
        Returns:
        whether to preserve sort order.
      • setPreserveSortOrder

        public void setPreserveSortOrder​(boolean preserveSortOrder)
        Sets whether data order will be preserved ( provided that the data order is known ). Note that this can be a very expensive operation ( when running distributed it requires a re-sort following redistribution), so in-general this option should be avoided. If it is known that the downstream operations require data to be sorted, though, it may be advantageous to specify this option because, in the pseudo-distributed case, it is cheaper than performing a re-sort. By default, we do not preserve sort order.
        Parameters:
        preserveSortOrder - whether to preserve sort order.