Class PartitionHint

All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

public final class PartitionHint extends AbstractDeferredRecordOperator
Forces the input data to be partitioned into parallel streams of data for subsequent parallel operations. If requested, the output will also be sorted. It is also possible to force data to be staged to disk, breaking physical streaming; consumers of the output will run in a physical graph executed after the physical graph executing this operator. Even if not forced, the (re)partitioning may cause staging to occur.

Normally these actions happens automatically in a logical graph as required, but this operator provides a mechanism for explicit control.

  • Constructor Details

    • PartitionHint

      public PartitionHint()
      Forces a partitioning into parallel streams of data. The resulting streams will be partitioned in an unspecified fashion and be unordered.
    • PartitionHint

      public PartitionHint(DataDistribution partitioning)
      Forces a partitioning into parallel streams of data guaranteeing the specified distribution. The resulting streams will be unordered.
      Parameters:
      partitioning - the requested data distribution
  • Method Details

    • getPartitioning

      public DataDistribution getPartitioning()
      Gets the data distribution of the output.
      Returns:
      the requested data distribution
    • setPartitioning

      public void setPartitioning(DataDistribution partitioning)
      Sets the data distribution of the output. The output is guaranteed to be distributed in the specified manner.
      Parameters:
      partitioning - the requested data distribution
    • getDataOrdering

      public DataOrdering getDataOrdering()
      Gets the data ordering of the output.
      Returns:
      the requested data ordering
    • setDataOrdering

      public void setDataOrdering(DataOrdering dataOrdering)
      Sets the data ordering of the output. The output is guaranteed to be ordered in the specified manner.
      Parameters:
      dataOrdering - the requested data ordering
    • isForceStaging

      public boolean isForceStaging()
      Indicates whether data will be forcibly staged to disk.
      Returns:
      whether staging is required
    • setForceStaging

      public void setForceStaging(boolean forceStaging)
      Sets whether data must be staged to disk.
      Parameters:
      forceStaging - indicates whether the data must be staged
    • getPreserveSortOrder

      public boolean getPreserveSortOrder()
      If true, data order will be preserved ( provided that the data order is known ). Note that this can be a very expensive operation ( when running distributed it requires a re-sort following redistribution), so in-general this option should be avoided. If it is known that the downstream operations require data to be sorted, though, it may be advantageous to specify this option because, in the pseudo-distributed case, it is cheaper than performing a re-sort. By default, we do not preserve sort order.
      Returns:
      whether to preserve sort order.
    • setPreserveSortOrder

      public void setPreserveSortOrder(boolean preserveSortOrder)
      Sets whether data order will be preserved ( provided that the data order is known ). Note that this can be a very expensive operation ( when running distributed it requires a re-sort following redistribution), so in-general this option should be avoided. If it is known that the downstream operations require data to be sorted, though, it may be advantageous to specify this option because, in the pseudo-distributed case, it is cheaper than performing a re-sort. By default, we do not preserve sort order.
      Parameters:
      preserveSortOrder - whether to preserve sort order.
    • computeMetadata

      protected void computeMetadata(StreamingMetadataContext ctx)
      Description copied from class: DeferredCompositeOperator
      Implementations must adhere to all of the contracts specified by StreamingOperator.computeMetadata(com.pervasive.datarush.operators.StreamingMetadataContext). In addition, DeferredCompositeOperators must declare required metadata so as to satisfy requirements of the operators that are added during DeferredCompositeOperator.compose(com.pervasive.datarush.operators.DeferredCompositionContext).
      Specified by:
      computeMetadata in class DeferredCompositeOperator
      Parameters:
      ctx - the context
    • compose

      protected void compose(DeferredCompositionContext ctx)
      Description copied from class: DeferredCompositeOperator
      Compose the body of this operator. Implementations should do the following:
      1. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
      2. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
      Specified by:
      compose in class DeferredCompositeOperator
      Parameters:
      ctx - the context