All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

public final class Group extends AbstractRecordCompositeOperator
Performs grouping (aggregation) of sorted input data. Available aggregations are provided by the static methods on Aggregation: Aggregation.avg(String), Aggregation.count(String), Aggregation.max(String), Aggregation.min(String), Aggregation.stddev(String), Aggregation.sum(String), Aggregation.var(String).

The operator uses groups of consecutive equal keys ("key groups") to determine which data values to aggregate. The input data need not be sorted; if it is already sorted, performance will be optimal.

  • Constructor Details

    • Group

      public Group()
      Default constructor. Prior to graph compilation at least one of the following properties must be set:
    • Group

      public Group(List<String> keys, List<Aggregation> aggregations)
      Create a new group plan, specifying keys and aggegations.
      Parameters:
      keys - the names of the key fields. If empty then all of the rows in the input are treated as one group.
      aggregations - the aggregations to apply to the data.
  • Method Details

    • getInput

      public RecordPort getInput()
      Description copied from interface: PipelineOperator
      Returns the input port
      Specified by:
      getInput in interface PipelineOperator<RecordPort>
      Overrides:
      getInput in class AbstractRecordCompositeOperator
      Returns:
      the input port
    • getOutput

      public RecordPort getOutput()
      Description copied from interface: PipelineOperator
      Returns the output port
      Specified by:
      getOutput in interface PipelineOperator<RecordPort>
      Overrides:
      getOutput in class AbstractRecordCompositeOperator
      Returns:
      the output port
    • getKeys

      public String[] getKeys()
      Returns the names of the key fields. If empty then all of the rows in the input are treated as one group.
      Returns:
      names of the group key fields
    • setKeys

      public void setKeys(String[] keys)
      Sets the names of the key fields. If empty then all of the rows in the input are treated as one group.
      Parameters:
      keys - names of the fields to use as group keys (order is important)
    • getAggregations

      public Aggregation[] getAggregations()
      Returns the aggregations to apply to the data.
      Returns:
      aggregations to apply to the input data
    • setAggregations

      public void setAggregations(Aggregation[] aggregations)
      Sets the aggregations to apply to the data.
      Parameters:
      aggregations - the aggregations to apply to the input data
    • setAggregations

      public void setAggregations(String aggregationExpression)
      Sets the aggregations to apply to the data. The aggregations to apply are expressed using a SQL-like syntax. Multiple aggregations can be contained within one expression. Multiple aggregations should be separated by a comma.

      Each aggregation contains the aggregation function with the aggregation parameters contained within parentheses. Most aggregations take only one parameter: the name of the field to aggregate. The distinct keyword can be used before the first parameter. Doing so enables a distinct aggregation. Use the as keyword to specify the field name that should be used for the resultant aggregation.

      Examples of aggregation expressions:

      • count(field) as "count"
      • sum(field), avg(field), min(field), max(field)

      Note that double quotes must be used when a keyword such as an aggregation function name is used as the resultant field name (the as clause).

      Parameters:
      aggregationExpression - expression containing aggregations to apply
    • getKeyFieldPrefix

      public String getKeyFieldPrefix()
      Returns the prefix to add to key fields. Defaults to "".
      Returns:
      the prefix to add to key fields.
    • setKeyFieldPrefix

      public void setKeyFieldPrefix(String keyFieldPrefix)
      Sets the prefix to add to key fields. Defaults to "".
      Parameters:
      keyFieldPrefix - key field prefix text
    • getInitialGroupCapacity

      public int getInitialGroupCapacity()
      Returns a hint as to the number of groups that are expected to be processed. When input data is unsorted, we will optimistically buffer input rows in order to attempt to reduce the amount of data to be sorted. This setting is ignore if input data is already sorted.
      Returns:
      initial group capacity
    • setInitialGroupCapacity

      public void setInitialGroupCapacity(int initialGroupCapacity)
      Sets a hint as to the number of groups that are expected to be processed. When input data is unsorted, we will optimistically buffer input rows in order to attempt to reduce the amount of data to be sorted. This setting is ignored if input data is already sorted.
      Parameters:
      initialGroupCapacity - the initial capacity
    • getMaxGroupCapacity

      public int getMaxGroupCapacity()
      Returns the max number of groups to fit into internal memory buffers. A value of zero means that this will grow unbounded. This setting is ignored if input data is already sorted.
      Returns:
      the max capacity.
    • setMaxGroupCapacity

      public void setMaxGroupCapacity(int maxGroupCapacity)
      Sets the max number of groups to fit into internal memory buffers. A value of zero means that this will grow unbounded. This setting is ignored if input data is already sorted.
      Parameters:
      maxGroupCapacity - the max capacity.
    • isFewGroupsHint

      public boolean isFewGroupsHint()
      Provides a hint as to whether the number of groups is expected to be small. If so, the reduction step is performed as a non-parallel operation. By default this is false.
      Returns:
      a hint as to whether the number of groups is expected to be small
    • setFewGroupsHint

      public void setFewGroupsHint(boolean fewGroupsHint)
      Sets a hint as to whether the number of groups is expected to be small. If so, the reduction step is performed as a non-parallel operation. By default this is false.
      Parameters:
      fewGroupsHint - a hint as to whether the number of groups is expected to be small
    • compose

      protected void compose(CompositionContext ctx)
      Description copied from class: CompositeOperator
      Compose the body of this operator. Implementations should do the following:
      1. Perform any validation of configuration, input types, etc
      2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
      3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
      Specified by:
      compose in class CompositeOperator
      Parameters:
      ctx - the context