com.pervasive.datarush.operators.group.Group

All Implemented Interfaces: LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

public final class Group extends AbstractRecordCompositeOperator

Performs grouping (aggregation) of input data. Available aggregations are provided by the static methods on Aggregation: Aggregation.avg(String), Aggregation.count(String), Aggregation.max(String), Aggregation.min(String), Aggregation.stddev(String), Aggregation.sum(String), Aggregation.var(String).

The operator uses groups of consecutive equal keys ("key groups") to determine which data values to aggregate. The input data need not be sorted; if it is already sorted, performance will be optimal.
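
The idea of "key groups" can be pictured with a small, library-independent sketch. The class and method names below (`KeyGroupSketch`, `sumByConsecutiveKey`) are hypothetical and are not part of the DataRush API; the code only illustrates how a single pass can aggregate runs of consecutive equal keys, and why that pass is only sufficient when equal keys are adjacent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: a hypothetical helper, not part of the DataRush API. */
final class KeyGroupSketch {

    /** Sums the values of each run of consecutive equal keys in one pass. */
    static List<Map.Entry<String, Long>> sumByConsecutiveKey(List<Map.Entry<String, Long>> rows) {
        List<Map.Entry<String, Long>> groups = new ArrayList<>();
        String currentKey = null;
        long runningSum = 0L;
        boolean open = false; // true once at least one row has been seen
        for (Map.Entry<String, Long> row : rows) {
            if (open && !row.getKey().equals(currentKey)) {
                // The key changed, so the current key group is complete.
                groups.add(Map.entry(currentKey, runningSum));
                runningSum = 0L;
            }
            currentKey = row.getKey();
            runningSum += row.getValue();
            open = true;
        }
        if (open) {
            groups.add(Map.entry(currentKey, runningSum)); // emit the last group
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sorted input: equal keys are adjacent, so there are two groups (a -> 3, b -> 5).
        List<Map.Entry<String, Long>> sorted =
            List.of(Map.entry("a", 1L), Map.entry("a", 2L), Map.entry("b", 5L));
        System.out.println(sumByConsecutiveKey(sorted));

        // Unsorted input: "a" appears in two separate runs, so this pass alone
        // would report it twice. Sorting first restores one run per key.
        List<Map.Entry<String, Long>> unsorted =
            List.of(Map.entry("a", 1L), Map.entry("b", 5L), Map.entry("a", 2L));
        System.out.println(sumByConsecutiveKey(unsorted).size());
    }
}
```

On already sorted input no extra arrangement of the rows is needed before such a pass, which is consistent with sorted input being the best-performing case.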

Field Summary

Fields inherited from class com.pervasive.datarush.operators.AbstractRecordCompositeOperator
input, output
Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| Group() | Default constructor. |
| Group(List<String> keys, List<Aggregation> aggregations) | Create a new group plan, specifying keys and aggregations. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected void | compose(CompositionContext ctx) | Compose the body of this operator. |
| Aggregation[] | getAggregations() | Returns the aggregations to apply to the data. |
| int | getInitialGroupCapacity() | Returns a hint as to the number of groups that are expected to be processed. |
| RecordPort | getInput() | Returns the input port |
| String | getKeyFieldPrefix() | Returns the prefix to add to key fields. |
| String[] | getKeys() | Returns the names of the key fields. |
| int | getMaxGroupCapacity() | Returns the max number of groups to fit into internal memory buffers. |
| RecordPort | getOutput() | Returns the output port |
| boolean | isFewGroupsHint() | Provides a hint as to whether the number of groups is expected to be small. |
| void | setAggregations(Aggregation[] aggregations) | Sets the aggregations to apply to the data. |
| void | setAggregations(String aggregationExpression) | Sets the aggregations to apply to the data. |
| void | setFewGroupsHint(boolean fewGroupsHint) | Sets a hint as to whether the number of groups is expected to be small. |
| void | setInitialGroupCapacity(int initialGroupCapacity) | Sets a hint as to the number of groups that are expected to be processed. |
| void | setKeyFieldPrefix(String keyFieldPrefix) | Sets the prefix to add to key fields. |
| void | setKeys(String[] keys) | Sets the names of the key fields. |
| void | setMaxGroupCapacity(int maxGroupCapacity) | Sets the max number of groups to fit into internal memory buffers. |

Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts

Constructor Details
- Group
  
  public Group()
  Default constructor. Prior to graph compilation, at least one of the following properties must be set:
  
  - setKeys(String[])
  - setAggregations(Aggregation[])
- Group
  
  public Group(List<String> keys, List<Aggregation> aggregations)
  
  Create a new group plan, specifying keys and aggregations.
  
  Parameters:
  
  keys - the names of the key fields. If empty then all of the rows in the input are treated as one group.
  
  aggregations - the aggregations to apply to the data.
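
The effect of the keys parameter, including the empty case, can be illustrated without the library. The sketch below is a plain-Java stand-in, not the operator's implementation: `EmptyKeySketch` and `countByKeys` are hypothetical names, and rows are modelled as simple maps from field name to value. It counts rows per distinct combination of key values, so an empty key list puts every row into the same group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a hypothetical helper, not part of the DataRush API. */
final class EmptyKeySketch {

    /** Counts rows per distinct combination of the named key fields. */
    static Map<List<Object>, Long> countByKeys(List<String> keys, List<Map<String, Object>> rows) {
        Map<List<Object>, Long> counts = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            // Build the composite key in the order the key fields were given.
            List<Object> keyValues = new ArrayList<>();
            for (String key : keys) {
                keyValues.add(row.get(key));
            }
            // With no key fields, keyValues is always the empty list,
            // so every row is counted under the same single group.
            counts.merge(keyValues, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            Map.of("state", "TX", "amount", 10),
            Map.of("state", "TX", "amount", 20),
            Map.of("state", "CA", "amount", 5));

        System.out.println(countByKeys(List.of("state"), rows)); // one entry per state
        System.out.println(countByKeys(List.of(), rows));        // a single group holding all rows
    }
}
```
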
Method Details
- getInput
  
  public RecordPort getInput()
  
  Description copied from interface: PipelineOperator
  
  Returns the input port
  
  Specified by:
  
  getInput in interface PipelineOperator<RecordPort>
  
  Overrides:
  
  getInput in class AbstractRecordCompositeOperator
  
  Returns:
  
  the input port
- getOutput
  
  public RecordPort getOutput()
  
  Description copied from interface: PipelineOperator
  
  Returns the output port
  
  Specified by:
  
  getOutput in interface PipelineOperator<RecordPort>
  
  Overrides:
  
  getOutput in class AbstractRecordCompositeOperator
  
  Returns:
  
  the output port
- getKeys
  
  public String[] getKeys()
  
  Returns the names of the key fields. If empty then all of the rows in the input are treated as one group.
  
  Returns:
  
  names of the group key fields
- setKeys
  
  public void setKeys(String[] keys)
  
  Sets the names of the key fields. If empty then all of the rows in the input are treated as one group.
  
  Parameters:
  
  keys - names of the fields to use as group keys (order is important)
- getAggregations
  
  public Aggregation[] getAggregations()
  
  Returns the aggregations to apply to the data.
  
  Returns:
  
  aggregations to apply to the input data
- setAggregations
  
  public void setAggregations(Aggregation[] aggregations)
  
  Sets the aggregations to apply to the data.
  
  Parameters:
  
  aggregations - the aggregations to apply to the input data
- setAggregations
  
  public void setAggregations(String aggregationExpression)
  Sets the aggregations to apply to the data. The aggregations are expressed using a SQL-like syntax. A single expression can contain multiple aggregations, separated by commas.
  Each aggregation consists of the aggregation function followed by its parameters in parentheses. Most aggregations take only one parameter: the name of the field to aggregate. Placing the distinct keyword before the first parameter enables a distinct aggregation. Use the as keyword to specify the name of the output field that holds the result of the aggregation.
  Examples of aggregation expressions:
  
  count(field) as "count"
  
  sum(field), avg(field), min(field), max(field)
  
  Note that double quotes must be used when a keyword such as an aggregation function name is used as the resultant field name (the as clause).
  Parameters:
  
  aggregationExpression - expression containing aggregations to apply
- getKeyFieldPrefix
  
  public String getKeyFieldPrefix()
  
  Returns the prefix to add to key fields. Defaults to "".
  
  Returns:
  
  the prefix to add to key fields.
- setKeyFieldPrefix
  
  public void setKeyFieldPrefix(String keyFieldPrefix)
  
  Sets the prefix to add to key fields. Defaults to "".
  
  Parameters:
  
  keyFieldPrefix - key field prefix text
- getInitialGroupCapacity
  
  public int getInitialGroupCapacity()
  
  Returns a hint as to the number of groups that are expected to be processed. When input data is unsorted, we will optimistically buffer input rows in order to attempt to reduce the amount of data to be sorted. This setting is ignored if input data is already sorted.
  
  Returns:
  
  initial group capacity
- setInitialGroupCapacity
  
  public void setInitialGroupCapacity(int initialGroupCapacity)
  
  Sets a hint as to the number of groups that are expected to be processed. When input data is unsorted, we will optimistically buffer input rows in order to attempt to reduce the amount of data to be sorted. This setting is ignored if input data is already sorted.
  
  Parameters:
  
  initialGroupCapacity - the initial capacity
- getMaxGroupCapacity
  
  public int getMaxGroupCapacity()
  
  Returns the max number of groups to fit into internal memory buffers. A value of zero means that this will grow unbounded. This setting is ignored if input data is already sorted.
  
  Returns:
  
  the max capacity.
- setMaxGroupCapacity
  
  public void setMaxGroupCapacity(int maxGroupCapacity)
  
  Sets the max number of groups to fit into internal memory buffers. A value of zero means that this will grow unbounded. This setting is ignored if input data is already sorted.
  
  Parameters:
  
  maxGroupCapacity - the max capacity.
- isFewGroupsHint
  
  public boolean isFewGroupsHint()
  
  Provides a hint as to whether the number of groups is expected to be small. If so, the reduction step is performed as a non-parallel operation. By default this is false.
  
  Returns:
  
  a hint as to whether the number of groups is expected to be small
- setFewGroupsHint
  
  public void setFewGroupsHint(boolean fewGroupsHint)
  
  Sets a hint as to whether the number of groups is expected to be small. If so, the reduction step is performed as a non-parallel operation. By default this is false.
  
  Parameters:
  
  fewGroupsHint - a hint as to whether the number of groups is expected to be small
- compose
  
  protected void compose(CompositionContext ctx)
  
  Description copied from class: CompositeOperator
  Compose the body of this operator. Implementations should do the following:
  
  Perform any validation of configuration, input types, etc
  
  Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
  
  Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
  Specified by:
  
  compose in class CompositeOperator
  
  Parameters:
  
  ctx - the context
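
The expression syntax described under setAggregations(String) can be illustrated with a small string builder. This is a sketch under stated assumptions: `AggregationExpressionSketch`, `term`, and `expression` are hypothetical helpers and not part of the library, the set of names that need quoting is limited to the aggregation functions listed in the class description, and comparing aliases case-insensitively is an assumption. The code only assembles text; it does not validate it against the operator's parser.

```java
import java.util.Locale;
import java.util.Set;

/** Illustrative only: a hypothetical helper, not part of the DataRush API. */
final class AggregationExpressionSketch {

    // Function names taken from the class description; assumed to be the
    // names that must be double-quoted when used as a result field name.
    private static final Set<String> FUNCTION_NAMES =
        Set.of("avg", "count", "max", "min", "stddev", "sum", "var");

    /** Builds one aggregation term, e.g. sum(distinct amount) as total. */
    static String term(String function, String field, boolean distinct, String alias) {
        StringBuilder sb = new StringBuilder(function).append('(');
        if (distinct) {
            sb.append("distinct "); // keyword goes before the first parameter
        }
        sb.append(field).append(')');
        if (alias != null) {
            boolean needsQuotes = FUNCTION_NAMES.contains(alias.toLowerCase(Locale.ROOT));
            sb.append(" as ").append(needsQuotes ? "\"" + alias + "\"" : alias);
        }
        return sb.toString();
    }

    /** Joins several terms into one comma-separated expression. */
    static String expression(String... terms) {
        return String.join(", ", terms);
    }

    public static void main(String[] args) {
        String expr = expression(
            term("count", "field", false, "count"),
            term("sum", "amount", true, "total"));
        System.out.println(expr);
    }
}
```

A string built this way is the kind of value that would be passed to setAggregations(String); whether a given expression is accepted is decided by the operator, not by this helper.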
