com.pervasive.datarush.operators.record.RowsToColumns

All Implemented Interfaces:: LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

public class RowsToColumns extends CompositeOperator implements RecordPipelineOperator

The RowsToColumns operator is used to pivot data from a narrow representation (rows) into a wider representation (columns). The data is first segmented into groups using a defined set of group keys. The ordering of the group keys is important as it defines how the data is partitioned and ordered for the pivot operation. A pivot key field contains the distinct values that will be used as the pivot point. This field must be typed as a string. A column is created in the output for each distinct value of the pivot key. An aggregation is defined which is performed on each data grouping defined by the group field(s) and the pivot field. The result of the aggregation for each unique value of the pivot field appears in the appropriate output column.

The table below provides an example set of data that we want to pivot by Region. There are only four regions: North, South, East and West. For each item, we'd like to compute the total sales per region. Items can show up multiple times in a region, as the data is also organized by store.

ItemID	StoreID	Region	Sales
1	10	North	1000
1	15	North	800
1	20	South	500
1	30	East	700
2	40	West	1200
2	10	North	500
2	15	North	200

To accomplish this pivot, the ItemID will be used as the group key. The Region will be used as the pivot key. And the Sales column will be the pivot value, aggregating by summing the values. The pivot key values are: "North", "South", "East" and "West". The result of the pivot is shown in the table below. Note that the sales total for the West region for item 1 is empty. Scanning the input data shows that no sales were present in the West region for item 1. Item 1 did have two sales values for the North region. Those values (1000 and 800) are summed and the total (1800) appears in the North region column for item 1. Column values of ? indicate a null (non-existing) value.

ItemID	North	South	East	West
1	1800	500	700	?
2	700	?	?	1200

The key concepts to understand usage of the RowsToColumns operator are:

The use of a set of columns to segment the data into groups for pivoting. This set of column is called the group keys. The example used the ItemID as the group key.
A categorical valued column whose distinct values are used as columns in the output. This is the pivot key. The example used the Region as the pivot key.
A column that can be aggregated for each data grouping and within each pivot key. These are the pivot values. The example used the Sales column as the pivot value.

Field Summary

Fields

Modifier and Type

Field

Description

static final String

DEFAULTCOLUMNPATTERN
Constructor Summary

Constructors

Constructor

Description

RowsToColumns()
Method Summary

Modifier and Type

Method

Description

protected void

compose(CompositionContext ctx)

Compose the body of this operator.

List<Aggregation>

getAggregations()

Get the aggregation(s) applied to each pivot group.

List<String>

getGroupKeys()

Get the list of fields used to segment the input data into groups.

RecordPort

getInput()

Returns the input port

RecordPort

getOutput()

Returns the output port

String

getPivotKey()

Get the name of the field used as the pivot key.

List<String>

getPivotKeyValues()

Get the list of distinct values of the pivot key field.

void

setAggregations(Aggregation... aggregations)

Set the aggregation(s) to apply to each pivot grouping.

void

setAggregations(String aggregationExpression)

Set the aggregation(s) to apply to each pivot grouping using an aggregation expression.

void

setAggregations(List<Aggregation> aggregations)

Set the aggregation(s) to apply to each pivot grouping.

void

setGroupKeys(String... groupKeys)

Set the list of fields used to segment the input data into groups for pivoting.

void

setGroupKeys(List<String> groupKeys)

Set the list of fields used to segment the input data into groups for pivoting.

void

setPivotColumnPattern(String pattern)

Sets the naming pattern used for new pivot columns.

void

setPivotKey(String pivotKey)

Set the name of the field used as the pivot key.

void

setPivotKeyValues(String... pivotValues)

Set the list of distinct pivot key values.

void

setPivotKeyValues(List<String> pivotValues)

Set the list of distinct pivot key values.

Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts

Field Details
- DEFAULTCOLUMNPATTERN
  
  public static final String DEFAULTCOLUMNPATTERN
  See Also:
  
  Constant Field Values
Constructor Details
- RowsToColumns
  
  public RowsToColumns()
Method Details
- getInput
  
  public RecordPort getInput()
  
  Description copied from interface: PipelineOperator
  
  Returns the input port
  
  Specified by:
  
  getInput in interface PipelineOperator<RecordPort>
  
  Returns:
  
  the input port
- getOutput
  
  public RecordPort getOutput()
  
  Description copied from interface: PipelineOperator
  
  Returns the output port
  
  Specified by:
  
  getOutput in interface PipelineOperator<RecordPort>
  
  Returns:
  
  the output port
- getGroupKeys
  
  public List<String> getGroupKeys()
  
  Get the list of fields used to segment the input data into groups.
  
  Returns:
  
  ordered list of grouping field names
- setGroupKeys
  
  public void setGroupKeys(List<String> groupKeys)
  
  Set the list of fields used to segment the input data into groups for pivoting.
  
  Parameters:
  
  groupKeys - ordered list of grouping field names
- setGroupKeys
  
  public void setGroupKeys(String... groupKeys)
  
  Set the list of fields used to segment the input data into groups for pivoting.
  
  Parameters:
  
  groupKeys - ordered list of grouping field names
- getPivotKey
  
  public String getPivotKey()
  
  Get the name of the field used as the pivot key.
  
  Returns:
  
  name of pivot key field
- setPivotKey
  
  public void setPivotKey(String pivotKey)
  
  Set the name of the field used as the pivot key. This field is used as the pivot point. Distinct values from this field are used to determine the output fields of the pivot.
  
  Parameters:
  
  pivotKey - name of the pivot key field
- getAggregations
  
  public List<Aggregation> getAggregations()
  
  Get the aggregation(s) applied to each pivot group.
  
  Returns:
  
  aggregations applied to pivot groups
- setAggregations
  
  public void setAggregations(List<Aggregation> aggregations)
  
  Set the aggregation(s) to apply to each pivot grouping. The aggregation(s) define the input field(s) that make up the pivot values. These values are aggregated per pivot group and the results placed in a field within the output. A set of output fields will be produced per aggregation defined.
  
  Parameters:
  
  aggregations - aggregation(s) to apply to each pivot data group
- setAggregations
  
  public void setAggregations(Aggregation... aggregations)
  
  Set the aggregation(s) to apply to each pivot grouping. The aggregation(s) define the input field(s) that make up the pivot values. These values are aggregated per pivot group and the results placed in a field within the output. A set of output fields will be produced per aggregation defined.
  
  Parameters:
  
  aggregations - aggregation(s) to apply to each pivot data group
- setAggregations
  
  public void setAggregations(String aggregationExpression)
  
  Set the aggregation(s) to apply to each pivot grouping using an aggregation expression. The aggregation(s) define the input field(s) that make up the pivot values. These values are aggregated per pivot group and the results placed in a field within the output. A set of output fields will be produced per aggregation defined.
  
  Parameters:
  
  aggregations - aggregation(s) to apply to each pivot data group
- getPivotKeyValues
  
  public List<String> getPivotKeyValues()
  
  Get the list of distinct values of the pivot key field.
  
  Returns:
  
  distinct pivot key values
- setPivotKeyValues
  
  public void setPivotKeyValues(List<String> pivotValues)
  
  Set the list of distinct pivot key values. An output field will be created for each value per aggregation defined. If the input pivot key field contains values not present in this list, those input values will be ignored.
  
  Parameters:
  
  pivotValues - list of distinct pivot key values
- setPivotKeyValues
  
  public void setPivotKeyValues(String... pivotValues)
  
  Set the list of distinct pivot key values. An output field will be created for each value per aggregation defined. If the input pivot key field contains values not present in this list, those input values will be ignored.
  
  Parameters:
  
  pivotValues - list of distinct pivot key values
- setPivotColumnPattern
  
  public void setPivotColumnPattern(String pattern)
  
  Sets the naming pattern used for new pivot columns. This will be used to determine the actual names of the pivot columns.
  
  Parameters:
  
  pattern - name pattern for pivot columns
- compose
  
  protected void compose(CompositionContext ctx)
  
  Description copied from class: CompositeOperator
  Compose the body of this operator. Implementations should do the following:
  
  Perform any validation of configuration, input types, etc
  
  Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
  
  Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
  Specified by:
  
  compose in class CompositeOperator
  
  Parameters:
  
  ctx - the context

Class RowsToColumns

Field Summary

Constructor Summary

Method Summary

Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator

Methods inherited from class java.lang.Object

Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator

Field Details

DEFAULTCOLUMNPATTERN

Constructor Details

RowsToColumns

Method Details

getInput

getOutput

getGroupKeys

setGroupKeys

setGroupKeys

getPivotKey

setPivotKey

getAggregations

setAggregations

setAggregations

setAggregations

getPivotKeyValues

setPivotKeyValues

setPivotKeyValues

setPivotColumnPattern

compose