- All Implemented Interfaces:
LogicalOperator,PipelineOperator<RecordPort>,RecordPipelineOperator
The table below provides an example set of data that we want to pivot by Region. There are only four regions: North, South, East and West. For each item, we'd like to compute the total sales per region. Items can show up multiple times in a region, as the data is also organized by store.
| ItemID | StoreID | Region | Sales |
|---|---|---|---|
| 1 | 10 | North | 1000 |
| 1 | 15 | North | 800 |
| 1 | 20 | South | 500 |
| 1 | 30 | East | 700 |
| 2 | 40 | West | 1200 |
| 2 | 10 | North | 500 |
| 2 | 15 | North | 200 |
To accomplish this pivot, the ItemID will be used as the group key. The Region will be used as the pivot key. And the Sales column will be the pivot value, aggregating by summing the values. The pivot key values are: "North", "South", "East" and "West". The result of the pivot is shown in the table below. Note that the sales total for the West region for item 1 is empty. Scanning the input data shows that no sales were present in the West region for item 1. Item 1 did have two sales values for the North region. Those values (1000 and 800) are summed and the total (1800) appears in the North region column for item 1. Column values of ? indicate a null (non-existing) value.
| ItemID | North | South | East | West |
|---|---|---|---|---|
| 1 | 1800 | 500 | 700 | ? |
| 2 | 700 | ? | ? | 1200 |
The key concepts to understand usage of the RowsToColumns operator are:
- The use of a set of columns to segment the data into groups for pivoting. This set of column is called the group keys. The example used the ItemID as the group key.
- A categorical valued column whose distinct values are used as columns in the output. This is the pivot key. The example used the Region as the pivot key.
- A column that can be aggregated for each data grouping and within each pivot key. These are the pivot values. The example used the Sales column as the pivot value.
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprotected voidCompose the body of this operator.Get the aggregation(s) applied to each pivot group.Get the list of fields used to segment the input data into groups.getInput()Returns the input portReturns the output portGet the name of the field used as the pivot key.Get the list of distinct values of the pivot key field.voidsetAggregations(Aggregation... aggregations) Set the aggregation(s) to apply to each pivot grouping.voidsetAggregations(String aggregationExpression) Set the aggregation(s) to apply to each pivot grouping using an aggregation expression.voidsetAggregations(List<Aggregation> aggregations) Set the aggregation(s) to apply to each pivot grouping.voidsetGroupKeys(String... groupKeys) Set the list of fields used to segment the input data into groups for pivoting.voidsetGroupKeys(List<String> groupKeys) Set the list of fields used to segment the input data into groups for pivoting.voidsetPivotColumnPattern(String pattern) Sets the naming pattern used for new pivot columns.voidsetPivotKey(String pivotKey) Set the name of the field used as the pivot key.voidsetPivotKeyValues(String... pivotValues) Set the list of distinct pivot key values.voidsetPivotKeyValues(List<String> pivotValues) Set the list of distinct pivot key values.Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyErrorMethods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
Field Details
-
DEFAULTCOLUMNPATTERN
- See Also:
-
-
Constructor Details
-
RowsToColumns
public RowsToColumns()
-
-
Method Details
-
getInput
Description copied from interface:PipelineOperatorReturns the input port- Specified by:
getInputin interfacePipelineOperator<RecordPort>- Returns:
- the input port
-
getOutput
Description copied from interface:PipelineOperatorReturns the output port- Specified by:
getOutputin interfacePipelineOperator<RecordPort>- Returns:
- the output port
-
getGroupKeys
Get the list of fields used to segment the input data into groups.- Returns:
- ordered list of grouping field names
-
setGroupKeys
Set the list of fields used to segment the input data into groups for pivoting.- Parameters:
groupKeys- ordered list of grouping field names
-
setGroupKeys
Set the list of fields used to segment the input data into groups for pivoting.- Parameters:
groupKeys- ordered list of grouping field names
-
getPivotKey
Get the name of the field used as the pivot key.- Returns:
- name of pivot key field
-
setPivotKey
Set the name of the field used as the pivot key. This field is used as the pivot point. Distinct values from this field are used to determine the output fields of the pivot.- Parameters:
pivotKey- name of the pivot key field
-
getAggregations
Get the aggregation(s) applied to each pivot group.- Returns:
- aggregations applied to pivot groups
-
setAggregations
Set the aggregation(s) to apply to each pivot grouping. The aggregation(s) define the input field(s) that make up the pivot values. These values are aggregated per pivot group and the results placed in a field within the output. A set of output fields will be produced per aggregation defined.- Parameters:
aggregations- aggregation(s) to apply to each pivot data group
-
setAggregations
Set the aggregation(s) to apply to each pivot grouping. The aggregation(s) define the input field(s) that make up the pivot values. These values are aggregated per pivot group and the results placed in a field within the output. A set of output fields will be produced per aggregation defined.- Parameters:
aggregations- aggregation(s) to apply to each pivot data group
-
setAggregations
Set the aggregation(s) to apply to each pivot grouping using an aggregation expression. The aggregation(s) define the input field(s) that make up the pivot values. These values are aggregated per pivot group and the results placed in a field within the output. A set of output fields will be produced per aggregation defined.- Parameters:
aggregations- aggregation(s) to apply to each pivot data group
-
getPivotKeyValues
Get the list of distinct values of the pivot key field.- Returns:
- distinct pivot key values
-
setPivotKeyValues
Set the list of distinct pivot key values. An output field will be created for each value per aggregation defined. If the input pivot key field contains values not present in this list, those input values will be ignored.- Parameters:
pivotValues- list of distinct pivot key values
-
setPivotKeyValues
Set the list of distinct pivot key values. An output field will be created for each value per aggregation defined. If the input pivot key field contains values not present in this list, those input values will be ignored.- Parameters:
pivotValues- list of distinct pivot key values
-
setPivotColumnPattern
Sets the naming pattern used for new pivot columns. This will be used to determine the actual names of the pivot columns.- Parameters:
pattern- name pattern for pivot columns
-
compose
Description copied from class:CompositeOperatorCompose the body of this operator. Implementations should do the following:- Perform any validation of configuration, input types, etc
- Instantiate and configure sub-operators, adding them to the provided context via
the method
OperatorComposable.add(O) - Create necessary connections via the method
OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
- Specified by:
composein classCompositeOperator- Parameters:
ctx- the context
-