- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.CompositeOperator
-
- com.pervasive.datarush.operators.record.RowsToColumns
-
- All Implemented Interfaces:
LogicalOperator
,PipelineOperator<RecordPort>
,RecordPipelineOperator
public class RowsToColumns extends CompositeOperator implements RecordPipelineOperator
The RowsToColumns operator is used to pivot data from a narrow representation (rows) into a wider representation (columns). The data is first segmented into groups using a defined set of group keys. The ordering of the group keys is important as it defines how the data is partitioned and ordered for the pivot operation. A pivot key field contains the distinct values that will be used as the pivot point. This field must be typed as a string. A column is created in the output for each distinct value of the pivot key. An aggregation is defined which is performed on each data grouping defined by the group field(s) and the pivot field. The result of the aggregation for each unique value of the pivot field appears in the appropriate output column.The table below provides an example set of data that we want to pivot by Region. There are only four regions: North, South, East and West. For each item, we'd like to compute the total sales per region. Items can show up multiple times in a region, as the data is also organized by store.
ItemID StoreID Region Sales 1 10 North 1000 1 15 North 800 1 20 South 500 1 30 East 700 2 40 West 1200 2 10 North 500 2 15 North 200 To accomplish this pivot, the ItemID will be used as the group key. The Region will be used as the pivot key. And the Sales column will be the pivot value, aggregating by summing the values. The pivot key values are: "North", "South", "East" and "West". The result of the pivot is shown in the table below. Note that the sales total for the West region for item 1 is empty. Scanning the input data shows that no sales were present in the West region for item 1. Item 1 did have two sales values for the North region. Those values (1000 and 800) are summed and the total (1800) appears in the North region column for item 1. Column values of ? indicate a null (non-existing) value.
ItemID North South East West 1 1800 500 700 ? 2 700 ? ? 1200 The key concepts to understand usage of the RowsToColumns operator are:
- The use of a set of columns to segment the data into groups for pivoting. This set of column is called the group keys. The example used the ItemID as the group key.
- A categorical valued column whose distinct values are used as columns in the output. This is the pivot key. The example used the Region as the pivot key.
- A column that can be aggregated for each data grouping and within each pivot key. These are the pivot values. The example used the Sales column as the pivot value.
-
-
Field Summary
Fields Modifier and Type Field Description static String
DEFAULTCOLUMNPATTERN
-
Constructor Summary
Constructors Constructor Description RowsToColumns()
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected void
compose(CompositionContext ctx)
Compose the body of this operator.List<Aggregation>
getAggregations()
Get the aggregation(s) applied to each pivot group.List<String>
getGroupKeys()
Get the list of fields used to segment the input data into groups.RecordPort
getInput()
Returns the input portRecordPort
getOutput()
Returns the output portString
getPivotKey()
Get the name of the field used as the pivot key.List<String>
getPivotKeyValues()
Get the list of distinct values of the pivot key field.void
setAggregations(Aggregation... aggregations)
Set the aggregation(s) to apply to each pivot grouping.void
setAggregations(String aggregationExpression)
Set the aggregation(s) to apply to each pivot grouping using an aggregation expression.void
setAggregations(List<Aggregation> aggregations)
Set the aggregation(s) to apply to each pivot grouping.void
setGroupKeys(String... groupKeys)
Set the list of fields used to segment the input data into groups for pivoting.void
setGroupKeys(List<String> groupKeys)
Set the list of fields used to segment the input data into groups for pivoting.void
setPivotColumnPattern(String pattern)
Sets the naming pattern used for new pivot columns.void
setPivotKey(String pivotKey)
Set the name of the field used as the pivot key.void
setPivotKeyValues(String... pivotValues)
Set the list of distinct pivot key values.void
setPivotKeyValues(List<String> pivotValues)
Set the list of distinct pivot key values.-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Field Detail
-
DEFAULTCOLUMNPATTERN
public static final String DEFAULTCOLUMNPATTERN
- See Also:
- Constant Field Values
-
-
Method Detail
-
getInput
public RecordPort getInput()
Description copied from interface:PipelineOperator
Returns the input port- Specified by:
getInput
in interfacePipelineOperator<RecordPort>
- Returns:
- the input port
-
getOutput
public RecordPort getOutput()
Description copied from interface:PipelineOperator
Returns the output port- Specified by:
getOutput
in interfacePipelineOperator<RecordPort>
- Returns:
- the output port
-
getGroupKeys
public List<String> getGroupKeys()
Get the list of fields used to segment the input data into groups.- Returns:
- ordered list of grouping field names
-
setGroupKeys
public void setGroupKeys(List<String> groupKeys)
Set the list of fields used to segment the input data into groups for pivoting.- Parameters:
groupKeys
- ordered list of grouping field names
-
setGroupKeys
public void setGroupKeys(String... groupKeys)
Set the list of fields used to segment the input data into groups for pivoting.- Parameters:
groupKeys
- ordered list of grouping field names
-
getPivotKey
public String getPivotKey()
Get the name of the field used as the pivot key.- Returns:
- name of pivot key field
-
setPivotKey
public void setPivotKey(String pivotKey)
Set the name of the field used as the pivot key. This field is used as the pivot point. Distinct values from this field are used to determine the output fields of the pivot.- Parameters:
pivotKey
- name of the pivot key field
-
getAggregations
public List<Aggregation> getAggregations()
Get the aggregation(s) applied to each pivot group.- Returns:
- aggregations applied to pivot groups
-
setAggregations
public void setAggregations(List<Aggregation> aggregations)
Set the aggregation(s) to apply to each pivot grouping. The aggregation(s) define the input field(s) that make up the pivot values. These values are aggregated per pivot group and the results placed in a field within the output. A set of output fields will be produced per aggregation defined.- Parameters:
aggregations
- aggregation(s) to apply to each pivot data group
-
setAggregations
public void setAggregations(Aggregation... aggregations)
Set the aggregation(s) to apply to each pivot grouping. The aggregation(s) define the input field(s) that make up the pivot values. These values are aggregated per pivot group and the results placed in a field within the output. A set of output fields will be produced per aggregation defined.- Parameters:
aggregations
- aggregation(s) to apply to each pivot data group
-
setAggregations
public void setAggregations(String aggregationExpression)
Set the aggregation(s) to apply to each pivot grouping using an aggregation expression. The aggregation(s) define the input field(s) that make up the pivot values. These values are aggregated per pivot group and the results placed in a field within the output. A set of output fields will be produced per aggregation defined.- Parameters:
aggregations
- aggregation(s) to apply to each pivot data group
-
getPivotKeyValues
public List<String> getPivotKeyValues()
Get the list of distinct values of the pivot key field.- Returns:
- distinct pivot key values
-
setPivotKeyValues
public void setPivotKeyValues(List<String> pivotValues)
Set the list of distinct pivot key values. An output field will be created for each value per aggregation defined. If the input pivot key field contains values not present in this list, those input values will be ignored.- Parameters:
pivotValues
- list of distinct pivot key values
-
setPivotKeyValues
public void setPivotKeyValues(String... pivotValues)
Set the list of distinct pivot key values. An output field will be created for each value per aggregation defined. If the input pivot key field contains values not present in this list, those input values will be ignored.- Parameters:
pivotValues
- list of distinct pivot key values
-
setPivotColumnPattern
public void setPivotColumnPattern(String pattern)
Sets the naming pattern used for new pivot columns. This will be used to determine the actual names of the pivot columns.- Parameters:
pattern
- name pattern for pivot columns
-
compose
protected void compose(CompositionContext ctx)
Description copied from class:CompositeOperator
Compose the body of this operator. Implementations should do the following:- Perform any validation of configuration, input types, etc
- Instantiate and configure sub-operators, adding them to the provided context via
the method
OperatorComposable.add(O)
- Create necessary connections via the method
OperatorComposable.connect(P, P)
. This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
- Specified by:
compose
in classCompositeOperator
- Parameters:
ctx
- the context
-
-