- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.CompositeOperator
-
- com.pervasive.datarush.operators.group.RemoveDuplicates
-
- All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator
public class RemoveDuplicates extends CompositeOperator implements RecordPipelineOperator
Removes duplicate rows based on a specified set of group keys. The "first" record of each key value group is pushed to the output; other records with the same key values are ignored. The "first" record of a key group is determined by sorting all rows of that key group by the specified sortKeys. If sortKeys is unspecified, an arbitrary row from each key group is output.
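The semantics described above can be illustrated without the library. The following is a minimal sketch in plain Java, not the operator's implementation: the class DedupSketch, its removeDuplicates method, and the representation of rows as Map<String, String> are all hypothetical. It assumes every key is present in every row and compares sort key values as strings in ascending order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this class is not part of the DataRush API.
final class DedupSketch {

    /**
     * Keeps one row per distinct combination of group key values.
     * Within a group, the row that sorts first by the sort keys wins.
     * Assumes every group key and sort key is present in every row.
     */
    static List<Map<String, String>> removeDuplicates(
            List<Map<String, String>> rows,
            List<String> groupKeys,
            List<String> sortKeys) {
        // With no sort keys, all rows of a group tie.
        Comparator<Map<String, String>> order = (a, b) -> 0;
        for (String key : sortKeys) {
            order = order.thenComparing(row -> row.get(key));
        }

        // Maps each group's key values to the best row seen so far.
        Map<List<String>, Map<String, String>> winners = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            List<String> groupValue = new ArrayList<>();
            for (String key : groupKeys) {
                groupValue.add(row.get(key));
            }
            Map<String, String> current = winners.get(groupValue);
            if (current == null || order.compare(row, current) < 0) {
                winners.put(groupValue, row);
            }
        }
        return new ArrayList<>(winners.values());
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("id", "1", "ts", "2024-02"),
                Map.of("id", "1", "ts", "2024-01"),
                Map.of("id", "2", "ts", "2024-03"));
        // One row per id; for id 1 the row with the smallest ts is kept.
        System.out.println(removeDuplicates(rows, List.of("id"), List.of("ts")));
    }
}
```

When sortKeys is empty, this sketch happens to keep the first row it encounters in each group. The operator itself only promises an arbitrary row in that case, so callers that need a specific row should set sort keys.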
-
-
Constructor Summary
Constructors
- RemoveDuplicates(): Default constructor.
- RemoveDuplicates(List<String> groupKeys): Remove duplicates, specifying keys.
-
Method Summary
- protected void compose(CompositionContext ctx): Compose the body of this operator.
- String[] getGroupKeys(): Returns the keys by which to de-duplicate.
- RecordPort getInput(): Returns the data to be de-duplicated.
- RecordPort getOutput(): Returns the de-duplicated data.
- SortKey[] getSortKeys(): Returns the additional keys by which to sort the data to determine which row to output in the event of a duplicate.
- void setGroupKeys(String[] groupKeys): Sets the keys by which to de-duplicate.
- void setSortKeys(SortKey[] sortKeys): Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate.
- void setSortKeys(String... keys): Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate.
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
Method Detail
-
getInput
public RecordPort getInput()
Returns the data to be de-duplicated.
- Specified by:
getInput in interface PipelineOperator<RecordPort>
- Returns:
the data to be de-duplicated
-
getOutput
public RecordPort getOutput()
Returns the de-duplicated data.
- Specified by:
getOutput in interface PipelineOperator<RecordPort>
- Returns:
the de-duplicated data
-
compose
protected void compose(CompositionContext ctx)
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
- Specified by:
compose in class CompositeOperator
- Parameters:
ctx - the context
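The three steps listed for compose (validate, add sub-operators, connect ports) can be sketched with stand-in types. Everything in the block below is hypothetical: Port, Step, and Context are toy classes, not the DataRush CompositionContext or port types, and the split into a "sort" step followed by a "keepFirst" step is an assumption made for illustration, not a description of how RemoveDuplicates is actually built. The rule that at least one group key is required is likewise an invented example of validation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the DataRush types.
final class ComposeSketch {

    /** A named port; stands in for a record port. */
    static final class Port {
        final String name;
        Port(String name) { this.name = name; }
    }

    /** A sub-operator with one input and one output port. */
    static final class Step {
        final Port input;
        final Port output;
        Step(String name) {
            this.input = new Port(name + ".in");
            this.output = new Port(name + ".out");
        }
    }

    /** Records what a composite adds and connects; stands in for the context. */
    static final class Context {
        final List<Step> added = new ArrayList<>();
        final List<String> connections = new ArrayList<>();

        Step add(Step step) {
            added.add(step);
            return step;
        }

        void connect(Port from, Port to) {
            connections.add(from.name + " -> " + to.name);
        }
    }

    /** Follows the three steps: validate, add sub-operators, connect ports. */
    static void compose(Context ctx, Port compositeInput, Port compositeOutput,
                        List<String> groupKeys) {
        // Step 1: validate configuration (invented rule for this sketch).
        if (groupKeys.isEmpty()) {
            throw new IllegalArgumentException("at least one group key is required");
        }

        // Step 2: instantiate sub-operators and add them to the context.
        Step sort = ctx.add(new Step("sort"));
        Step keepFirst = ctx.add(new Step("keepFirst"));

        // Step 3: connect composite input -> sub-operators -> composite output.
        ctx.connect(compositeInput, sort.input);
        ctx.connect(sort.output, keepFirst.input);
        ctx.connect(keepFirst.output, compositeOutput);
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        compose(ctx, new Port("input"), new Port("output"), List.of("id"));
        ctx.connections.forEach(System.out::println);
    }
}
```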
-
getGroupKeys
public String[] getGroupKeys()
Returns the keys by which to de-duplicate.
- Returns:
the keys by which to de-duplicate
-
setGroupKeys
public void setGroupKeys(String[] groupKeys)
Sets the keys by which to de-duplicate.
- Parameters:
groupKeys - the keys by which to de-duplicate
-
getSortKeys
public SortKey[] getSortKeys()
Returns the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
- Returns:
the additional keys by which to sort the data
-
setSortKeys
public void setSortKeys(SortKey[] sortKeys)
Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
- Parameters:
sortKeys - the additional keys by which to sort the data
-
setSortKeys
public void setSortKeys(String... keys)
Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
- Parameters:
keys - the additional keys by which to sort the data
-
-