- java.lang.Object
  - com.pervasive.datarush.operators.AbstractLogicalOperator
    - com.pervasive.datarush.operators.CompositeOperator
      - com.pervasive.datarush.operators.group.RemoveDuplicates
- All Implemented Interfaces: LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator
public class RemoveDuplicates extends CompositeOperator implements RecordPipelineOperator
Removes duplicate rows based on a specified set of group keys. The "first" record of each key-value group is pushed to the output; other records with the same key values are ignored. The "first" record of a key group is determined by sorting all rows of that group by the specified sortKeys. If sortKeys is unspecified, an arbitrary row from each group is output.
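The contract above can be modeled in a few lines of plain Java, which may help when reasoning about which row survives. This is a sketch under stated assumptions, not the DataRush implementation: records are stand-in `Map<String, Object>` values, `DedupModel`, `removeDuplicates`, and `row` are hypothetical names, and the "arbitrary" row is modeled as the first row encountered, whereas the real operator makes no such promise.

```java
import java.util.*;

// Illustrative in-memory model of the RemoveDuplicates contract, not DataRush code.
// Records are modeled as Map<String, Object>; group keys are field names.
class DedupModel {

    // Keeps one record per distinct combination of group-key values.
    // sortOrder may be null: then the first record seen wins, standing in for "arbitrary".
    static List<Map<String, Object>> removeDuplicates(
            List<Map<String, Object>> rows,
            List<String> groupKeys,
            Comparator<Map<String, Object>> sortOrder) {
        // LinkedHashMap keeps groups in first-seen order so this model's output is repeatable
        Map<List<Object>, Map<String, Object>> firstPerGroup = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            // Composite key: the row's values for each group key, in order
            List<Object> key = new ArrayList<>();
            for (String k : groupKeys) {
                key.add(row.get(k));
            }
            Map<String, Object> current = firstPerGroup.get(key);
            // Replace the kept record only if this one sorts strictly earlier
            if (current == null || (sortOrder != null && sortOrder.compare(row, current) < 0)) {
                firstPerGroup.put(key, row);
            }
        }
        return new ArrayList<>(firstPerGroup.values());
    }

    // Hypothetical helper that builds a two-field record
    static Map<String, Object> row(String customer, int ts) {
        Map<String, Object> m = new HashMap<>();
        m.put("customer", customer);
        m.put("ts", ts);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(row("A", 3), row("A", 1), row("B", 2));
        Comparator<Map<String, Object>> byTs =
            Comparator.comparing((Map<String, Object> r) -> (Integer) r.get("ts"));
        for (Map<String, Object> r : removeDuplicates(rows, List.of("customer"), byTs)) {
            System.out.println(r.get("customer") + ":" + r.get("ts")); // prints A:1, then B:2
        }
    }
}
```

Keeping a single candidate per group in a hash map needs only one pass over the input; passing `null` for the ordering reproduces the "no sort keys" case, where this model simply keeps whichever row arrived first.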
Constructor Summary
- RemoveDuplicates()
  Default constructor.
- RemoveDuplicates(List<String> groupKeys)
  Creates an operator that removes duplicates based on the specified group keys.
Method Summary
- protected void compose(CompositionContext ctx)
  Compose the body of this operator.
- String[] getGroupKeys()
  Returns the keys by which to de-duplicate.
- RecordPort getInput()
  Returns the data to be de-duplicated.
- RecordPort getOutput()
  Returns the de-duplicated data.
- SortKey[] getSortKeys()
  Returns the additional keys by which to sort the data to determine which row to output in the event of a duplicate.
- void setGroupKeys(String[] groupKeys)
  Sets the keys by which to de-duplicate.
- void setSortKeys(SortKey[] sortKeys)
  Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate.
- void setSortKeys(String... keys)
  Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate.
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
Method Detail
getInput
public RecordPort getInput()
Returns the data to be de-duplicated.
- Specified by: getInput in interface PipelineOperator<RecordPort>
- Returns: the data to be de-duplicated
getOutput
public RecordPort getOutput()
Returns the de-duplicated data.
- Specified by: getOutput in interface PipelineOperator<RecordPort>
- Returns: the de-duplicated data
compose
protected void compose(CompositionContext ctx)
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
- Create the necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
- Specified by: compose in class CompositeOperator
- Parameters: ctx - the context
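This page does not say which sub-operators RemoveDuplicates wires together in compose. One plausible decomposition, assumed here rather than confirmed, is a sort on the group keys followed by the sort keys, then a single pass that keeps the first row of each run of equal group keys. The sketch models each sub-operator as a `java.util.function.Function` and uses `andThen` in place of `OperatorComposable.connect`; `ComposeSketch`, `Row`, `sortStage`, and `firstOfRunStage` are hypothetical names.

```java
import java.util.*;
import java.util.function.Function;

// Plain-Java model of a sort-then-scan composition; not DataRush operator code.
class ComposeSketch {
    // Hypothetical record shape: one group-key field, one sort-key field, a payload
    record Row(String key, int ts, String payload) {}

    // Stand-in for a sort sub-operator
    static Function<List<Row>, List<Row>> sortStage(Comparator<Row> order) {
        return rows -> {
            List<Row> sorted = new ArrayList<>(rows);
            sorted.sort(order); // List.sort is stable
            return sorted;
        };
    }

    // Stand-in for a "first row per group" sub-operator; assumes input sorted by groupKey
    static Function<List<Row>, List<Row>> firstOfRunStage(Function<Row, String> groupKey) {
        return sorted -> {
            List<Row> out = new ArrayList<>();
            String previous = null;
            boolean first = true;
            for (Row r : sorted) {
                String k = groupKey.apply(r);
                if (first || !Objects.equals(previous, k)) {
                    out.add(r); // start of a new run of equal group keys
                }
                previous = k;
                first = false;
            }
            return out;
        };
    }

    // Mirrors the compose steps: validate, create the stages, connect them in order
    static Function<List<Row>, List<Row>> compose(Function<Row, String> groupKey, Comparator<Row> sortKeys) {
        Objects.requireNonNull(groupKey, "group key is required");
        Comparator<Row> order = Comparator.comparing(groupKey);
        if (sortKeys != null) {
            order = order.thenComparing(sortKeys); // optional tie-break within a group
        }
        return sortStage(order).andThen(firstOfRunStage(groupKey));
    }

    public static void main(String[] args) {
        List<Row> input = List.of(new Row("A", 3, "x"), new Row("B", 2, "y"), new Row("A", 1, "z"));
        Function<List<Row>, List<Row>> op = compose(Row::key, Comparator.comparingInt(Row::ts));
        op.apply(input).forEach(r -> System.out.println(r.payload())); // prints z, then y
    }
}
```

Because `List.sort` is stable, this model keeps the first input row of each group when no sort keys are given; the operator itself only promises an arbitrary row in that case.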
getGroupKeys
public String[] getGroupKeys()
Returns the keys by which to de-duplicate.
- Returns: the keys by which to de-duplicate
setGroupKeys
public void setGroupKeys(String[] groupKeys)
Sets the keys by which to de-duplicate.
- Parameters: groupKeys - the keys by which to de-duplicate
getSortKeys
public SortKey[] getSortKeys()
Returns the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
- Returns: the additional keys by which to sort the data
setSortKeys
public void setSortKeys(SortKey[] sortKeys)
Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
- Parameters: sortKeys - the additional keys by which to sort the data
setSortKeys
public void setSortKeys(String... keys)
Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
- Parameters: keys - the additional keys by which to sort the data
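Both setSortKeys overloads describe an ordering used to rank rows within a group. The sketch below shows, in plain Java, how a list of key specifications could be chained into a single tie-breaking `Comparator`. `SortKeySketch`, `KeySpec`, `fromNames`, `toComparator`, and `row` are hypothetical stand-ins, not the DataRush `SortKey` API; in particular, treating a bare field name as ascending is an assumption, since this page does not say how the `String...` overload interprets direction.

```java
import java.util.*;

// Plain-Java model of turning sort-key specs into one Comparator; not the DataRush SortKey API.
class SortKeySketch {
    // Hypothetical sort-key spec: a field name plus a direction
    record KeySpec(String field, boolean ascending) {}

    // Assumption: a bare field name means ascending order
    static List<KeySpec> fromNames(String... names) {
        List<KeySpec> specs = new ArrayList<>();
        for (String n : names) {
            specs.add(new KeySpec(n, true));
        }
        return specs;
    }

    // Chains the keys: later keys only break ties left by earlier ones
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Comparator<Map<String, Object>> toComparator(List<KeySpec> specs) {
        Comparator<Map<String, Object>> result = (a, b) -> 0; // no keys: all rows tie
        for (KeySpec s : specs) {
            Comparator<Map<String, Object>> one =
                (a, b) -> ((Comparable) a.get(s.field())).compareTo(b.get(s.field()));
            result = result.thenComparing(s.ascending() ? one : one.reversed());
        }
        return result;
    }

    // Hypothetical helper that builds a two-field record
    static Map<String, Object> row(String name, int ts) {
        Map<String, Object> m = new HashMap<>();
        m.put("name", name);
        m.put("ts", ts);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>(List.of(row("a", 1), row("b", 3), row("c", 2)));
        rows.sort(toComparator(List.of(new KeySpec("ts", false)))); // latest ts first
        rows.forEach(r -> System.out.println(r.get("name")));       // prints b, c, a
    }
}
```

With a descending key on a timestamp field, the "first" row of each group would be the most recent one, which is a common reason to supply sort keys at all.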