java.lang.Object
com.pervasive.datarush.operators.AbstractLogicalOperator
com.pervasive.datarush.operators.CompositeOperator
com.pervasive.datarush.operators.group.RemoveDuplicates
All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator
Removes duplicate rows based on a specified set of group keys.
The "first" record of a key value group is pushed to the output.
Other records with the same key values are ignored.
The "first" record of a key group is determined by sorting
all rows of each key group by the specified
sortKeys.
If sortKeys is unspecified, an arbitrary row of each key group is output.
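The semantics described above can be modeled in plain Java without the library. The sketch below is not the operator's implementation; it only illustrates the documented behavior under stated assumptions. The `Row` type, its fields, and the `firstPerGroup` helper are hypothetical names introduced for illustration, with `customer` standing in for a group key and `timestamp` for an ascending sort key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupSemanticsSketch {

    /** Hypothetical row type, used only for this illustration. */
    static final class Row {
        final String customer;  // plays the role of the group key
        final int timestamp;    // plays the role of the sort key
        final String status;    // payload carried along with the row

        Row(String customer, int timestamp, String status) {
            this.customer = customer;
            this.timestamp = timestamp;
            this.status = status;
        }
    }

    /** Keeps, for each customer, the row with the smallest timestamp. */
    static List<Row> firstPerGroup(List<Row> rows) {
        Comparator<Row> bySortKey = Comparator.comparingInt(r -> r.timestamp);
        Map<String, Row> kept = new LinkedHashMap<>();
        for (Row row : rows) {
            // Replace the kept row only if the new one sorts strictly earlier.
            kept.merge(row.customer, row,
                (current, candidate) ->
                    bySortKey.compare(candidate, current) < 0 ? candidate : current);
        }
        return new ArrayList<>(kept.values());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("alice", 3, "shipped"),
            new Row("bob", 1, "new"),
            new Row("alice", 1, "new"),
            new Row("alice", 2, "paid"));
        // Prints one line per customer: "alice 1 new" then "bob 1 new".
        for (Row r : firstPerGroup(rows)) {
            System.out.println(r.customer + " " + r.timestamp + " " + r.status);
        }
    }
}
```

The model emits groups in first-seen order only because it uses a LinkedHashMap; this page does not state the order in which the operator emits its output rows.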
Constructor Summary
Constructors
RemoveDuplicates() - Default constructor.
RemoveDuplicates(List<String> groupKeys) - Remove duplicates, specifying keys.
-
Method Summary
Modifier and Type, Method - Description
protected void compose(ctx) - Compose the body of this operator.
String[] getGroupKeys() - Returns the keys by which to de-duplicate.
RecordPort getInput() - Returns the data to be de-duplicated.
RecordPort getOutput() - Returns the de-duplicated data.
SortKey[] getSortKeys() - Returns the additional keys by which to sort the data to determine which row to output in the event of a duplicate.
void setGroupKeys(String[] groupKeys) - Sets the keys by which to de-duplicate.
void setSortKeys(SortKey[] sortKeys) - Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate.
void setSortKeys(String... keys) - Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate.
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
Constructor Details
-
RemoveDuplicates
public RemoveDuplicates()
Default constructor. Prior to graph compilation the following properties must be set:
-
RemoveDuplicates
Remove duplicates, specifying keys.
Parameters:
groupKeys - the names of the key fields. Must not be empty or null.
-
-
Method Details
-
getInput
Returns the data to be de-duplicated.
Specified by:
getInput in interface PipelineOperator<RecordPort>
Returns:
the data to be de-duplicated
-
getOutput
Returns the de-duplicated data.
Specified by:
getOutput in interface PipelineOperator<RecordPort>
Returns:
the de-duplicated data
-
compose
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
Specified by:
compose in class CompositeOperator
Parameters:
ctx - the context
-
getGroupKeys
Returns the keys by which to de-duplicate.
Returns:
the keys by which to de-duplicate
-
setGroupKeys
Sets the keys by which to de-duplicate.
Parameters:
groupKeys - the keys by which to de-duplicate
-
getSortKeys
Returns the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
Returns:
the additional keys by which to sort the data
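To see why "arbitrary" matters, the following plain-Java sketch models de-duplication with no sort keys as "keep whichever row of a group is encountered first". That rule is an assumption made for illustration only; this page does not say how the operator picks the row. The `keepFirstSeen` helper and the two-element `{key, payload}` row layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ArbitraryRowSketch {

    /** Keeps the first row seen for each key. Rows are {key, payload} pairs. */
    static List<String[]> keepFirstSeen(List<String[]> rows) {
        Map<String, String[]> kept = new LinkedHashMap<>();
        for (String[] row : rows) {
            // Later rows with an already-seen key are ignored.
            kept.putIfAbsent(row[0], row);
        }
        return new ArrayList<>(kept.values());
    }

    public static void main(String[] args) {
        // The same two rows, arriving in two different orders.
        List<String[]> order1 = List.of(
            new String[] {"alice", "new"}, new String[] {"alice", "paid"});
        List<String[]> order2 = List.of(
            new String[] {"alice", "paid"}, new String[] {"alice", "new"});

        System.out.println(keepFirstSeen(order1).get(0)[1]); // prints "new"
        System.out.println(keepFirstSeen(order2).get(0)[1]); // prints "paid"
    }
}
```

Under this model the surviving row changes with arrival order, so callers that care which duplicate survives should set sort keys rather than rely on input order.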
-
setSortKeys
Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
Parameters:
sortKeys - the additional keys by which to sort the data
-
setSortKeys
Sets the additional keys by which to sort the data to determine which row to output in the event of a duplicate. This is an optional property; if left unspecified, an arbitrary row will be output.
Parameters:
keys - the additional keys by which to sort the data
-