java.lang.Object
com.pervasive.datarush.operators.AbstractLogicalOperator
com.pervasive.datarush.operators.CompositeOperator
com.pervasive.datarush.analytics.stats.DataQualityAnalyzer
- All Implemented Interfaces:
LogicalOperator
Evaluates a set of quality tests on an input dataset. Those rows
for which all tests pass are considered "clean" and thus sent to the
clean output. Those rows for which any tests fail
are considered "dirty" and thus sent to the dirty output.
In addition, this produces a summary model that includes the following statistics:
- totalFrequency: total number of rows
- invalidFrequency: total number of rows for which at least one test involving the given field failed
- testFailureCounts: per-test failure counts for each test involving the given field
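The clean/dirty split and the summary counts described above can be illustrated with a minimal plain-Java sketch. It does not use the DataRush API: the class `QualitySplitSketch`, its `Result` holder, and the sample tests are hypothetical names introduced only for illustration. It also tallies counts over whole rows, whereas the operator's model reports them per field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Plain-Java illustration of the clean/dirty semantics; NOT the DataRush API. */
class QualitySplitSketch {

    /** Hypothetical holder for the two row sets and the summary counts. */
    static final class Result<R> {
        final List<R> clean = new ArrayList<>();
        final List<R> dirty = new ArrayList<>();
        long totalFrequency;    // total number of rows seen
        long invalidFrequency;  // rows for which at least one test failed
        final Map<String, Long> testFailureCounts = new LinkedHashMap<>();
    }

    /** Applies every named test to every row and routes the row accordingly. */
    static <R> Result<R> analyze(List<R> rows, Map<String, Predicate<R>> tests) {
        Result<R> result = new Result<>();
        // Start every test at zero so tests that never fail still appear.
        for (String name : tests.keySet()) {
            result.testFailureCounts.put(name, 0L);
        }
        for (R row : rows) {
            result.totalFrequency++;
            boolean anyFailed = false;
            for (Map.Entry<String, Predicate<R>> test : tests.entrySet()) {
                if (!test.getValue().test(row)) {
                    anyFailed = true;
                    result.testFailureCounts.merge(test.getKey(), 1L, Long::sum);
                }
            }
            if (anyFailed) {
                result.invalidFrequency++;
                result.dirty.add(row);   // any failing test makes the row dirty
            } else {
                result.clean.add(row);   // all tests passed
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Predicate<Integer>> tests = new LinkedHashMap<>();
        tests.put("nonNegative", v -> v >= 0);
        tests.put("belowHundred", v -> v < 100);
        Result<Integer> r = analyze(Arrays.asList(5, -1, 150, 42), tests);
        System.out.println("clean=" + r.clean + " dirty=" + r.dirty);
        System.out.println("failures=" + r.testFailureCounts);
    }
}
```

Note that a row failing several tests is counted once in `invalidFrequency` but once per failing test in `testFailureCounts`.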
-
Nested Class Summary
Nested Classes
- static class: A quality test consists of a test name (used to reference the test in the statistics) plus a boolean predicate.
-
Constructor Summary
Constructors
- DataQualityAnalyzer(): Evaluates a set of quality tests on an input dataset.
-
Method Summary
Modifier and Type, Method, Description
- protected void compose(ctx): Compose the body of this operator.
- getClean(): Returns a port that will output the "clean" rows.
- getDirty(): Returns a port that will output the "dirty" rows.
- getInput(): Returns a port for the input dataset to be tested.
- getModel(): Returns a port that will output a PMMLSummaryStatisticsModel.
- getTests(): Returns the set of tests to apply to the input dataset.
- void setTests(tests): Sets the set of tests to apply to the input dataset.
- void setTests(expression): Sets the set of tests to apply to the input dataset.
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Constructor Details
-
DataQualityAnalyzer
public DataQualityAnalyzer()
Evaluates a set of quality tests on an input dataset. By default the set of tests is empty; the tests must be set (see setTests) prior to graph compilation.
-
-
Method Details
-
getInput
Returns a port for the input dataset to be tested.
- Returns:
- a port for the input dataset to be tested.
-
getTests
Returns the set of tests to apply to the input dataset.
- Returns:
- the set of tests to apply to the input dataset
-
setTests
Sets the set of tests to apply to the input dataset.
- Parameters:
tests - the set of tests to apply to the input dataset
-
setTests
Sets the set of tests to apply to the input dataset. The tests are expressed using the field derivation expression language. The general format of the expression language is: {expression1} as {metric1}[, {expression2} as {metric2}, ...]. The expressions themselves are predicate functions that return a boolean value.
- Parameters:
expression - an expression that evaluates to a set of quality tests
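As a rough illustration of the `{expression} as {metric}` format, the sketch below assembles such a string from name/predicate pairs. The helper class `TestExpressionSketch`, the field names, and the predicate syntax (`age > 0`, `balance >= 0`) are assumptions made for this example; the exact predicate syntax is defined by the field derivation expression language and should be checked there.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper; not part of the DataRush library. */
class TestExpressionSketch {

    /**
     * Joins test-name -> predicate pairs into the documented
     * "{expression1} as {metric1}, {expression2} as {metric2}" form.
     */
    static String toExpression(Map<String, String> predicatesByName) {
        StringJoiner joined = new StringJoiner(", ");
        for (Map.Entry<String, String> entry : predicatesByName.entrySet()) {
            // The predicate comes first, then "as", then the test (metric) name.
            joined.add(entry.getValue() + " as " + entry.getKey());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Field names and predicate syntax below are illustrative assumptions.
        Map<String, String> tests = new LinkedHashMap<>();
        tests.put("agePositive", "age > 0");
        tests.put("balanceNonNegative", "balance >= 0");
        System.out.println(toExpression(tests));
    }
}
```

The resulting string is the kind of value that would be supplied as the `expression` argument of `setTests`; the metric names are what identify each test in the failure counts.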
-
getClean
Returns a port that will output the "clean" rows. A row is considered clean if all tests pass.
- Returns:
- a port that will output the "clean" rows
-
getDirty
Returns a port that will output the "dirty" rows. A row is considered dirty if any test fails.
- Returns:
- a port that will output the "dirty" rows
-
getModel
Returns a port that will output a PMMLSummaryStatisticsModel. The model will be populated with the following information:
- totalFrequency: total number of rows
- invalidFrequency: total number of rows for which at least one test involving the given field failed
- testFailureCounts: per-test failure counts for each test involving the given field
- Returns:
- a port that will output a PMMLSummaryStatisticsModel.
-
compose
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
- Specified by:
compose in class CompositeOperator
- Parameters:
ctx - the context
-