public final class DataQualityAnalyzer extends CompositeOperator
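Evaluates a set of quality tests on an input dataset. Those rows for which all tests pass are considered "clean" and thus sent to the clean output. Those rows for which any test fails are considered "dirty" and thus sent to the dirty output.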
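The clean/dirty rule above can be modeled in a few lines of plain Java. This is a minimal sketch, not the DataFlow implementation: the `QualityTest` record, the representation of rows as `Map<String, Object>`, and the `split` method are hypothetical stand-ins chosen only to show that a row is clean exactly when every predicate passes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class CleanDirtySplit {
    // Hypothetical stand-in for a quality test: a name plus a boolean predicate over one row.
    record QualityTest(String name, Predicate<Map<String, Object>> predicate) {}

    // Rows that passed every test, and rows that failed at least one.
    record Split(List<Map<String, Object>> clean, List<Map<String, Object>> dirty) {}

    static Split split(List<Map<String, Object>> rows, List<QualityTest> tests) {
        List<Map<String, Object>> clean = new ArrayList<>();
        List<Map<String, Object>> dirty = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            // Clean only if all tests pass; a single failure routes the row to dirty.
            boolean allPass = tests.stream().allMatch(t -> t.predicate().test(row));
            (allPass ? clean : dirty).add(row);
        }
        return new Split(clean, dirty);
    }

    public static void main(String[] args) {
        List<QualityTest> tests = List.of(
                new QualityTest("ageNonNegative", r -> (Integer) r.get("age") >= 0),
                new QualityTest("nameNotEmpty", r -> !((String) r.get("name")).isEmpty()));
        Map<String, Object> ann = Map.of("name", "Ann", "age", 34);
        Map<String, Object> bob = Map.of("name", "Bob", "age", -1);
        Split result = split(List.of(ann, bob), tests);
        System.out.println("clean=" + result.clean().size() + ", dirty=" + result.dirty().size());
    }
}
```

Here Bob's row fails only `ageNonNegative`, which is enough to send it to the dirty output.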
In addition, this operator produces a summary model that includes the following per-field statistics:
totalFrequency: total number of rows
invalidFrequency: total number of rows for which at least one test involving the given field failed
testFailureCounts: per-test failure counts for each test involving the given field

Modifier and Type | Class and Description |
---|---|
static class | DataQualityAnalyzer.QualityTest - A quality test consists of a test name (used to reference the test in the statistics) plus a boolean predicate. |
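A quality test pairs a name with a boolean predicate, and the summary statistics (totalFrequency, invalidFrequency, testFailureCounts) are kept per field. The plain-Java sketch below shows one way those counters could be computed. It assumes, hypothetically, that each test lists the fields it involves; `FieldQualityStats`, `FieldStats` and `summarize` are illustrative names, not part of the DataFlow API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Predicate;

public class FieldQualityStats {
    // Hypothetical quality test: a name, the fields its predicate reads, and the predicate.
    record QualityTest(String name, Set<String> fields, Predicate<Map<String, Object>> predicate) {}

    // Per-field counters named after the statistics in the summary model.
    static final class FieldStats {
        long totalFrequency;      // total number of rows
        long invalidFrequency;    // rows where at least one test involving this field failed
        final Map<String, Long> testFailureCounts = new TreeMap<>(); // failures per involved test

        @Override
        public String toString() {
            return "total=" + totalFrequency + " invalid=" + invalidFrequency
                    + " failures=" + testFailureCounts;
        }
    }

    static Map<String, FieldStats> summarize(List<Map<String, Object>> rows, List<QualityTest> tests) {
        Map<String, FieldStats> stats = new TreeMap<>();
        // Every field referenced by a test gets an entry, with a zero count for each test involving it.
        for (QualityTest t : tests) {
            for (String field : t.fields()) {
                stats.computeIfAbsent(field, k -> new FieldStats()).testFailureCounts.put(t.name(), 0L);
            }
        }
        for (Map<String, Object> row : rows) {
            // Evaluate each test once per row.
            boolean[] failed = new boolean[tests.size()];
            for (int i = 0; i < tests.size(); i++) {
                failed[i] = !tests.get(i).predicate().test(row);
            }
            for (Map.Entry<String, FieldStats> e : stats.entrySet()) {
                FieldStats s = e.getValue();
                s.totalFrequency++;
                boolean anyFailed = false;
                for (int i = 0; i < tests.size(); i++) {
                    QualityTest t = tests.get(i);
                    if (failed[i] && t.fields().contains(e.getKey())) {
                        s.testFailureCounts.merge(t.name(), 1L, Long::sum);
                        anyFailed = true;
                    }
                }
                if (anyFailed) {
                    s.invalidFrequency++; // counted once per row, however many tests failed
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        List<QualityTest> tests = List.of(
                new QualityTest("ageNonNegative", Set.of("age"), r -> (Integer) r.get("age") >= 0),
                new QualityTest("ageAtMost150", Set.of("age"), r -> (Integer) r.get("age") <= 150),
                new QualityTest("nameNotEmpty", Set.of("name"), r -> !((String) r.get("name")).isEmpty()));
        List<Map<String, Object>> rows = List.of(
                Map.of("name", "Ann", "age", 34),
                Map.of("name", "Bob", "age", -1),
                Map.of("name", "", "age", 200));
        summarize(rows, tests).forEach((field, s) -> System.out.println(field + ": " + s));
    }
}
```

With the sample rows, the age field ends with an invalidFrequency of 2 (one failure for each age test) and the name field with 1. A row that fails two tests on the same field still counts only once toward that field's invalidFrequency.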
Constructor and Description |
---|
DataQualityAnalyzer() - Evaluates a set of quality tests on an input dataset. |
Modifier and Type | Method and Description |
---|---|
protected void | compose(CompositionContext ctx) - Compose the body of this operator. |
RecordPort | getClean() - Returns a port that will output the "clean" rows. |
RecordPort | getDirty() - Returns a port that will output the "dirty" rows. |
RecordPort | getInput() - Returns a port for the input dataset to be tested. |
PMMLPort | getModel() - Returns a port that will output a PMMLSummaryStatisticsModel. |
List<DataQualityAnalyzer.QualityTest> | getTests() - Returns the set of tests to apply to the input dataset. |
void | setTests(List<DataQualityAnalyzer.QualityTest> tests) - Sets the set of tests to apply to the input dataset. |
void | setTests(String expression) - Sets the set of tests to apply to the input dataset. |
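setTests(String) takes all tests as one string of the form `{expression1} as {metric1}[, {expression2} as {metric2}, ...]`. The sketch below shows how such a list can be split into (expression, metric) pairs. It is a hypothetical helper, not how DataFlow parses its expression language: it treats each expression as opaque text, assumes commas inside parentheses or quotes belong to the expression, and uses a made-up `inRange` function in the sample input.

```java
import java.util.ArrayList;
import java.util.List;

public class TestListParser {
    // One entry of the list: the predicate expression text and the metric name after "as".
    record NamedTest(String expression, String metric) {}

    // Splits "{expression1} as {metric1}, {expression2} as {metric2}, ..." into entries.
    // Assumptions: commas inside parentheses or quoted strings belong to an expression,
    // and the metric name is whatever follows the last " as " of an entry.
    static List<NamedTest> parse(String spec) {
        List<NamedTest> result = new ArrayList<>();
        int depth = 0;     // current parenthesis nesting
        char quote = 0;    // active quote character, or 0 when outside quotes
        int start = 0;     // start index of the current entry
        for (int i = 0; i < spec.length(); i++) {
            char c = spec.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;
                }
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                result.add(entry(spec.substring(start, i)));
                start = i + 1;
            }
        }
        if (quote != 0 || depth != 0) {
            throw new IllegalArgumentException("unbalanced quotes or parentheses: " + spec);
        }
        result.add(entry(spec.substring(start)));
        return result;
    }

    private static NamedTest entry(String text) {
        String t = text.trim();
        int as = t.lastIndexOf(" as ");
        if (as < 0) {
            throw new IllegalArgumentException("missing \"as <metric>\" in: " + t);
        }
        return new NamedTest(t.substring(0, as).trim(), t.substring(as + 4).trim());
    }

    public static void main(String[] args) {
        // Illustrative only: inRange is a hypothetical function, not a documented DataFlow one.
        String spec = "age >= 0 as ageNonNegative, inRange(age, 0, 150) as ageInRange";
        parse(spec).forEach(System.out::println);
    }
}
```

Tracking parenthesis depth is what keeps the commas inside `inRange(age, 0, 150)` from being mistaken for separators between tests.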
Methods inherited from superclasses: disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
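Like other composite operators, DataQualityAnalyzer builds its body in compose(CompositionContext) by adding sub-operators and connecting ports: the composite's input to sub-operators, sub-operators to each other, and sub-operators to the composite's outputs. The plain-Java model below only illustrates that wiring pattern. The `Context`, `Op` and `Port` types and the three sub-operator names are hypothetical and do not reflect the real CompositionContext API or this operator's actual internals.

```java
import java.util.ArrayList;
import java.util.List;

public class CompositionSketch {
    // Hypothetical port handle: the owning operator's name plus the port name.
    record Port(String owner, String name) {
        @Override
        public String toString() {
            return owner + "." + name;
        }
    }

    // Hypothetical operator handle that can hand out named ports.
    record Op(String name) {
        Port port(String portName) {
            return new Port(name, portName);
        }
    }

    // Hypothetical composition context: records added sub-operators and port connections.
    static final class Context {
        final List<Op> operators = new ArrayList<>();
        final List<String> connections = new ArrayList<>();

        Op add(Op op) {
            operators.add(op);
            return op;
        }

        void connect(Port from, Port to) {
            connections.add(from + " -> " + to);
        }
    }

    // The composite's own external ports (input, clean, dirty, model).
    static final Op SELF = new Op("composite");

    // Sketch of a compose() body: add sub-operators, then wire composite input to
    // sub-operators, sub-operators to each other, and sub-operators to composite outputs.
    static void compose(Context ctx) {
        Op evaluate = ctx.add(new Op("evaluateTests"));  // hypothetical: flags test failures per row
        Op split = ctx.add(new Op("splitCleanDirty"));   // hypothetical: routes rows by those flags
        Op summarize = ctx.add(new Op("summarize"));     // hypothetical: builds the statistics model

        ctx.connect(SELF.port("input"), evaluate.port("input"));
        ctx.connect(evaluate.port("output"), split.port("input"));
        ctx.connect(evaluate.port("output"), summarize.port("input"));
        ctx.connect(split.port("clean"), SELF.port("clean"));
        ctx.connect(split.port("dirty"), SELF.port("dirty"));
        ctx.connect(summarize.port("model"), SELF.port("model"));
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        compose(ctx);
        ctx.connections.forEach(System.out::println);
    }
}
```

One sub-operator output feeds two consumers here, which is how a single pass over the input could serve the clean, dirty and model outputs at once.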
public DataQualityAnalyzer()
public RecordPort getInput()
public List<DataQualityAnalyzer.QualityTest> getTests()
public void setTests(List<DataQualityAnalyzer.QualityTest> tests)
Parameters:
tests - the set of tests to apply to the input dataset

public void setTests(String expression)
Sets the set of tests to apply to the input dataset. The expression should be of the form:
{expression1} as {metric1}[, {expression2} as {metric2}, ...]
The expressions themselves are predicate functions that return a boolean value.
Parameters:
expression - an expression that evaluates to a set of quality tests

public RecordPort getClean()
public RecordPort getDirty()
public PMMLPort getModel()
Returns a port that will output a PMMLSummaryStatisticsModel.
The model will be populated with the following information:
totalFrequency: total number of rows
invalidFrequency: total number of rows for which at least one test involving the given field failed
testFailureCounts: per-test failure counts for each test involving the given field
Returns:
a port that will output a PMMLSummaryStatisticsModel

protected void compose(CompositionContext ctx)
Description copied from class: CompositeOperator
Compose the body of this operator. Sub-operators are added with OperatorComposable.add(O), and ports are connected with OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
Specified by:
compose in class CompositeOperator
Parameters:
ctx - the context

Copyright © 2016 Actian Corporation. All rights reserved.