- All Known Subinterfaces:
MultiSinkOperator<T>,MultiSourceOperator<T>,PipelineOperator<T>,RecordPipelineOperator,RecordSinkOperator,RecordSourceOperator,SinkOperator<T>,SourceOperator<T>
- All Known Implementing Classes:
AbstractDeferredRecordOperator,AbstractExecutableRecordPipeline,AbstractLogicalOperator,AbstractPredictor,AbstractReader,AbstractRecordCompositeOperator,AbstractRelationalJoin,AbstractTextReader,AbstractTextWriter,AbstractWriter,AbstractWriteToJDBC,AbstractWriteToJDBC.AbstractWriteToJDBCWorker,AnalyzeDuplicateKeys,AnalyzeLinkKeys,AssertEqual,AssertEqualHash,AssertEqualRecordType,AssertEqualTypes,AssertMetadata,AssertPredicate,AssertRowCount,AssertSorted,BinaryWriter,BlockCartesian,BlockRecords,BlockSelf,CalculateNGramFrequency,CalculateWordFrequency,ClusterDuplicates,ClusterLinks,ClusterPredictor,CollectRecords,ColumnsToRows,CompositeOperator,ConvertARMModel,ConvertTextCase,CountRanges,CountTokens,CrossJoin,DataQualityAnalyzer,DecisionTreeLearner,DecisionTreePredictor,DecisionTreePruner,DeferredCompositeOperator,DeleteFromJDBC,DeleteHBase,DeriveFields,DictionaryFilter,DiscoverDomain,DiscoverDuplicates,DiscoverLinks,DistinctValues,DrawDiagnosticsChart,DumpPartitions,EmitRecords,EqualRangeBinning,ErrorSink,ErrorSource,ExecutableOperator,ExpandTextFrequency,ExpandTextTokens,ExternalRecordSink,ExternalRecordSource,FilterExistingRows,FilterExistJoinProcess,FilterFields,FilterRows,FilterText,FinalizeSQLWorker,ForceRecordStaging,ForceStaging,FPGrowth,FrequentItems,GatherHint,GenerateArithmeticSequence,GenerateBagOfWords,GenerateConstant,GenerateRandom,GenerateRepeatingCycle,GetModel,GetPMML,Group,GroupPairsSortedRows,InitializeSQLWorker,IterativeOperator,JDBCOperator,Join,KeyOperator,KeyValueOperator,KMeans,KNNClassifier,LargeGroupDetector,LimitRows,LinearRegressionLearner,LoadActianVector,LogisticRegressionLearner,LogisticRegressionPredictor,LogRows,MergeFields,MergeModel,MockableExternalRecordSink,MockableExternalRecordSource,MostFrequentValues,NaiveBayesLearner,NaiveBayesPredictor,NormalizeValues,OpenComposite,OpenModelSink,OpenModelSource,OpenMultiModelSink,OpenMultiModelSource,OpenMultiRecordSink,OpenMultiRecordSource,OpenRecordSink,OpenRecordSource,ParseTextFields,PartitionHint,ProcessByGroup,PutModel,PutPMML,Randomize,Rank,ReadActianVector,ReadARFF,ReadAvro,ReadDelimitedText,ReadFixedText,ReadFromJDBC,ReadHBase,ReadJSON,ReadLog,ReadMDF,ReadORC,ReadParquet,ReadPMML,ReadSource,ReadStagingDataset,RegressionPredictor,RemapFields,RemoveDuplicates,RemoveFields,ReplaceMissingValues,RetainFields,RowsToColumns,RunJavaScript,RunRScript,RunScript,SampleRandomRows,SelectFields,SemiJoin,SimulatePartitions,Sort,SortedGroupHandler,SplitField,StreamingOperator,SubJobExecutor,SummaryStatistics,SumOfSquares,SVMLearner,SVMPredictor,TextFrequencyFilter,TextStemmer,TextTokenizer,UnionAll,UpdateInJDBC,WriteARFF,WriteAvro,WriteDelimitedText,WriteFixedText,WriteHBase,WriteORC,WritePMML,WriteSink,WriteStagingDataset,WriteToJDBC
public interface LogicalOperator
A logical operator is the fundamental unit of composition which may be added
to a
LogicalGraph.
Implementations must not extend this class directly; instead they must extend one of the following sub-classes:
Conventions
Operators are expected to adhere to the "Java Bean" standard. All operators must declare their input and output ports as private final fields and must provide public "getters" to access those fields. In addition, operators will typically define several configuration properties. Each of those properties should be defined as non-final fields with public "getters" and "setters". Finally, operators should define a public no-arg constructor. For example:
public class MyOperator extends ExecutableOperator {
//create an input record port--should be declared as final
private final RecordPort input= newRecordInput("input");
//create an output record port--should be declared as final
private final RecordPort output= newRecordOutput("output");
//declare a config property with a default value, must be non-final
private int myConfigProperty= 5;
//required: default constructor: initializes everything to their defaults
public MyOperator() {
}
//optional: one or more convenience constructors to specify values for configuration properties
public MyOperator(int myConfigProperty) {
setMyConfigProperty(myConfigProperty);
}
//required: public getter for each input port
public RecordPort getInput() {
return input;
}
//required: public getter for each output port
public RecordPort getOutput() {
return output;
}
//required: public getter for each configuration property
public int getMyConfigProperty() {
return myConfigProperty;
}
//required: public setter for each configuration property
public void setMyConfigProperty(int myConfigProperty) {
this.myConfigProperty= myConfigProperty;
}
}
Serialization
All logical operators must be JSON serializable. JSON is used as the serialization mechanism anytime we must serialize an operator. This includes serialization to the cluster for distribution and export from the UI. In addition, there are times when we must clone anExecutableOperator; here we use JSON as the default implementation
of clone.
By following the "Java Bean" standard, described in the previous section, JSON serialization/deserialization is mostly automatic:
- Serialization proceeds by invoking each of the getters in-turn.
- Deserialization proceeds by first creating a new object via the default constructor and then invoking each of the setters in-turn.
- When defining getters and setters, it is simplest to avoid overloading setters. But, if
you do overload setters, you must add explicit
@JsonIgnoretags to the setters to ignore, an explicit@JsonPropertytag to the single setter to use, and an explicit@JsonPropertytag to the single getter to use. - It is a best practice for built-in operators to define a shortcut name for the operator
type name by declaring the
@JsonTypeNameannotation on the operator. Operators declared with shortcut type names must also be registered via theTypeResolutionProviderservice.
-
Method Summary
Modifier and TypeMethodDescriptionvoidCan be called to forcible disable parallelism for the given operator (and children if this is aCompositeOperator).Returns the list of input ports.Returns the list of output ports.
-
Method Details
-
getInputPorts
Namespace<LogicalPort> getInputPorts()Returns the list of input ports. This method is generally used by framework code; operator consumers should generally use the methodsgetInput(), etc to obtain a handle to input ports.- Returns:
- the list of input ports
-
getOutputPorts
Namespace<LogicalPort> getOutputPorts()Returns the list of output ports. This method is generally used by framework code; operator consumers should generally use the methodsgetOutput(), etc to obtain a handle to input ports.- Returns:
- the list of output ports
-
disableParallelism
void disableParallelism()Can be called to forcible disable parallelism for the given operator (and children if this is aCompositeOperator). This method should be used sparingly since it will degrade performance significantly; but is needed in certain cases. For example:- If there is a
RunScriptoperator that contains a non-parallelizable script - If there is a
DeriveFieldsoperator that contains a non-parallelizable function
- If there is a
-