-
- All Known Subinterfaces:
MultiSinkOperator<T>
,MultiSourceOperator<T>
,PipelineOperator<T>
,RecordPipelineOperator
,RecordSinkOperator
,RecordSourceOperator
,SinkOperator<T>
,SourceOperator<T>
- All Known Implementing Classes:
AbstractDeferredRecordOperator
,AbstractExecutableRecordPipeline
,AbstractLogicalOperator
,AbstractPredictor
,AbstractReader
,AbstractRecordCompositeOperator
,AbstractRelationalJoin
,AbstractTextReader
,AbstractTextWriter
,AbstractWriter
,AbstractWriteToJDBC
,AbstractWriteToJDBC.AbstractWriteToJDBCWorker
,AnalyzeDuplicateKeys
,AnalyzeLinkKeys
,AssertEqual
,AssertEqualHash
,AssertEqualRecordType
,AssertEqualTypes
,AssertMetadata
,AssertPredicate
,AssertRowCount
,AssertSorted
,BinaryWriter
,BlockCartesian
,BlockRecords
,BlockSelf
,CalculateNGramFrequency
,CalculateWordFrequency
,ClusterDuplicates
,ClusterLinks
,ClusterPredictor
,CollectRecords
,ColumnsToRows
,CompositeOperator
,ConvertARMModel
,ConvertTextCase
,CountRanges
,CountTokens
,CrossJoin
,DataQualityAnalyzer
,DecisionTreeLearner
,DecisionTreePredictor
,DecisionTreePruner
,DeferredCompositeOperator
,DeleteFromJDBC
,DeleteHBase
,DeriveFields
,DictionaryFilter
,DiscoverDomain
,DiscoverDuplicates
,DiscoverLinks
,DistinctValues
,DrawDiagnosticsChart
,DumpPartitions
,EmitRecords
,ErrorSink
,ErrorSource
,ExecutableOperator
,ExpandTextFrequency
,ExpandTextTokens
,ExternalRecordSink
,ExternalRecordSource
,FilterExistingRows
,FilterExistJoinProcess
,FilterFields
,FilterRows
,FilterText
,FinalizeSQLWorker
,ForceRecordStaging
,ForceStaging
,FPGrowth
,FrequentItems
,GatherHint
,GenerateArithmeticSequence
,GenerateBagOfWords
,GenerateConstant
,GenerateRandom
,GenerateRepeatingCycle
,GetModel
,GetPMML
,Group
,GroupPairsSortedRows
,InitializeSQLWorker
,IterativeOperator
,JDBCOperator
,Join
,KeyOperator
,KeyValueOperator
,KMeans
,KNNClassifier
,LargeGroupDetector
,LimitRows
,LinearRegressionLearner
,LoadActianVector
,LoadVectorOnHadoop
,LoadVectorOnHadoopDirect
,LogisticRegressionLearner
,LogisticRegressionPredictor
,LogRows
,MergeFields
,MergeModel
,MostFrequentValues
,NaiveBayesLearner
,NaiveBayesPredictor
,NormalizeValues
,OpenComposite
,OpenModelSink
,OpenModelSource
,OpenMultiModelSink
,OpenMultiModelSource
,OpenMultiRecordSink
,OpenMultiRecordSource
,OpenRecordSink
,OpenRecordSource
,ParseTextFields
,PartitionHint
,ProcessByGroup
,PutModel
,PutPMML
,Randomize
,Rank
,ReadARFF
,ReadAvro
,ReadDelimitedText
,ReadFixedText
,ReadFromJDBC
,ReadHBase
,ReadJSON
,ReadLog
,ReadMDF
,ReadORC
,ReadParquet
,ReadPMML
,ReadSource
,ReadStagingDataset
,RegressionPredictor
,RemapFields
,RemoveDuplicates
,RemoveFields
,ReplaceMissingValues
,RetainFields
,RowsToColumns
,RunJavaScript
,RunRScript
,RunScript
,SampleRandomRows
,SelectFields
,SemiJoin
,SimulatePartitions
,Sort
,SortedGroupHandler
,SplitField
,StreamingOperator
,SummaryStatistics
,SumOfSquares
,SVMPredictor
,TextFrequencyFilter
,TextStemmer
,TextTokenizer
,UnionAll
,UpdateInJDBC
,WriteARFF
,WriteAvro
,WriteDelimitedText
,WriteFixedText
,WriteHBase
,WriteORC
,WritePMML
,WriteSink
,WriteStagingDataset
,WriteToJDBC
public interface LogicalOperator
A logical operator is the fundamental unit of composition which may be added to aLogicalGraph
.Implementations must not extend this class directly; instead they must extend one of the following sub-classes:
Conventions
Operators are expected to adhere to the "Java Bean" standard. All operators must declare their input and output ports as private final fields and must provide public "getters" to access those fields. In addition, operators will typically define several configuration properties. Each of those properties should be defined as non-final fields with public "getters" and "setters". Finally, operators should define a public no-arg constructor. For example:public class MyOperator extends
ExecutableOperator
{ //create an input record port--should be declared as final private finalRecordPort
input=newRecordInput
("input"); //create an output record port--should be declared as final private finalRecordPort
output=newRecordOutput
("output"); //declare a config property with a default value, must be non-final private int myConfigProperty= 5; //required: default constructor: initializes everything to their defaults public MyOperator() { } //optional: one or more convenience constructors to specify values for configuration properties public MyOperator(int myConfigProperty) { setMyConfigProperty(myConfigProperty); } //required: public getter for each input port publicRecordPort
getInput() { return input; } //required: public getter for each output port publicRecordPort
getOutput() { return output; } //required: public getter for each configuration property public int getMyConfigProperty() { return myConfigProperty; } //required: public setter for each configuration property public void setMyConfigProperty(int myConfigProperty) { this.myConfigProperty= myConfigProperty; } }Serialization
All logical operators must be JSON serializable. JSON is used as the serialization mechanism anytime we must serialize an operator. This includes serialization to the cluster for distribution and export from the UI. In addition, there are times when we must clone anExecutableOperator
; here we use JSON as the default implementation of clone.By following the "Java Bean" standard, described in the previous section, JSON serialization/deserialization is mostly automatic:
- Serialization proceeds by invoking each of the getters in-turn.
- Deserialization proceeds by first creating a new object via the default constructor and then invoking each of the setters in-turn.
- When defining getters and setters, it is simplest to avoid overloading setters. But, if
you do overload setters, you must add explicit
@JsonIgnore
tags to the setters to ignore, an explicit@JsonProperty
tag to the single setter to use, and an explicit@JsonProperty
tag to the single getter to use. - It is a best practice for built-in operators to define a shortcut name for the operator
type name by declaring the
@JsonTypeName
annotation on the operator. Operators declared with shortcut type names must also be registered via theTypeResolutionProvider
service.
-
-
Method Summary
All Methods Instance Methods Abstract Methods Modifier and Type Method Description void
disableParallelism()
Can be called to forcible disable parallelism for the given operator (and children if this is aCompositeOperator
).Namespace<LogicalPort>
getInputPorts()
Returns the list of input ports.Namespace<LogicalPort>
getOutputPorts()
Returns the list of output ports.
-
-
-
Method Detail
-
getInputPorts
Namespace<LogicalPort> getInputPorts()
Returns the list of input ports. This method is generally used by framework code; operator consumers should generally use the methodsgetInput()
, etc to obtain a handle to input ports.- Returns:
- the list of input ports
-
getOutputPorts
Namespace<LogicalPort> getOutputPorts()
Returns the list of output ports. This method is generally used by framework code; operator consumers should generally use the methodsgetOutput()
, etc to obtain a handle to input ports.- Returns:
- the list of output ports
-
disableParallelism
void disableParallelism()
Can be called to forcible disable parallelism for the given operator (and children if this is aCompositeOperator
). This method should be used sparingly since it will degrade performance significantly; but is needed in certain cases. For example:- If there is a
RunScript
operator that contains a non-parallelizable script - If there is a
DeriveFields
operator that contains a non-parallelizable function
- If there is a
-
-