Interface LogicalOperator

All Known Subinterfaces:
MultiSinkOperator<T>, MultiSourceOperator<T>, PipelineOperator<T>, RecordPipelineOperator, RecordSinkOperator, RecordSourceOperator, SinkOperator<T>, SourceOperator<T>
All Known Implementing Classes:
AbstractDeferredRecordOperator, AbstractExecutableRecordPipeline, AbstractLogicalOperator, AbstractPredictor, AbstractReader, AbstractRecordCompositeOperator, AbstractRelationalJoin, AbstractTextReader, AbstractTextWriter, AbstractWriter, AbstractWriteToJDBC, AbstractWriteToJDBC.AbstractWriteToJDBCWorker, AnalyzeDuplicateKeys, AnalyzeLinkKeys, AssertEqual, AssertEqualHash, AssertEqualRecordType, AssertEqualTypes, AssertMetadata, AssertPredicate, AssertRowCount, AssertSorted, BinaryWriter, BlockCartesian, BlockRecords, BlockSelf, CalculateNGramFrequency, CalculateWordFrequency, ClusterDuplicates, ClusterLinks, ClusterPredictor, CollectRecords, ColumnsToRows, CompositeOperator, ConvertARMModel, ConvertTextCase, CountRanges, CountTokens, CrossJoin, DataQualityAnalyzer, DecisionTreeLearner, DecisionTreePredictor, DecisionTreePruner, DeferredCompositeOperator, DeleteFromJDBC, DeleteHBase, DeriveFields, DictionaryFilter, DiscoverDomain, DiscoverDuplicates, DiscoverLinks, DistinctValues, DrawDiagnosticsChart, DumpPartitions, EmitRecords, EqualRangeBinning, ErrorSink, ErrorSource, ExecutableOperator, ExpandTextFrequency, ExpandTextTokens, ExternalRecordSink, ExternalRecordSource, FilterExistingRows, FilterExistJoinProcess, FilterFields, FilterRows, FilterText, FinalizeSQLWorker, ForceRecordStaging, ForceStaging, FPGrowth, FrequentItems, GatherHint, GenerateArithmeticSequence, GenerateBagOfWords, GenerateConstant, GenerateRandom, GenerateRepeatingCycle, GetModel, GetPMML, Group, GroupPairsSortedRows, InitializeSQLWorker, IterativeOperator, JDBCOperator, Join, KeyOperator, KeyValueOperator, KMeans, KNNClassifier, LargeGroupDetector, LimitRows, LinearRegressionLearner, LoadActianVector, LogisticRegressionLearner, LogisticRegressionPredictor, LogRows, MergeFields, MergeModel, MockableExternalRecordSink, MockableExternalRecordSource, MostFrequentValues, NaiveBayesLearner, NaiveBayesPredictor, NormalizeValues, OpenComposite, OpenModelSink, OpenModelSource, OpenMultiModelSink, OpenMultiModelSource, OpenMultiRecordSink, OpenMultiRecordSource, OpenRecordSink, OpenRecordSource, ParseTextFields, PartitionHint, ProcessByGroup, PutModel, PutPMML, Randomize, Rank, ReadActianVector, ReadARFF, ReadAvro, ReadDelimitedText, ReadFixedText, ReadFromJDBC, ReadHBase, ReadJSON, ReadLog, ReadMDF, ReadORC, ReadParquet, ReadPMML, ReadSource, ReadStagingDataset, RegressionPredictor, RemapFields, RemoveDuplicates, RemoveFields, ReplaceMissingValues, RetainFields, RowsToColumns, RunJavaScript, RunRScript, RunScript, SampleRandomRows, SelectFields, SemiJoin, SimulatePartitions, Sort, SortedGroupHandler, SplitField, StreamingOperator, SubJobExecutor, SummaryStatistics, SumOfSquares, SVMLearner, SVMPredictor, TextFrequencyFilter, TextStemmer, TextTokenizer, UnionAll, UpdateInJDBC, WriteARFF, WriteAvro, WriteDelimitedText, WriteFixedText, WriteHBase, WriteORC, WritePMML, WriteSink, WriteStagingDataset, WriteToJDBC

public interface LogicalOperator
A logical operator is the fundamental unit of composition which may be added to a LogicalGraph.

Implementations must not extend this class directly; instead they must extend one of the following sub-classes:

  1. IterativeOperator
  2. IterativeOperator
  3. ExecutableOperator
  4. DeferredCompositeOperator

Conventions

Operators are expected to adhere to the "Java Bean" standard. All operators must declare their input and output ports as private final fields and must provide public "getters" to access those fields. In addition, operators will typically define several configuration properties. Each of those properties should be defined as non-final fields with public "getters" and "setters". Finally, operators should define a public no-arg constructor. For example:
    public class MyOperator extends ExecutableOperator {
       //create an input record port--should be declared as final
       private final RecordPort input= newRecordInput("input");
       //create an output record port--should be declared as final
       private final RecordPort output= newRecordOutput("output");
       
       //declare a config property with a default value, must be non-final
       private int myConfigProperty= 5;
       
       //required: default constructor: initializes everything to their defaults
       public MyOperator() {
       }
       
       //optional: one or more convenience constructors to specify values for configuration properties
       public MyOperator(int myConfigProperty) {
          setMyConfigProperty(myConfigProperty);
       }
       
       //required: public getter for each input port
       public RecordPort getInput() {
          return input;
       }
       
       //required: public getter for each output port
       public RecordPort getOutput() {
          return output;
       }
       
       //required: public getter for each configuration property
       public int getMyConfigProperty() {
          return myConfigProperty;
       }
       
       //required: public setter for each configuration property
       public void setMyConfigProperty(int myConfigProperty) {
          this.myConfigProperty= myConfigProperty;
       }
    }
 

Serialization

All logical operators must be JSON serializable. JSON is used as the serialization mechanism anytime we must serialize an operator. This includes serialization to the cluster for distribution and export from the UI. In addition, there are times when we must clone an ExecutableOperator; here we use JSON as the default implementation of clone.

By following the "Java Bean" standard, described in the previous section, JSON serialization/deserialization is mostly automatic:

  • Serialization proceeds by invoking each of the getters in-turn.
  • Deserialization proceeds by first creating a new object via the default constructor and then invoking each of the setters in-turn.
There are a few additional considerations:
  • When defining getters and setters, it is simplest to avoid overloading setters. But, if you do overload setters, you must add explicit @JsonIgnore tags to the setters to ignore, an explicit @JsonProperty tag to the single setter to use, and an explicit @JsonProperty tag to the single getter to use.
  • It is a best practice for built-in operators to define a shortcut name for the operator type name by declaring the @JsonTypeName annotation on the operator. Operators declared with shortcut type names must also be registered via the TypeResolutionProvider service.
  • Method Details

    • getInputPorts

      Namespace<LogicalPort> getInputPorts()
      Returns the list of input ports. This method is generally used by framework code; operator consumers should generally use the methods getInput(), etc to obtain a handle to input ports.
      Returns:
      the list of input ports
    • getOutputPorts

      Namespace<LogicalPort> getOutputPorts()
      Returns the list of output ports. This method is generally used by framework code; operator consumers should generally use the methods getOutput(), etc to obtain a handle to input ports.
      Returns:
      the list of output ports
    • disableParallelism

      void disableParallelism()
      Can be called to forcible disable parallelism for the given operator (and children if this is a CompositeOperator). This method should be used sparingly since it will degrade performance significantly; but is needed in certain cases. For example:
      1. If there is a RunScript operator that contains a non-parallelizable script
      2. If there is a DeriveFields operator that contains a non-parallelizable function