Class EqualRangeBinning

All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

public class EqualRangeBinning extends AbstractRecordCompositeOperator
The EqualRangeBinning operator divides a set of numeric data into bins of equal range. The upper and lower bounds can be specified explicitly; otherwise the operator determines them at runtime from the minimum and maximum values found in the data. Null values and values outside the inclusive range set by the bounds are considered outliers and can either be filtered from the data or included as bin 0. Optionally, the range of each bin can be included in the output.
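To make the bin assignment described above concrete, here is a minimal, self-contained sketch. It is not the operator's implementation: the class and method names are hypothetical, and two details are assumptions that the description does not state, namely that regular bins are numbered from 1 and that a value exactly equal to the upper bound falls in the last bin.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper for illustration only; not part of the operator's API.
class EqualRangeBinSketch {

    /**
     * Returns a bin number in 1..binCount, or 0 for nulls and for values
     * outside the inclusive range [lower, upper].
     */
    public static int binIndex(BigDecimal value, BigDecimal lower, BigDecimal upper, int binCount) {
        if (binCount < 1 || lower.compareTo(upper) > 0) {
            throw new IllegalArgumentException("need binCount >= 1 and lower <= upper");
        }
        if (value == null || value.compareTo(lower) < 0 || value.compareTo(upper) > 0) {
            return 0; // outlier or null
        }
        if (lower.compareTo(upper) == 0) {
            return 1; // degenerate range: every in-range value equals the single bound
        }
        // zero-based bin = floor((value - lower) * binCount / (upper - lower))
        int zeroBased = value.subtract(lower)
                .multiply(BigDecimal.valueOf(binCount))
                .divide(upper.subtract(lower), 0, RoundingMode.FLOOR)
                .intValue();
        // Assumption: a value equal to the upper bound is clamped into the last bin.
        return Math.min(zeroBased, binCount - 1) + 1;
    }

    public static void main(String[] args) {
        BigDecimal lower = BigDecimal.ZERO;
        BigDecimal upper = new BigDecimal("100");
        for (String s : new String[] {"-1", "0", "24.9", "25", "100", "101"}) {
            System.out.println(s + " -> bin " + binIndex(new BigDecimal(s), lower, upper, 4));
        }
    }
}
```

With bounds 0 and 100 and four bins, each bin in this sketch spans 25 units; how the real operator treats values that sit exactly on an interior boundary is not specified here.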
  • Constructor Details

    • EqualRangeBinning

      public EqualRangeBinning()
      Default Constructor. The fieldName and binCount properties are required and must be set when using this operator.
    • EqualRangeBinning

      public EqualRangeBinning(String fieldName, int binCount)
      Constructs the operator with the given field name and bin count.
  • Method Details

    • getFieldName

      public String getFieldName()
      Get the name of the field to which the operation is applied.
      Returns:
      the field which will be equally binned
    • setFieldName

      public void setFieldName(String fieldName)
      Set the name of the field to which the operation is applied. Must be a numeric field.
      Parameters:
      fieldName - the name of the field which will be equally binned
    • getBinCount

      public int getBinCount()
      Get the number of equal range bins which will be used.
      Returns:
      the number of bins
    • setBinCount

      public void setBinCount(int binCount)
      Set the number of equal range bins which will be used.
      Parameters:
      binCount - the number of equal range bins
    • getLowerBound

      public BigDecimal getLowerBound()
      Get the lowest bound on values that will be binned.
      Returns:
      the lower bound used by the bins
    • setLowerBound

      public void setLowerBound(long lowerBound)
      Set the lower bound, as a long, on the values that will be binned. If not set, the minimum value in the field will be used.
      Parameters:
      lowerBound - the lower bound on the binned values
    • setLowerBound

      public void setLowerBound(double lowerBound)
      Set the lower bound, as a double, on the values that will be binned. If not set, the minimum value in the field will be used.
      Parameters:
      lowerBound - the lower bound on the binned values
    • setLowerBound

      public void setLowerBound(BigDecimal lowerBound)
      Set the lower bound, as a BigDecimal, on the values that will be binned. If not set, the minimum value in the field will be used.
      Parameters:
      lowerBound - the lower bound on the binned values
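The setter accepts a long, a double, or a BigDecimal, while getLowerBound() always returns a BigDecimal. One plausible way to reconcile the three overloads is sketched below. The holder class is hypothetical, and the conversion choices are assumptions, since this documentation does not describe how the operator stores the bound.

```java
import java.math.BigDecimal;

// Hypothetical holder class; it only illustrates how three overloads
// could normalize to the single BigDecimal type returned by the getter.
class BoundHolderSketch {
    private BigDecimal lowerBound; // null means "not set"

    public void setLowerBound(long lowerBound) {
        this.lowerBound = BigDecimal.valueOf(lowerBound);
    }

    public void setLowerBound(double lowerBound) {
        // BigDecimal.valueOf(double) uses the double's canonical string form,
        // so 0.1 becomes exactly 0.1 rather than the long binary expansion
        // produced by new BigDecimal(0.1).
        this.lowerBound = BigDecimal.valueOf(lowerBound);
    }

    public void setLowerBound(BigDecimal lowerBound) {
        this.lowerBound = lowerBound;
    }

    public BigDecimal getLowerBound() {
        return lowerBound;
    }
}
```

Note that an int literal such as `setLowerBound(10)` resolves to the long overload under Java's overload rules, while `setLowerBound(10.0)` resolves to the double overload.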
    • getUpperBound

      public BigDecimal getUpperBound()
      Get the highest bound on values that will be binned.
      Returns:
      the upper bound used by the bins
    • setUpperBound

      public void setUpperBound(long upperBound)
      Set the upper bound, as a long, on the values that will be binned. If not set, the maximum value in the field will be used.
      Parameters:
      upperBound - the upper bound on the binned values
    • setUpperBound

      public void setUpperBound(double upperBound)
      Set the upper bound, as a double, on the values that will be binned. If not set, the maximum value in the field will be used.
      Parameters:
      upperBound - the upper bound on the binned values
    • setUpperBound

      public void setUpperBound(BigDecimal upperBound)
      Set the upper bound, as a BigDecimal, on the values that will be binned. If not set, the maximum value in the field will be used.
      Parameters:
      upperBound - the upper bound on the binned values
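The fallback rule for unset bounds (use the field's minimum or maximum) can be sketched as follows. This is an illustration under stated assumptions, not the operator's code: the class and method names are hypothetical, null represents "not set", and nulls in the data are assumed not to contribute to the discovered minimum and maximum.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical helper for illustration only.
class BoundResolutionSketch {

    /**
     * Returns {lower, upper}. An explicitly set bound wins; otherwise the
     * minimum or maximum of the non-null data values is used. If a bound is
     * unset and the data has no non-null values, that entry stays null.
     */
    public static BigDecimal[] resolveBounds(List<BigDecimal> values, BigDecimal lower, BigDecimal upper) {
        BigDecimal min = null;
        BigDecimal max = null;
        for (BigDecimal v : values) {
            if (v == null) {
                continue; // assumption: nulls never influence the bounds
            }
            if (min == null || v.compareTo(min) < 0) {
                min = v;
            }
            if (max == null || v.compareTo(max) > 0) {
                max = v;
            }
        }
        return new BigDecimal[] {
            lower != null ? lower : min,
            upper != null ? upper : max
        };
    }
}
```

In this sketch, setting only the upper bound to 15 over the values 5, 20, and 12 yields the range [5, 15], so 20 would become an outlier.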
    • getIncludeOutliers

      public boolean getIncludeOutliers()
      Get whether outliers and null values are included in the output in bin 0. If not included, they will be filtered from the output.
      Returns:
      whether outlier values are included
    • setIncludeOutliers

      public void setIncludeOutliers(boolean includeOutliers)
      Set whether outliers and null values are included in the output in bin 0. If not included, they will be filtered from the output. Defaults to true.
      Parameters:
      includeOutliers - whether to include outliers in the output (true) or filter them (false)
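The effect of the includeOutliers flag can be illustrated with a small standalone sketch. It uses plain doubles for brevity, and its class, method names, and bin numbering (regular bins start at 1, the upper bound falls in the last bin) are assumptions rather than the operator's documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class OutlierHandlingSketch {

    /** Returns 0 for null, NaN, or out-of-range values, otherwise a bin in 1..binCount. */
    public static int bin(Double v, double lower, double upper, int binCount) {
        if (v == null || v.isNaN() || v < lower || v > upper) {
            return 0;
        }
        if (upper == lower) {
            return 1;
        }
        int zeroBased = (int) Math.floor((v - lower) / (upper - lower) * binCount);
        return Math.min(zeroBased, binCount - 1) + 1;
    }

    /** Bins every value; outliers are kept as bin 0 or dropped, depending on the flag. */
    public static List<Integer> binAll(List<Double> values, double lower, double upper,
                                       int binCount, boolean includeOutliers) {
        List<Integer> out = new ArrayList<>();
        for (Double v : values) {
            int b = bin(v, lower, upper, binCount);
            if (b == 0 && !includeOutliers) {
                continue; // filtered from the output
            }
            out.add(b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(-5.0, 0.0, 2.5, null, 10.0, 11.0);
        System.out.println(binAll(data, 0.0, 10.0, 4, true));  // [0, 1, 2, 0, 4, 0]
        System.out.println(binAll(data, 0.0, 10.0, 4, false)); // [1, 2, 4]
    }
}
```

With the flag on, the output has one entry per input row; with it off, the output is shorter than the input, which matters if downstream steps expect row counts to match.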
    • getIncludeRanges

      public boolean getIncludeRanges()
      Get whether ranges will be included in the output. If included, two additional columns are added to the output, containing the lower bound and upper bound of the bin each value falls within.
      Returns:
      whether range values are included
    • setIncludeRanges

      public void setIncludeRanges(boolean includeRanges)
      Set whether ranges will be included in the output. If included, two additional columns are added to the output, containing the lower bound and upper bound of the bin each value falls within. Defaults to false.
      Parameters:
      includeRanges - whether to include the bin ranges in the output
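The two range columns hold the lower and upper bound of a value's bin. A minimal sketch of how those bounds follow from the overall bounds and the bin count is shown below. The names are hypothetical, bins are assumed to be numbered from 1, and the sketch rejects bin 0 because this documentation does not say what range values are emitted for outliers.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical helper for illustration only.
class BinRangeSketch {

    /** Returns {binLower, binUpper} for a 1-based bin number. */
    public static BigDecimal[] rangeOf(int bin, BigDecimal lower, BigDecimal upper, int binCount) {
        if (bin < 1 || bin > binCount) {
            throw new IllegalArgumentException("bin must be in 1..binCount");
        }
        BigDecimal span = upper.subtract(lower);
        BigDecimal n = BigDecimal.valueOf(binCount);
        // binLower = lower + span * (bin - 1) / binCount
        BigDecimal lo = lower.add(
                span.multiply(BigDecimal.valueOf(bin - 1)).divide(n, MathContext.DECIMAL64));
        // The last bin ends exactly at the overall upper bound, avoiding rounding drift.
        BigDecimal hi = (bin == binCount)
                ? upper
                : lower.add(span.multiply(BigDecimal.valueOf(bin)).divide(n, MathContext.DECIMAL64));
        return new BigDecimal[] { lo, hi };
    }
}
```

For bounds 0 and 100 with four bins, bin 2 in this sketch covers 25 to 50. MathContext.DECIMAL64 is used so that spans that do not divide evenly (for example three bins over 0 to 100) do not throw an ArithmeticException.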
    • compose

      protected void compose(CompositionContext ctx)
      Description copied from class: CompositeOperator
      Compose the body of this operator. Implementations should do the following:
      1. Perform any validation of configuration, input types, etc.
      2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
      3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
      Specified by:
      compose in class CompositeOperator
      Parameters:
      ctx - the context