Class EqualRangeBinning

  • All Implemented Interfaces:
    LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator

    public class EqualRangeBinning
    extends AbstractRecordCompositeOperator
    The EqualRangeBinning operator divides a set of numeric data into equal range bins. The upper and lower bounds can be specified; alternatively, the operator determines appropriate values from the minimum and maximum values discovered in the data at runtime. Null values and values outside the inclusive range set by the bounds are considered outliers and can either be filtered from the data or included as bin 0. Additionally, the range of each bin can be included in the output.
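    The bin assignment described above can be illustrated with a minimal plain-Java sketch. It does not use the operator itself, and the class and method names are hypothetical. One assumption is made loudly here: each bin is treated as closed on its lower edge and open on its upper edge, except the last bin, which also contains the upper bound. The documentation above does not state how the operator treats values that fall exactly on an interior bin edge.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of the operator's API.
final class EqualRangeBinSketch {

    /**
     * Returns a 1-based bin index, or 0 for an outlier (null, or outside [lower, upper]).
     * Assumption: bins are [edge, nextEdge), except the last bin, which includes upper.
     */
    static int binOf(BigDecimal value, BigDecimal lower, BigDecimal upper, int binCount) {
        if (binCount <= 0) {
            throw new IllegalArgumentException("binCount must be positive");
        }
        if (value == null || value.compareTo(lower) < 0 || value.compareTo(upper) > 0) {
            return 0; // outlier bin
        }
        if (upper.compareTo(lower) == 0) {
            return 1; // degenerate range: the single admissible value goes to the first bin
        }
        BigDecimal width = upper.subtract(lower);
        // Zero-based index = floor((value - lower) * binCount / (upper - lower)).
        int index = value.subtract(lower)
                .multiply(BigDecimal.valueOf(binCount))
                .divide(width, 0, RoundingMode.DOWN)
                .intValueExact();
        // value == upper yields index == binCount; fold it into the last bin.
        return Math.min(index, binCount - 1) + 1;
    }

    public static void main(String[] args) {
        BigDecimal lower = BigDecimal.ZERO;
        BigDecimal upper = new BigDecimal("100");
        for (String s : new String[] {"0", "25", "99.9", "100", "101"}) {
            System.out.println(s + " -> bin " + binOf(new BigDecimal(s), lower, upper, 4));
        }
    }
}
```

    With bounds 0 and 100 and four bins, each bin in this sketch spans 25 units, and anything below 0 or above 100 lands in bin 0.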
    • Constructor Detail

      • EqualRangeBinning

        public EqualRangeBinning()
        Default Constructor. The fieldName and binCount properties are required and must be set when using this operator.
      • EqualRangeBinning

        public EqualRangeBinning(String fieldName,
                                 int binCount)
    • Method Detail

      • getFieldName

        public String getFieldName()
        Get the name of the field to which the operation is applied.
        Returns:
        the field which will be equally binned
      • setFieldName

        public void setFieldName(String fieldName)
        Set the name of the field to which the operation is applied. Must be a numeric field.
        Parameters:
        fieldName - the name of the field which will be equally binned
      • getBinCount

        public int getBinCount()
        Get the number of equal range bins which will be used.
        Returns:
        the number of bins
      • setBinCount

        public void setBinCount(int binCount)
        Set the number of equal range bins which will be used.
        Parameters:
        binCount - the number of equal range bins
      • getLowerBound

        public BigDecimal getLowerBound()
        Get the lowest bound on values that will be binned.
        Returns:
        the lower bound used by the bins
      • setLowerBound

        public void setLowerBound(long lowerBound)
        Set the lowest bound on all values that will be binned as a long. If not set, the minimum value in the field will be used.
        Parameters:
        lowerBound - the lower bound on the binned values
      • setLowerBound

        public void setLowerBound(double lowerBound)
        Set the lowest bound on all values that will be binned as a double. If not set, the minimum value in the field will be used.
        Parameters:
        lowerBound - the lower bound on the binned values
      • setLowerBound

        public void setLowerBound(BigDecimal lowerBound)
        Set the lowest bound on all values that will be binned as a BigDecimal. If not set, the minimum value in the field will be used.
        Parameters:
        lowerBound - the lower bound on the binned values
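        The three setLowerBound overloads accept a long, a double, or a BigDecimal, while getLowerBound() always returns a BigDecimal. How the operator converts the primitive arguments internally is not stated here. The sketch below only shows the standard-library conversions a caller could use to predict a sensible BigDecimal value; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical helper showing standard-library conversions to BigDecimal.
final class BoundConversionSketch {

    // Exact conversion of an integral bound.
    static BigDecimal fromLong(long bound) {
        return BigDecimal.valueOf(bound);
    }

    // Uses the double's canonical string form, so 0.1 becomes exactly 0.1.
    // Throws NumberFormatException for NaN or infinite input.
    static BigDecimal fromDouble(double bound) {
        return BigDecimal.valueOf(bound);
    }

    public static void main(String[] args) {
        System.out.println(fromLong(10));
        System.out.println(fromDouble(0.1));
        // For contrast: the constructor exposes the binary representation of 0.1.
        System.out.println(new BigDecimal(0.1));
    }
}
```

        When a bound must be an exact decimal such as 0.1, passing a BigDecimal built from a string avoids any dependence on how a double argument is converted.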
      • getUpperBound

        public BigDecimal getUpperBound()
        Get the highest bound on values that will be binned.
        Returns:
        the upper bound used by the bins
      • setUpperBound

        public void setUpperBound(long upperBound)
        Set the highest bound on all values that will be binned as a long. If not set, the maximum value in the field will be used.
        Parameters:
        upperBound - the upper bound on the binned values
      • setUpperBound

        public void setUpperBound(double upperBound)
        Set the highest bound on all values that will be binned as a double. If not set, the maximum value in the field will be used.
        Parameters:
        upperBound - the upper bound on the binned values
      • setUpperBound

        public void setUpperBound(BigDecimal upperBound)
        Set the highest bound on all values that will be binned as a BigDecimal. If not set, the maximum value in the field will be used.
        Parameters:
        upperBound - the upper bound on the binned values
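        The fallback rule for unset bounds (minimum of the field for the lower bound, maximum for the upper bound) can be sketched in plain Java as follows. This is an illustration of the rule only, not the operator's implementation; the class and method names are hypothetical, and skipping null values when scanning the data is an assumption.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical helper illustrating "configured bound, else min/max of the data".
final class BoundResolutionSketch {

    // Returns the configured lower bound, or the smallest non-null value if none was set.
    static Optional<BigDecimal> resolveLower(BigDecimal configured, List<BigDecimal> values) {
        if (configured != null) {
            return Optional.of(configured);
        }
        return values.stream().filter(Objects::nonNull).min(Comparator.naturalOrder());
    }

    // Returns the configured upper bound, or the largest non-null value if none was set.
    static Optional<BigDecimal> resolveUpper(BigDecimal configured, List<BigDecimal> values) {
        if (configured != null) {
            return Optional.of(configured);
        }
        return values.stream().filter(Objects::nonNull).max(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<BigDecimal> values = Arrays.asList(
                new BigDecimal("3.5"), null, new BigDecimal("-2"), new BigDecimal("10"));
        System.out.println("lower = " + resolveLower(null, values));
        System.out.println("upper = " + resolveUpper(null, values));
        System.out.println("configured lower = " + resolveLower(BigDecimal.ZERO, values));
    }
}
```

        Setting a bound explicitly means values beyond it become outliers; leaving both bounds unset means only null values can be outliers under this rule.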
      • getIncludeOutliers

        public boolean getIncludeOutliers()
        Get whether outliers and null values are included in the output in bin 0. If not included, they will be filtered from the output.
        Returns:
        whether outlier values are included
      • setIncludeOutliers

        public void setIncludeOutliers(boolean includeOutliers)
        Set whether outliers and null values are included in the output in bin 0. If not included, they will be filtered from the output. Defaults to true.
        Parameters:
        includeOutliers - whether to include outliers in the output (true) or filter them out (false)
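        The effect of the includeOutliers flag can be illustrated with a small plain-Java sketch. It is not the operator's implementation: the class and method names are hypothetical, double arithmetic is used for brevity, and the treatment of values on interior bin edges (assigned to the higher bin) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper contrasting includeOutliers = true and false.
final class OutlierHandlingSketch {

    // 1-based bin index, or 0 for null, NaN, or values outside [lower, upper].
    static int binOf(Double value, double lower, double upper, int binCount) {
        if (value == null || value.isNaN() || value < lower || value > upper) {
            return 0;
        }
        if (upper == lower) {
            return 1;
        }
        int index = (int) ((value - lower) / (upper - lower) * binCount);
        return Math.min(index, binCount - 1) + 1; // fold value == upper into the last bin
    }

    // Returns one bin index per retained row; outlier rows are dropped when the flag is false.
    static List<Integer> assignBins(List<Double> values, double lower, double upper,
                                    int binCount, boolean includeOutliers) {
        List<Integer> bins = new ArrayList<>();
        for (Double value : values) {
            int bin = binOf(value, lower, upper, binCount);
            if (bin == 0 && !includeOutliers) {
                continue; // filtered from the output
            }
            bins.add(bin);
        }
        return bins;
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(5.0, null, 150.0, 50.0);
        System.out.println(assignBins(values, 0.0, 100.0, 2, true));
        System.out.println(assignBins(values, 0.0, 100.0, 2, false));
    }
}
```

        With the flag set to false the output has fewer rows than the input, which matters if downstream steps expect a one-to-one correspondence with the source records.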
      • getIncludeRanges

        public boolean getIncludeRanges()
        Get whether ranges will be included in the output. If included, two additional columns are added to the output, containing the lower bound and upper bound of the bin each value falls within.
        Returns:
        whether range values are included
      • setIncludeRanges

        public void setIncludeRanges(boolean includeRanges)
        Set whether ranges will be included in the output. If included, two additional columns are added to the output, containing the lower bound and upper bound of the bin each value falls within. Defaults to false.
        Parameters:
        includeRanges - whether to include the bin ranges in the output
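        For equal range bins, the two range values for a given bin follow directly from the bounds and the bin count. The sketch below shows that arithmetic in plain Java; the class and method names are hypothetical, and it makes no claim about the names of the two added columns or about what the operator reports as the range for bin 0, neither of which is stated here.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical helper computing the lower and upper edge of a 1-based bin.
final class BinRangeSketch {

    // Returns {binLower, binUpper} for bin in 1..binCount.
    static BigDecimal[] rangeOf(int bin, BigDecimal lower, BigDecimal upper, int binCount) {
        if (bin < 1 || bin > binCount) {
            throw new IllegalArgumentException("bin must be between 1 and binCount");
        }
        // Width of a single bin; DECIMAL64 keeps non-terminating divisions finite.
        BigDecimal width = upper.subtract(lower)
                .divide(BigDecimal.valueOf(binCount), MathContext.DECIMAL64);
        BigDecimal binLower = lower.add(width.multiply(BigDecimal.valueOf(bin - 1L)));
        // Use the exact upper bound for the last bin to avoid accumulated rounding.
        BigDecimal binUpper = (bin == binCount)
                ? upper
                : lower.add(width.multiply(BigDecimal.valueOf(bin)));
        return new BigDecimal[] {binLower, binUpper};
    }

    public static void main(String[] args) {
        BigDecimal[] range = rangeOf(2, BigDecimal.ZERO, new BigDecimal("100"), 4);
        System.out.println("bin 2 spans " + range[0] + " to " + range[1]);
    }
}
```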
      • compose

        protected void compose(CompositionContext ctx)
        Description copied from class: CompositeOperator
        Compose the body of this operator. Implementations should do the following:
        1. Perform any validation of configuration, input types, etc.
        2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
        3. Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
        Specified by:
        compose in class CompositeOperator
        Parameters:
        ctx - the context