Class SummaryStatistics

    • Constructor Detail

      • SummaryStatistics

        public SummaryStatistics()
        Discovers summary statistics. By default, only single-pass statistics are discovered (DetailLevel.SINGLE_PASS_ONLY); configure detailLevel to provide more or less detail.
    • Method Detail

      • setDetailLevel

        public void setDetailLevel(DetailLevel detailLevel)
        Sets the detail level that we use to compute statistics. The default value is DetailLevel.SINGLE_PASS_ONLY.
        Parameters:
        detailLevel - the detail level
      • getShowTopHowMany

        public int getShowTopHowMany()
        Returns the cap on the number of valueCounts to calculate. The default is 25. Memory usage is proportional to the number of distinct values, so only the top n values are calculated; this avoids excessive memory consumption when a field has a large number of distinct values. This setting is ignored if detail level is not DetailLevel.MULTI_PASS.
        Returns:
        the cap on the number of valueCounts to calculate.
      • setShowTopHowMany

        public void setShowTopHowMany(int showTopHowMany)
        Sets a cap on the number of valueCounts to calculate. The default is 25. Memory usage is proportional to the number of distinct values, so only the top n values are calculated; this avoids excessive memory consumption when a field has a large number of distinct values. This setting is ignored if detail level is not DetailLevel.MULTI_PASS.
        Parameters:
        showTopHowMany - the cap on the number of valueCounts to calculate.
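        To illustrate what the cap does to the result, here is a minimal sketch in plain Java. TopValueCountsSketch and topValueCounts are hypothetical names, not part of this API, and the operator's actual algorithm is not described on this page. This naive version still keeps one counter per distinct value; it only shows how the output is truncated to the top n entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the documented API. */
final class TopValueCountsSketch {

    /** Returns at most showTopHowMany value counts, most frequent first. */
    static Map<String, Long> topValueCounts(List<String> values, int showTopHowMany) {
        if (showTopHowMany < 0) {
            throw new IllegalArgumentException("showTopHowMany must not be negative");
        }
        // Count every distinct value (naive: memory grows with the number of distinct values).
        Map<String, Long> counts = new HashMap<>();
        for (String value : values) {
            counts.merge(value, 1L, Long::sum);
        }
        // Sort by descending count; break ties by value so the result is deterministic.
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        // Keep only the first showTopHowMany entries, preserving their order.
        Map<String, Long> top = new LinkedHashMap<>();
        int limit = Math.min(showTopHowMany, entries.size());
        for (Map.Entry<String, Long> entry : entries.subList(0, limit)) {
            top.put(entry.getKey(), entry.getValue());
        }
        return top;
    }
}
```

        With the documented default of 25, a field with thousands of distinct values would still report only its 25 most frequent ones.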
      • getRangeCount

        public int getRangeCount()
        Returns the number of intervalCounts to calculate for each numeric field. The default value is 10. This setting is ignored if detail level is not DetailLevel.MULTI_PASS.
        Returns:
        the number of intervalCounts to calculate for each numeric field.
      • setRangeCount

        public void setRangeCount(int rangeCount)
        Sets the number of intervalCounts to calculate for each numeric field. The default value is 10. This setting is ignored if detail level is not DetailLevel.MULTI_PASS.
        Parameters:
        rangeCount - the number of intervalCounts to calculate for each numeric field.
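        As a rough illustration of what rangeCount controls, the sketch below buckets numeric values into rangeCount intervals. It assumes equal-width intervals spanning the minimum to the maximum value, which this page does not confirm; IntervalCountsSketch and intervalCounts are hypothetical names, not part of this API.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class IntervalCountsSketch {

    /** Counts how many values fall into each of rangeCount equal-width intervals (assumed layout). */
    static long[] intervalCounts(double[] values, int rangeCount) {
        if (rangeCount <= 0) {
            throw new IllegalArgumentException("rangeCount must be positive");
        }
        long[] counts = new long[rangeCount];
        if (values.length == 0) {
            return counts;
        }
        // First pass: find the range covered by the data.
        double min = values[0];
        double max = values[0];
        for (double value : values) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        double width = (max - min) / rangeCount;
        // Second pass: assign each value to its interval.
        for (double value : values) {
            // If all values are equal the width is 0, so everything goes into the first interval.
            int index = width == 0.0 ? 0 : (int) ((value - min) / width);
            // The maximum sits exactly on the upper edge; clamp it into the last interval.
            counts[Math.min(index, rangeCount - 1)]++;
        }
        return counts;
    }
}
```

        Note that the sketch needs two passes over the data (one for the range, one for the counts), which is consistent with this setting only applying at DetailLevel.MULTI_PASS.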
      • getQuantilesToCalculate

        public List<BigDecimal> getQuantilesToCalculate()
        Gets the quantiles to calculate for each numeric field. By default this is 0.25, 0.50, and 0.75 (the 25th, 50th, and 75th percentiles). This setting is ignored if detail level is not DetailLevel.MULTI_PASS.
        Returns:
        the quantiles to calculate for each numeric field.
      • setQuantilesToCalculate

        public void setQuantilesToCalculate(List<BigDecimal> quantilesToCalculate)
        Sets the quantiles to calculate for each numeric field. By default this is 0.25, 0.50, and 0.75 (the 25th, 50th, and 75th percentiles). This setting is ignored if detail level is not DetailLevel.MULTI_PASS.
        Parameters:
        quantilesToCalculate - the quantiles to calculate for each numeric field.
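        The sketch below shows one way to build a quantile list of the type this setter accepts, and what a quantile such as 0.25 means for a sorted column. QuantileSketch and its methods are hypothetical names, and the nearest-rank definition used here is an assumption; this page does not say which quantile definition or algorithm the operator uses.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of the documented API. */
final class QuantileSketch {

    /** The documented default quantiles, built from strings so the decimal values are exact. */
    static List<BigDecimal> defaultQuantiles() {
        return Arrays.asList(
                new BigDecimal("0.25"), new BigDecimal("0.50"), new BigDecimal("0.75"));
    }

    /** Nearest-rank quantile (an assumed definition) of a non-empty array sorted ascending. */
    static double quantile(double[] sorted, BigDecimal q) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        if (q.signum() < 0 || q.compareTo(BigDecimal.ONE) > 0) {
            throw new IllegalArgumentException("quantile must be between 0 and 1");
        }
        // rank = ceil(q * n), computed exactly in BigDecimal to avoid floating-point drift.
        int rank = q.multiply(BigDecimal.valueOf(sorted.length))
                .setScale(0, RoundingMode.CEILING)
                .intValueExact();
        // A quantile of 0 would give rank 0; treat it as the smallest value.
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

        A list built like defaultQuantiles() has the List<BigDecimal> type that setQuantilesToCalculate expects; constructing each BigDecimal from a string avoids the binary rounding that the double constructor can introduce.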
      • getIncludedFields

        public List<String> getIncludedFields()
        Gets the fields from the input dataset for which we are collecting statistics. The default, an empty list, means that statistics are collected for all fields.
        Returns:
        the fields from the input dataset for which we are collecting statistics.
      • setIncludedFields

        public void setIncludedFields(List<String> includedFields)
        Sets the fields from the input dataset for which we are collecting statistics. The default, an empty list, means that statistics are collected for all fields.
        Parameters:
        includedFields - the fields from the input dataset for which we are collecting statistics.
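        The "empty list means all fields" rule can be sketched as follows. IncludedFieldsSketch and fieldsToAnalyze are hypothetical names, not part of this API, and rejecting unknown field names is an assumption made for the example rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the documented API. */
final class IncludedFieldsSketch {

    /** Resolves the fields to analyze: an empty includedFields list selects every input field. */
    static List<String> fieldsToAnalyze(List<String> includedFields, List<String> inputFields) {
        if (includedFields.isEmpty()) {
            // Default case: no explicit selection, so use all fields of the input.
            return new ArrayList<>(inputFields);
        }
        List<String> selected = new ArrayList<>();
        for (String field : includedFields) {
            // Assumed validation: a name that is not in the input is treated as an error.
            if (!inputFields.contains(field)) {
                throw new IllegalArgumentException("Unknown field: " + field);
            }
            selected.add(field);
        }
        return selected;
    }
}
```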
      • isFewDistinctValuesHint

        public boolean isFewDistinctValuesHint()
        Returns a hint as to whether there are expected to be a small number of distinct values. If not, we eagerly sort each column up-front and perform a parallelized computation of quantiles and frequent items. This setting is ignored if detail level is not DetailLevel.MULTI_PASS.
        Returns:
        whether few distinct values are expected
      • setFewDistinctValuesHint

        public void setFewDistinctValuesHint(boolean fewDistinctValuesHint)
        Sets a hint as to whether there are expected to be a small number of distinct values. If not, we eagerly sort each column up-front and perform a parallelized computation of quantiles and frequent items. This setting is ignored if detail level is not DetailLevel.MULTI_PASS.
        Parameters:
        fewDistinctValuesHint - whether few distinct values are expected
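        The trade-off behind this hint can be sketched with two ways of computing the same value counts. This is only an illustration under assumptions: it supposes that a map keyed by value is used when few distinct values are expected and that sorting followed by run counting is used otherwise. DistinctValuesHintSketch and its methods are hypothetical names, and the operator's parallelized implementation is not shown on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the documented API. */
final class DistinctValuesHintSketch {

    /** Map-based counting: cheap when the number of distinct values is small. */
    static Map<String, Long> countWithMap(List<String> column) {
        Map<String, Long> counts = new HashMap<>();
        for (String value : column) {
            counts.merge(value, 1L, Long::sum);
        }
        return counts;
    }

    /** Sort-based counting: sort a copy first, then count runs of equal neighbors. */
    static Map<String, Long> countBySorting(List<String> column) {
        List<String> sorted = new ArrayList<>(column);
        Collections.sort(sorted);
        Map<String, Long> counts = new LinkedHashMap<>();
        int runStart = 0;
        for (int i = 1; i <= sorted.size(); i++) {
            // A run ends at the end of the list or where the value changes.
            if (i == sorted.size() || !sorted.get(i).equals(sorted.get(runStart))) {
                counts.put(sorted.get(runStart), (long) (i - runStart));
                runStart = i;
            }
        }
        return counts;
    }

    /** Picks a strategy from the hint; both strategies yield the same counts. */
    static Map<String, Long> valueCounts(List<String> column, boolean fewDistinctValuesHint) {
        return fewDistinctValuesHint ? countWithMap(column) : countBySorting(column);
    }
}
```

        Because the hint only selects a strategy, a wrong hint would be expected to affect performance and memory use rather than the results, at least in this sketch.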
      • compose

        protected void compose(CompositionContext ctx)
        Description copied from class: CompositeOperator
        Compose the body of this operator. Implementations should do the following:
        1. Perform any validation of configuration, input types, etc.
        2. Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
        3. Create the necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
        Specified by:
        compose in class CompositeOperator
        Parameters:
        ctx - the context
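        The three steps listed for compose can be sketched with small stand-in types. Everything in this example (Port, SubOperator, Context, Composite) is hypothetical and exists only so the sketch is self-contained; these are not the library's classes, and only the add and connect method names mirror the ones mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in types for illustration only; they are NOT the library's classes. */
final class ComposeSketch {

    /** Hypothetical port: just a named endpoint. */
    static final class Port {
        final String name;

        Port(String name) {
            this.name = name;
        }
    }

    /** Hypothetical sub-operator with one input port and one output port. */
    static final class SubOperator {
        final Port input;
        final Port output;

        SubOperator(String name) {
            this.input = new Port(name + ".input");
            this.output = new Port(name + ".output");
        }
    }

    /** Hypothetical context that records what a composite adds and connects. */
    static final class Context {
        final List<SubOperator> operators = new ArrayList<>();
        final List<String> connections = new ArrayList<>();

        SubOperator add(SubOperator operator) {
            operators.add(operator);
            return operator;
        }

        void connect(Port from, Port to) {
            connections.add(from.name + " -> " + to.name);
        }
    }

    /** Hypothetical composite whose compose method follows the three documented steps. */
    static final class Composite {
        final Port input = new Port("composite.input");
        final Port output = new Port("composite.output");
        int rangeCount = 10;

        void compose(Context ctx) {
            // Step 1: validate the configuration before building anything.
            if (rangeCount <= 0) {
                throw new IllegalStateException("rangeCount must be positive");
            }
            // Step 2: instantiate sub-operators and add them to the context.
            SubOperator first = ctx.add(new SubOperator("first"));
            SubOperator second = ctx.add(new SubOperator("second"));
            // Step 3: wire composite input -> sub-operators -> composite output.
            ctx.connect(input, first.input);
            ctx.connect(first.output, second.input);
            ctx.connect(second.output, output);
        }
    }
}
```

        Validating first means a misconfigured composite fails before any sub-operator is added, so the context is never left half-populated.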