Class AVLTreeDigest

    • Constructor Detail

      • AVLTreeDigest

        public AVLTreeDigest(double compression)
        A histogram structure that will record a sketch of a distribution.
        Parameters:
        compression - Controls how accuracy is traded for size. A value of N gives quantile errors that are almost always less than 3/N, with considerably smaller errors expected for extreme quantiles. In exchange, expect the digest to track about 5N centroids.
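        As a rough illustration of this trade-off, the sketch below evaluates the two rules of thumb from the parameter description (an error bound of about 3/N and about 5N centroids) for a few compression values. The class and method names are hypothetical and are not part of the library; the numbers are only the documented approximations, not measured results.

```java
class CompressionTradeoff {
    // Rule of thumb from the constructor documentation: quantile error is
    // almost always below 3 / compression.
    public static double approxErrorBound(double compression) {
        return 3.0 / compression;
    }

    // Rule of thumb from the constructor documentation: expect to track
    // roughly 5 * compression centroids.
    public static double approxCentroidCount(double compression) {
        return 5.0 * compression;
    }

    public static void main(String[] args) {
        // Print both estimates for a few candidate compression values.
        for (double n : new double[] {50, 100, 200}) {
            System.out.printf("compression=%.0f  error<~%.4f  centroids~%.0f%n",
                    n, approxErrorBound(n), approxCentroidCount(n));
        }
    }
}
```

        Doubling the compression halves the approximate error bound and doubles the approximate number of centroids, so the choice is a direct memory-versus-accuracy decision.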
    • Method Detail

      • recordAllData

        public TDigest recordAllData()
        Description copied from class: AbstractTDigest
        Configures this digest so that every centroid records all of the data points assigned to it. Intended for testing only.
        Overrides:
        recordAllData in class AbstractTDigest
        Returns:
        This TDigest, so that configuration calls can be chained in fluent style.
      • add

        public void add(double x,
                        int w)
        Description copied from class: TDigest
        Adds a sample to a histogram.
        Specified by:
        add in class TDigest
        Parameters:
        x - The value to add.
        w - The weight of this point.
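        A point with weight w can be read as "w copies of the value x". The sketch below shows one way a weighted point might be folded into a running weighted mean. It illustrates the general idea of a weighted sample only; it is an assumption, not a description of how AVLTreeDigest updates its centroids internally, and the WeightedMean class is hypothetical.

```java
class WeightedMean {
    private double mean = 0.0;
    private long totalWeight = 0;

    // Folds in a value x carrying integer weight w. The result is the same
    // as adding the value x to an ordinary running mean w times.
    public void add(double x, int w) {
        if (w <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        totalWeight += w;
        // Incremental update: move the mean toward x in proportion to w.
        mean += w * (x - mean) / totalWeight;
    }

    public double mean() {
        return mean;
    }

    public long totalWeight() {
        return totalWeight;
    }

    public static void main(String[] args) {
        WeightedMean m = new WeightedMean();
        m.add(1.0, 1);
        m.add(4.0, 2);
        System.out.println(m.mean() + " over total weight " + m.totalWeight());
    }
}
```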
      • add

        public void add(double x,
                        int w,
                        List<Double> data)
      • compress

        public void compress()
        Description copied from class: TDigest
        Re-examines a t-digest to determine whether some centroids are redundant. If your data are perversely ordered, this may be a good idea. Even if not, this may save 20% or so in space.
        The cost is roughly the same as adding as many data points as there are centroids. The number of centroids is typically < 10 * compression, but could be as high as 100 * compression.
        This is a destructive operation that is not thread-safe.
        Specified by:
        compress in class TDigest
      • size

        public long size()
        Returns the number of samples represented in this histogram. If you want to know how many centroids are being used, try centroids().size().
        Specified by:
        size in class TDigest
        Returns:
        the number of samples that have been added.
      • cdf

        public double cdf(double x)
        Description copied from class: TDigest
        Returns the fraction of all points added which are <= x.
        Specified by:
        cdf in class TDigest
        Parameters:
        x - the value at which the CDF should be evaluated
        Returns:
        the approximate fraction of all samples that were less than or equal to x.
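        For reference, the quantity that cdf approximates can be computed exactly when all samples are still available. The sketch below is a plain empirical CDF over an array; it is not the digest's algorithm, and the EmpiricalCdf class is hypothetical. A digest returns an estimate of this value without retaining every sample.

```java
class EmpiricalCdf {
    // Returns the exact fraction of samples that are <= x.
    // Returns NaN for an empty sample set, where the fraction is undefined.
    public static double cdf(double[] samples, double x) {
        if (samples.length == 0) {
            return Double.NaN;
        }
        int count = 0;
        for (double s : samples) {
            if (s <= x) {
                count++;
            }
        }
        return (double) count / samples.length;
    }

    public static void main(String[] args) {
        double[] samples = {1.0, 2.0, 3.0, 4.0};
        System.out.println(cdf(samples, 2.0));
    }
}
```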
      • quantile

        public double quantile(double q)
        Description copied from class: TDigest
        Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
        Specified by:
        quantile in class TDigest
        Parameters:
        q - The desired quantile, in the range [0,1].
        Returns:
        The smallest value x such that the estimated proportion of samples that are <= x is q.
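        The definition above can be made concrete with an exact computation over a fully retained sample set. The sketch below returns the smallest sample x for which the fraction of samples <= x is at least q. It is a reference definition only, written under that assumption; the EmpiricalQuantile class is hypothetical, and a digest's estimate may differ from this exact value.

```java
import java.util.Arrays;

class EmpiricalQuantile {
    // Returns the smallest sample x such that the fraction of samples
    // that are <= x is at least q.
    public static double quantile(double[] samples, double q) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (Double.isNaN(q) || q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0,1]");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Smallest 1-based rank k with k / n >= q; q == 0 maps to the minimum.
        int rank = (int) Math.ceil(q * sorted.length);
        int index = Math.max(rank, 1) - 1;
        return sorted[index];
    }

    public static void main(String[] args) {
        double[] samples = {4.0, 1.0, 3.0, 2.0};
        System.out.println(quantile(samples, 0.5));
    }
}
```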
      • centroidCount

        public int centroidCount()
        Description copied from class: TDigest
        The number of centroids currently in the TDigest.
        Specified by:
        centroidCount in class TDigest
        Returns:
        The number of centroids.
      • centroids

        public Iterable<? extends Centroid> centroids()
        Description copied from class: TDigest
        An iterable that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
        Specified by:
        centroids in class TDigest
        Returns:
        The centroids in the form of an Iterable.
      • compression

        public double compression()
        Description copied from class: TDigest
        Returns the current compression factor.
        Specified by:
        compression in class TDigest
        Returns:
        The compression factor originally used to set up the TDigest.
      • byteSize

        public int byteSize()
        Returns an upper bound on the number of bytes that will be required to represent this histogram.
        Specified by:
        byteSize in class TDigest
        Returns:
        The number of bytes required.
      • smallByteSize

        public int smallByteSize()
        Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation.
        Specified by:
        smallByteSize in class TDigest
        Returns:
        The number of bytes required.
      • asBytes

        public void asBytes(ByteBuffer buf)
        Serializes this histogram into a byte buffer using a simple, unoptimized encoding.
        Specified by:
        asBytes in class TDigest
        Parameters:
        buf - The byte buffer into which the TDigest should be serialized.
      • asSmallBytes

        public void asSmallBytes(ByteBuffer buf)
        Description copied from class: TDigest
        Serializes this TDigest into a byte buffer. Some simple compression is applied: centroid weights are stored in a variable-byte representation, and centroid means are delta-encoded so that floats can reasonably be used to store them.
        Specified by:
        asSmallBytes in class TDigest
        Parameters:
        buf - The byte buffer into which the TDigest should be serialized.
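        The two techniques named above, variable-byte integers and delta-encoded means, can be sketched generically as follows. This is a minimal illustration under stated assumptions: the byte layout, the class SmallEncodingSketch, and all of its methods are hypothetical and do not reproduce the library's actual wire format.

```java
import java.nio.ByteBuffer;

class SmallEncodingSketch {
    // Writes a non-negative int using 7 payload bits per byte; a set high
    // bit means that more bytes follow. Small values take a single byte.
    public static void putVarInt(ByteBuffer buf, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        while (n >= 0x80) {
            buf.put((byte) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        buf.put((byte) n);
    }

    // Reads an int written by putVarInt.
    public static int getVarInt(ByteBuffer buf) {
        int result = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    // Stores ascending means as float differences from the previous mean.
    // The differences are small relative to the means themselves, which is
    // why a float can be a reasonable container for them.
    public static void putDeltas(ByteBuffer buf, double[] sortedMeans) {
        double previous = 0.0;
        for (double mean : sortedMeans) {
            buf.putFloat((float) (mean - previous));
            previous = mean;
        }
    }

    // Rebuilds the means by accumulating the stored differences.
    public static double[] getDeltas(ByteBuffer buf, int count) {
        double[] means = new double[count];
        double previous = 0.0;
        for (int i = 0; i < count; i++) {
            previous += buf.getFloat();
            means[i] = previous;
        }
        return means;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        putVarInt(buf, 300);
        putDeltas(buf, new double[] {1.0, 1.5, 4.0});
        buf.flip();
        System.out.println(getVarInt(buf));
        System.out.println(java.util.Arrays.toString(getDeltas(buf, 3)));
    }
}
```

        Storing deltas as floats loses precision in general; the round trip is exact only when each difference happens to be representable as a float.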
      • fromBytes

        public static AVLTreeDigest fromBytes(ByteBuffer buf)
        Reads a histogram from a byte buffer.
        Returns:
        The new histogram structure