Class ArrayDigest


  • public class ArrayDigest
    extends AbstractTDigest
    Array based implementation of a TDigest.
    This implementation is essentially a one-level b-tree in which nodes are collected into pages typically with 32 values per page. Commonly, an ArrayDigest contains 500-3000 centroids. With 32 values per page, we have about 32 values per page and about 30 pages which seems to give a nice balance for speed. Sizes from 4 to 100 are plausible, however.
    • Constructor Detail

      • ArrayDigest

        public ArrayDigest​(int pageSize,
                           double compression)
    • Method Detail

      • add

        public void add​(double x,
                        int w)
        Description copied from class: TDigest
        Adds a sample to a histogram.
        Specified by:
        add in class TDigest
        Parameters:
        x - The value to add.
        w - The weight of this point.
      • headSum

        public long headSum​(com.tdunning.math.stats.ArrayDigest.Index limit)
      • mean

        public double mean​(com.tdunning.math.stats.ArrayDigest.Index index)
      • count

        public int count​(com.tdunning.math.stats.ArrayDigest.Index index)
      • compress

        public void compress()
        Description copied from class: TDigest
        Re-examines a t-digest to determine whether some centroids are redundant. If your data are perversely ordered, this may be a good idea. Even if not, this may save 20% or so in space.
        The cost is roughly the same as adding as many data points as there are centroids. This is typically < 10 * compression, but could be as high as 100 * compression.
        This is a destructive operation that is not thread-safe.
        Specified by:
        compress in class TDigest
      • size

        public long size()
        Description copied from class: TDigest
        Returns the number of points that have been added to this TDigest.
        Specified by:
        size in class TDigest
        Returns:
        The sum of the weights on all centroids.
      • cdf

        public double cdf​(double x)
        Description copied from class: TDigest
        Returns the fraction of all points added which are <= x.
        Specified by:
        cdf in class TDigest
      • quantile

        public double quantile​(double q)
        Description copied from class: TDigest
        Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
        Specified by:
        quantile in class TDigest
        Parameters:
        q - The desired fraction
        Returns:
        The value x such that cdf(x) == q
      • centroidCount

        public int centroidCount()
        Description copied from class: TDigest
        The number of centroids currently in the TDigest.
        Specified by:
        centroidCount in class TDigest
        Returns:
        The number of centroids
      • centroids

        public Iterable<? extends Centroid> centroids()
        Description copied from class: TDigest
        An iterable that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
        Specified by:
        centroids in class TDigest
        Returns:
        The centroids in the form of an Iterable.
      • allAfter

        public Iterator<com.tdunning.math.stats.ArrayDigest.Index> allAfter​(double x)
      • floor

        public com.tdunning.math.stats.ArrayDigest.Index floor​(double x)
        Returns a cursor pointing to the first element <= x. Exposed only for testing.
        Parameters:
        x - The value used to find the cursor.
        Returns:
        The cursor.
      • ceiling

        public com.tdunning.math.stats.ArrayDigest.Index ceiling​(double x)
      • allBefore

        public Iterator<com.tdunning.math.stats.ArrayDigest.Index> allBefore​(double x)
        Returns an iterator which will give each element <= to x in non-increasing order.
        Parameters:
        x - The upper bound of all returned elements
        Returns:
        An iterator that returns elements in non-increasing order.
      • increment

        public com.tdunning.math.stats.ArrayDigest.Index increment​(com.tdunning.math.stats.ArrayDigest.Index x,
                                                                   int delta)
      • compression

        public double compression()
        Description copied from class: TDigest
        Returns the current compression factor.
        Specified by:
        compression in class TDigest
        Returns:
        The compression factor originally used to set up the TDigest.
      • byteSize

        public int byteSize()
        Returns an upper bound on the number bytes that will be required to represent this histogram.
        Specified by:
        byteSize in class TDigest
        Returns:
        The number of bytes required.
      • smallByteSize

        public int smallByteSize()
        Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation.
        Specified by:
        smallByteSize in class TDigest
        Returns:
        The number of bytes required.
      • asBytes

        public void asBytes​(ByteBuffer buf)
        Outputs a histogram as bytes using a particularly cheesy encoding.
        Specified by:
        asBytes in class TDigest
        Parameters:
        buf - The byte buffer into which the TDigest should be serialized.
      • asSmallBytes

        public void asSmallBytes​(ByteBuffer buf)
        Description copied from class: TDigest
        Serialize this TDigest into a byte buffer. Some simple compression is used such as using variable byte representation to store the centroid weights and using delta-encoding on the centroid means so that floats can be reasonably used to store the centroid means.
        Specified by:
        asSmallBytes in class TDigest
        Parameters:
        buf - The byte buffer into which the TDigest should be serialized.
      • fromBytes

        public static ArrayDigest fromBytes​(ByteBuffer buf)
        Reads a histogram from a byte buffer
        Returns:
        The new histogram structure