Class SweetSpotSimilarity


  • public class SweetSpotSimilarity
    extends ClassicSimilarity

    A similarity with a lengthNorm that provides for a "plateau" of equally good lengths, and tf helper functions.

    For lengthNorm, a min and max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm falls off following a sqrt function.
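    The plateau behaviour can be illustrated with a small standalone sketch. It does not call Lucene; the class name PlateauNormSketch and the exact formula, 1 / sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1), are assumptions chosen to match the description above rather than something stated on this page.

```java
// Standalone sketch (no Lucene dependency). The formula is an assumption
// that reproduces the described shape: 1.0 on [min, max], sqrt falloff outside.
final class PlateauNormSketch {

    static float lengthNorm(int numTerms, int min, int max, float steepness) {
        // Zero anywhere on the plateau; outside it, twice the distance
        // to the nearest plateau edge.
        float distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0));
    }

    public static void main(String[] args) {
        // Hypothetical plateau of 3..5 terms with steepness 0.5.
        for (int len : new int[] {1, 3, 4, 5, 9, 50}) {
            System.out.println(len + " terms -> " + lengthNorm(len, 3, 5, 0.5f));
        }
    }
}
```

    With these example values, lengths 3 through 5 all score 1.0, and the norm shrinks as the length moves away from the plateau in either direction. A larger steepness makes that falloff faster.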

    For tf, baselineTf and hyperbolicTf functions are provided, which subclasses can choose between.

    See Also:
    A Gnuplot file used to generate some of the visualizations referenced from each function.
    • Field Detail

      • ln_min

        private int ln_min
      • ln_max

        private int ln_max
      • ln_steep

        private float ln_steep
      • tf_base

        private float tf_base
      • tf_min

        private float tf_min
      • tf_hyper_min

        private float tf_hyper_min
      • tf_hyper_max

        private float tf_hyper_max
      • tf_hyper_base

        private double tf_hyper_base
      • tf_hyper_xoffset

        private float tf_hyper_xoffset
    • Constructor Detail

      • SweetSpotSimilarity

        public SweetSpotSimilarity()
    • Method Detail

      • setBaselineTfFactors

        public void setBaselineTfFactors(float base,
                                         float min)
        Sets the baseline and minimum function variables for baselineTf.
        See Also:
        baselineTf(float)
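        As a rough illustration of how base and min interact, the sketch below assumes baselineTf has the form (x <= min) ? base : sqrt(x + base^2 - min). That formula and the class name BaselineTfSketch are assumptions for illustration; this page does not spell out the function.

```java
// Standalone sketch (no Lucene dependency) of an assumed baselineTf shape:
// a flat score of `base` up to `min`, then sqrt growth that joins smoothly.
final class BaselineTfSketch {

    static float baselineTf(float freq, float base, float min) {
        if (freq <= min) {
            return base; // every frequency at or below min scores the same
        }
        // At freq == min this expression equals base, so the curve is continuous.
        return (float) Math.sqrt(freq + (base * base) - min);
    }

    public static void main(String[] args) {
        // Hypothetical settings: base 1.5, min 5.
        for (float f : new float[] {1f, 5f, 6f, 20f}) {
            System.out.println("freq " + f + " -> " + baselineTf(f, 1.5f, 5f));
        }
    }
}
```

        Under this assumption, setting both values to 0 reduces the function to a plain sqrt(freq).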
      • setHyperbolicTfFactors

        public void setHyperbolicTfFactors(float min,
                                           float max,
                                           double base,
                                           float xoffset)
        Sets the function variables for the hyperbolicTf function.
        Parameters:
        min - the minimum tf value to ever be returned (default: 0.0)
        max - the maximum tf value to ever be returned (default: 2.0)
        base - the base value to be used in the exponential for the hyperbolic function (default: 1.3)
        xoffset - the midpoint of the hyperbolic function (default: 10.0)
        See Also:
        hyperbolicTf(float)
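        To show what these four parameters control, here is a standalone sketch of a hyperbolic-tangent-style curve bounded by min and max and centred on xoffset. The formula, the NaN guard, and the class name HyperbolicTfSketch are assumptions for illustration, not taken from this page.

```java
// Standalone sketch (no Lucene dependency) of an assumed hyperbolicTf shape:
// min + (max - min) / 2 * (tanh-like term + 1), with the exponential in `base`.
final class HyperbolicTfSketch {

    static float hyperbolicTf(float freq, float min, float max, double base, float xoffset) {
        double up = Math.pow(base, freq - xoffset);
        double down = Math.pow(base, -(freq - xoffset));
        // (up - down) / (up + down) ranges over (-1, 1), so the result stays in (min, max).
        double result = min + (max - min) / 2 * (((up - down) / (up + down)) + 1.0);
        // For very large freq, `up` overflows to Infinity and the ratio becomes NaN;
        // treat that case as the upper bound.
        return Double.isNaN(result) ? max : (float) result;
    }

    public static void main(String[] args) {
        // Uses the documented defaults: min 0.0, max 2.0, base 1.3, xoffset 10.0.
        for (float f : new float[] {0f, 5f, 10f, 15f, 100f}) {
            System.out.println("freq " + f + " -> " + hyperbolicTf(f, 0f, 2f, 1.3, 10f));
        }
    }
}
```

        With the documented defaults, a frequency equal to xoffset lands halfway between min and max, and the score flattens toward max as the frequency grows.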
      • setLengthNormFactors

        public void setLengthNormFactors(int min,
                                         int max,
                                         float steepness,
                                         boolean discountOverlaps)
        Sets the default function variables used by lengthNorm when no field-specific variables have been set.
        See Also:
        lengthNorm(int)
      • tf

        public float tf(float freq)
        Delegates to baselineTf.
        Overrides:
        tf in class ClassicSimilarity
        Parameters:
        freq - the frequency of a term within a document
        Returns:
        a score factor based on a term's within-document frequency
        See Also:
        baselineTf(float)