public final class StringMetrics extends Object
Consists of well-known metrics and methods to create string metrics from list or set metrics. All metrics are set up with sensible defaults; to customize a metric, use StringMetricBuilder.
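The available metrics are: block distance, cosine similarity, Damerau-Levenshtein, Dice similarity, Euclidean distance, identity, Jaccard similarity, Jaro, Jaro-Winkler, Levenshtein, matching coefficient, Monge-Elkan, Needleman-Wunsch, overlap coefficient, q-grams distance, Simon White, Smith-Waterman, Smith-Waterman-Gotoh and Soundex.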
All methods return immutable objects provided the arguments are also immutable.
Modifier and Type | Method and Description |
---|---|
static StringMetric | `blockDistance()`: Returns a string metric that uses a `Tokenizers.whitespace()` tokenizer and the `BlockDistance` metric. |
static float[] | `compare(StringMetric metric, String c, List<String> strings)`: Deprecated. Trivial with no clear use case. |
static float[] | `compare(StringMetric metric, String c, String... strings)`: Deprecated. Trivial with no clear use case. |
static float[] | `compareArrays(StringMetric metric, String[] a, String[] b)`: Deprecated. Trivial with no clear use case. |
static StringMetric | `cosineSimilarity()`: Returns a string metric that uses a `Tokenizers.whitespace()` tokenizer and the `CosineSimilarity` metric. |
static StringMetric | `create(Metric<String> metric)`: Either constructs a new string metric or returns the original metric. |
static StringMetric | `create(Metric<String> metric, Simplifier simplifier)`: Constructs a new composite string metric. |
static StringMetric | `createForListMetric(Metric<List<String>> metric, Simplifier simplifier, Tokenizer tokenizer)`: Creates a new composite string metric. The tokenizer is used to tokenize the simplified strings. |
static StringMetric | `createForListMetric(Metric<List<String>> metric, Tokenizer tokenizer)`: Creates a new composite string metric. |
static StringMetric | `createForSetMetric(Metric<Set<String>> metric, Simplifier simplifier, Tokenizer tokenizer)`: Creates a new composite string metric. The tokenizer is used to tokenize the simplified strings. |
static StringMetric | `createForSetMetric(Metric<Set<String>> metric, Tokenizer tokenizer)`: Creates a new composite string metric. |
static StringMetric | `damerauLevenshtein()`: Returns a string metric that uses the `DamerauLevenshtein` metric. |
static StringMetric | `diceSimilarity()`: Returns a string metric that uses a `Tokenizers.whitespace()` tokenizer and the `DiceSimilarity` metric. |
static StringMetric | `euclideanDistance()`: Returns a string metric that uses a `Tokenizers.whitespace()` tokenizer and the `EuclideanDistance` metric. |
static StringMetric | `identity()`: Returns a string metric that uses the `Identity` metric. |
static StringMetric | `jaccardSimilarity()`: Returns a string metric that uses a `Tokenizers.whitespace()` tokenizer and the `JaccardSimilarity` metric. |
static StringMetric | `jaro()`: Returns a string metric that uses the `Jaro` metric. |
static StringMetric | `jaroWinkler()`: Returns a string metric that uses the `JaroWinkler` metric. |
static StringMetric | `levenshtein()`: Returns a string metric that uses the `Levenshtein` metric. |
static StringMetric | `matchingCoefficient()`: Returns a string metric that uses a `Tokenizers.whitespace()` tokenizer and the `MatchingCoefficient` metric. |
static StringMetric | `mongeElkan()`: Returns a string metric that uses a `Tokenizers.whitespace()` tokenizer and the `MongeElkan` metric with an internal `SmithWatermanGotoh` metric. |
static StringMetric | `needlemanWunch()`: Returns a string metric that uses the `NeedlemanWunch` metric. |
static StringMetric | `overlapCoefficient()`: Returns a string metric that uses a `Tokenizers.whitespace()` tokenizer and the `OverlapCoefficient` metric. |
static StringMetric | `qGramsDistance()`: Returns a string metric that uses a `Tokenizers.qGramWithPadding(int)` tokenizer with q = 3 and the `BlockDistance` metric. |
static StringMetric | `simonWhite()`: Returns a string metric that uses a `Tokenizers.whitespace()` tokenizer followed by a `Tokenizers.qGramWithPadding(int)` tokenizer with q = 2, and the `SimonWhite` metric. |
static StringMetric | `smithWaterman()`: Returns a string metric that uses the `SmithWaterman` metric. |
static StringMetric | `smithWatermanGotoh()`: Returns a string metric that uses the `SmithWatermanGotoh` metric. |
static StringMetric | `soundex()`: Returns a string metric that uses a `Soundex` simplifier and the `JaroWinkler` metric. |
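Each factory above returns a ready-made StringMetric, whose compare(String, String) method returns a float score. To illustrate what a character-based metric such as levenshtein() measures, here is a minimal sketch of the textbook Levenshtein edit distance (unit-cost insertions, deletions and substitutions), turned into a similarity by dividing by the length of the longer string. It does not call Simmetrics: LevenshteinSketch is a hypothetical name, and the 1 - distance / max(length) normalization is an assumption that may differ from the library's own Levenshtein metric.

```java
// Hypothetical, self-contained sketch; not the Simmetrics implementation.
final class LevenshteinSketch {

    // Classic dynamic-programming edit distance with unit costs,
    // keeping only two rows of the table in memory.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting a's first i characters
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / length of the longer string.
    static float similarity(String a, String b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0f; // two empty strings are identical
        }
        return 1.0f - (float) distance(a, b) / Math.max(a.length(), b.length());
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(similarity("kitten", "sitting"));
    }
}
```

With the library on the classpath, the corresponding call would be StringMetrics.levenshtein().compare("kitten", "sitting").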
@Deprecated public static float[] compare(StringMetric metric, String c, List<String> strings)
Parameters:
metric - the metric used to compare c with each value in strings
c - the string to compare against each value in strings
strings - the strings to compare c against

@Deprecated public static float[] compare(StringMetric metric, String c, String... strings)
Parameters:
metric - the metric used to compare c with each value in strings
c - the string to compare against each value in strings
strings - the strings to compare c against

@Deprecated public static float[] compareArrays(StringMetric metric, String[] a, String[] b)
Parameters:
metric - the metric used to compare each element of a with the corresponding element of b
a - an array of strings to compare
b - an array of strings to compare
Throws:
IllegalArgumentException - when a and b have different lengths

public static StringMetric cosineSimilarity()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the CosineSimilarity metric.

public static StringMetric create(Metric<String> metric)
Parameters:
metric - a metric for strings

public static StringMetric create(Metric<String> metric, Simplifier simplifier)
Parameters:
metric - a metric for strings
simplifier - a simplifier
Throws:
NullPointerException - when either metric or simplifier is null
See Also:
StringMetricBuilder
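create(Metric<String>, Simplifier) wraps a metric so that both inputs are simplified before they are compared. The sketch below illustrates that idea without Simmetrics: SketchMetric and SimplifyingMetric are hypothetical stand-ins rather than library types, the lower-casing simplifier is an illustrative choice, and the exact-match lambda is only similar in spirit to identity().

```java
import java.util.Locale;
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a string metric; not the Simmetrics interface.
interface SketchMetric {
    float compare(String a, String b);
}

// Hypothetical composite: simplify both inputs, then delegate to the metric.
final class SimplifyingMetric implements SketchMetric {
    private final SketchMetric metric;
    private final UnaryOperator<String> simplifier;

    SimplifyingMetric(SketchMetric metric, UnaryOperator<String> simplifier) {
        // Mirrors the documented contract: null arguments are rejected eagerly.
        this.metric = Objects.requireNonNull(metric, "metric");
        this.simplifier = Objects.requireNonNull(simplifier, "simplifier");
    }

    @Override
    public float compare(String a, String b) {
        // Both inputs pass through the same simplifier before comparison.
        return metric.compare(simplifier.apply(a), simplifier.apply(b));
    }

    public static void main(String[] args) {
        // Exact-match metric: 1 for equal strings, 0 otherwise.
        SketchMetric exact = (a, b) -> a.equals(b) ? 1.0f : 0.0f;
        SketchMetric caseInsensitive =
                new SimplifyingMetric(exact, s -> s.toLowerCase(Locale.ROOT));
        System.out.println(exact.compare("Hello", "hello"));           // 0.0
        System.out.println(caseInsensitive.compare("Hello", "hello")); // 1.0
    }
}
```

In the library itself, the documented entry point for this composition is StringMetrics.create(metric, simplifier).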
public static StringMetric createForListMetric(Metric<List<String>> metric, Simplifier simplifier, Tokenizer tokenizer)
Parameters:
metric - a list metric
simplifier - a simplifier
tokenizer - a tokenizer
Throws:
NullPointerException - when metric, simplifier or tokenizer is null
See Also:
StringMetricBuilder
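createForListMetric(metric, simplifier, tokenizer) runs the simplifier first, tokenizes the simplified strings, and hands the resulting token lists to a list metric. To show why a list (rather than a set) matters, the sketch below uses a term-frequency cosine similarity, in which repeated tokens carry extra weight. It is self-contained Java, not Simmetrics: ListMetricSketch and its helpers are hypothetical, and the lower-casing simplifier and frequency weighting are illustrative choices, not a description of the library's CosineSimilarity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: simplify, tokenize into a list, then apply a list metric.
final class ListMetricSketch {

    // Illustrative simplifier: locale-neutral lower-casing.
    static String simplify(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Whitespace tokenizer that keeps duplicate tokens (a list, not a set).
    static List<String> tokenize(String s) {
        String t = s.trim();
        return t.isEmpty() ? Collections.<String>emptyList() : Arrays.asList(t.split("\\s+"));
    }

    // Counts how often each token occurs.
    static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> m = new HashMap<>();
        for (String t : tokens) {
            m.merge(t, 1, Integer::sum);
        }
        return m;
    }

    // Cosine of the term-frequency vectors.
    static float cosine(List<String> a, List<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0f; // assumed convention
        if (a.isEmpty() || b.isEmpty()) return 0.0f;
        Map<String, Integer> fa = counts(a);
        Map<String, Integer> fb = counts(b);
        double dot = 0;
        double na = 0;
        double nb = 0;
        for (Map.Entry<String, Integer> e : fa.entrySet()) {
            int x = e.getValue();
            int y = fb.getOrDefault(e.getKey(), 0);
            dot += (double) x * y;
            na += (double) x * x;
        }
        for (int y : fb.values()) {
            nb += (double) y * y;
        }
        return (float) (dot / Math.sqrt(na * nb));
    }

    // The composite: the tokenizer sees the simplified strings.
    static float compare(String a, String b) {
        return cosine(tokenize(simplify(a)), tokenize(simplify(b)));
    }

    public static void main(String[] args) {
        System.out.println(compare("The Cat sat", "the cat SAT"));
        System.out.println(compare("a a b", "a b"));
    }
}
```

Here "a a b" and "a b" score about 0.95 rather than 1.0, because the duplicated a changes the term-frequency vector; a set metric would see both strings as {a, b}.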
public static StringMetric createForListMetric(Metric<List<String>> metric, Tokenizer tokenizer)
Parameters:
metric - a list metric
tokenizer - a tokenizer
Throws:
NullPointerException - when either metric or tokenizer is null
See Also:
StringMetricBuilder
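Without a simplifier, createForListMetric(metric, tokenizer) feeds the raw strings straight into the tokenizer. The sketch below pairs a padded q-gram tokenizer (q = 3) with a block (Manhattan) distance over q-gram counts, the same kind of combination that qGramsDistance() is documented to use. It is not the library code: QGramBlockSketch is a hypothetical name, padding with q - 1 '#' characters on each side is an assumption about Tokenizers.qGramWithPadding(int), and so is the 1 - distance / (total q-grams) normalization.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "q-gram tokenizer + block distance"; no simplifier step.
final class QGramBlockSketch {

    // Pads with q - 1 '#' characters on each side (padding character assumed),
    // then emits every substring of length q, keeping duplicates.
    static List<String> qGramsWithPadding(String s, int q) {
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < q - 1; i++) {
            pad.append('#');
        }
        String padded = pad + s + pad;
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + q <= padded.length(); i++) {
            grams.add(padded.substring(i, i + q));
        }
        return grams;
    }

    static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> m = new HashMap<>();
        for (String t : tokens) {
            m.merge(t, 1, Integer::sum);
        }
        return m;
    }

    // Block (L1) distance between the two token-count vectors.
    static int blockDistance(List<String> a, List<String> b) {
        Map<String, Integer> ca = counts(a);
        Map<String, Integer> cb = counts(b);
        Set<String> keys = new HashSet<>(ca.keySet());
        keys.addAll(cb.keySet());
        int d = 0;
        for (String k : keys) {
            d += Math.abs(ca.getOrDefault(k, 0) - cb.getOrDefault(k, 0));
        }
        return d;
    }

    // Assumed normalization: 1 - distance / total number of q-grams.
    static float similarity(String a, String b) {
        List<String> ta = qGramsWithPadding(a, 3);
        List<String> tb = qGramsWithPadding(b, 3);
        int total = ta.size() + tb.size();
        return total == 0 ? 1.0f : 1.0f - (float) blockDistance(ta, tb) / total;
    }

    public static void main(String[] args) {
        System.out.println(qGramsWithPadding("abc", 3)); // [##a, #ab, abc, bc#, c##]
        System.out.println(similarity("night", "nacht")); // 1 - 8/14: 3 of 7 trigrams shared
    }
}
```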
public static StringMetric createForSetMetric(Metric<Set<String>> metric, Simplifier simplifier, Tokenizer tokenizer)
Parameters:
metric - a set metric
simplifier - a simplifier
tokenizer - a tokenizer
Throws:
NullPointerException - when metric, simplifier or tokenizer is null
See Also:
StringMetricBuilder
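createForSetMetric(metric, simplifier, tokenizer) follows the same order as the list variant but collects the tokens into sets, so duplicates are discarded before the set metric runs. The sketch below shows that composition with Jaccard similarity, |A ∩ B| / |A ∪ B|. SetMetricSketch, SetSimilarity, StringSimilarity and forSetMetric are hypothetical names chosen for illustration; they only mimic the shape of the documented factory, including the NullPointerException on null arguments.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch of composing simplifier -> tokenizer -> set metric.
final class SetMetricSketch {

    interface SetSimilarity {
        float compare(Set<String> a, Set<String> b);
    }

    interface StringSimilarity {
        float compare(String a, String b);
    }

    // Jaccard: |A ∩ B| / |A ∪ B|; two empty sets count as identical (assumed).
    static final SetSimilarity JACCARD = (a, b) -> {
        if (a.isEmpty() && b.isEmpty()) return 1.0f;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (float) inter.size() / union.size();
    };

    // Whitespace tokenizer.
    static List<String> whitespace(String s) {
        String t = s.trim();
        return t.isEmpty() ? Collections.<String>emptyList() : Arrays.asList(t.split("\\s+"));
    }

    static StringSimilarity forSetMetric(SetSimilarity metric,
                                         UnaryOperator<String> simplifier,
                                         Function<String, List<String>> tokenizer) {
        Objects.requireNonNull(metric, "metric");
        Objects.requireNonNull(simplifier, "simplifier");
        Objects.requireNonNull(tokenizer, "tokenizer");
        // Simplify first, tokenize the simplified string, then drop duplicate tokens.
        return (a, b) -> metric.compare(
                new HashSet<>(tokenizer.apply(simplifier.apply(a))),
                new HashSet<>(tokenizer.apply(simplifier.apply(b))));
    }

    public static void main(String[] args) {
        StringSimilarity jaccard = forSetMetric(
                JACCARD, s -> s.toLowerCase(Locale.ROOT), SetMetricSketch::whitespace);
        // 2 shared tokens out of 3 distinct tokens overall
        System.out.println(jaccard.compare("New York New York", "new york city"));
    }
}
```

Because duplicates collapse, "New York New York" contributes only {new, york}, so the score is 2/3.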
public static StringMetric createForSetMetric(Metric<Set<String>> metric, Tokenizer tokenizer)
Parameters:
metric - a set metric
tokenizer - a tokenizer
Throws:
NullPointerException - when either metric or tokenizer is null
See Also:
StringMetricBuilder
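Without a simplifier, createForSetMetric(metric, tokenizer) compares the raw tokens, so case differences count. The sketch below applies two common set coefficients, Dice (2|A ∩ B| / (|A| + |B|)) and overlap (|A ∩ B| / min(|A|, |B|)), to whitespace token sets. These are the textbook definitions; whether the library's DiceSimilarity and OverlapCoefficient treat empty inputs the same way is an assumption, and SetCoefficientSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: whitespace tokens collected into sets, no simplifier.
final class SetCoefficientSketch {

    static Set<String> tokenSet(String s) {
        String t = s.trim();
        Set<String> set = new HashSet<>();
        if (!t.isEmpty()) {
            set.addAll(Arrays.asList(t.split("\\s+")));
        }
        return set;
    }

    static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return inter.size();
    }

    // Dice: 2|A ∩ B| / (|A| + |B|)
    static float dice(String x, String y) {
        Set<String> a = tokenSet(x);
        Set<String> b = tokenSet(y);
        if (a.isEmpty() && b.isEmpty()) return 1.0f; // assumed convention
        return 2.0f * intersectionSize(a, b) / (a.size() + b.size());
    }

    // Overlap: |A ∩ B| / min(|A|, |B|)
    static float overlap(String x, String y) {
        Set<String> a = tokenSet(x);
        Set<String> b = tokenSet(y);
        if (a.isEmpty() && b.isEmpty()) return 1.0f; // assumed convention
        if (a.isEmpty() || b.isEmpty()) return 0.0f;
        return (float) intersectionSize(a, b) / Math.min(a.size(), b.size());
    }

    public static void main(String[] args) {
        System.out.println(dice("quick brown", "the quick brown fox"));    // 2*2 / (2+4)
        System.out.println(overlap("quick brown", "the quick brown fox")); // 2 / min(2, 4)
        // No simplifier: "Quick" and "quick" are different tokens.
        System.out.println(dice("Quick", "quick"));
    }
}
```

The overlap coefficient reaches 1.0 whenever one token set is contained in the other, while Dice still penalizes the difference in size.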
public static StringMetric blockDistance()
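Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the BlockDistance metric.

public static StringMetric damerauLevenshtein()
Returns a string metric that uses the DamerauLevenshtein metric.

public static StringMetric diceSimilarity()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the DiceSimilarity metric.

public static StringMetric euclideanDistance()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the EuclideanDistance metric.

public static StringMetric identity()
Returns a string metric that uses the Identity metric.

public static StringMetric jaccardSimilarity()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the JaccardSimilarity metric.

public static StringMetric jaro()
Returns a string metric that uses the Jaro metric.

public static StringMetric jaroWinkler()
Returns a string metric that uses the JaroWinkler metric.

public static StringMetric levenshtein()
Returns a string metric that uses the Levenshtein metric.

public static StringMetric matchingCoefficient()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the MatchingCoefficient metric.

public static StringMetric mongeElkan()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the MongeElkan metric with an internal SmithWatermanGotoh metric.

public static StringMetric needlemanWunch()
Returns a string metric that uses the NeedlemanWunch metric.

public static StringMetric overlapCoefficient()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the OverlapCoefficient metric.

public static StringMetric qGramsDistance()
Returns a string metric that uses a Tokenizers.qGramWithPadding(int) tokenizer with q = 3 and the BlockDistance metric.

public static StringMetric simonWhite()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer followed by a Tokenizers.qGramWithPadding(int) tokenizer with q = 2, and the SimonWhite metric.

public static StringMetric smithWaterman()
Returns a string metric that uses the SmithWaterman metric.

public static StringMetric smithWatermanGotoh()
Returns a string metric that uses the SmithWatermanGotoh metric.

public static StringMetric soundex()
Returns a string metric that uses a Soundex simplifier and the JaroWinkler metric.

Copyright © 2014–2018. All rights reserved.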