T - type of the token
public class SimonWhite<T> extends Object implements ListMetric<T>
similarity(a,b) = 2 * |(a A b)| / (|a| + |b|)
The A operation takes the list intersection of a and b. This is a list c such that each element in c has a 1-to-1 relation to an element in both a and b. E.g. the list intersection of [ab,ab,ab,ac] and [ab,ab,ad] is [ab,ab].
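The list intersection and the formula above can be sketched in plain Java as follows. This is an illustrative sketch, not the code of SimonWhite itself: the class name ListIntersectionSketch and its methods are hypothetical, and returning 1.0 for two empty lists is an assumption that the description above does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, written only to illustrate the description above.
public class ListIntersectionSketch {

    // Returns the list intersection of a and b: every element of the result
    // is paired with exactly one element of a and exactly one element of b.
    public static <T> List<T> intersect(List<T> a, List<T> b) {
        List<T> remaining = new ArrayList<>(b); // copy, so b is not modified
        List<T> result = new ArrayList<>();
        for (T token : a) {
            // List.remove(Object) removes only the first matching element,
            // so each element of b can be matched at most once.
            if (remaining.remove(token)) {
                result.add(token);
            }
        }
        return result;
    }

    // similarity(a,b) = 2 * |(a A b)| / (|a| + |b|)
    public static <T> float similarity(List<T> a, List<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0f; // assumption: two empty lists count as identical
        }
        return 2.0f * intersect(a, b).size() / (a.size() + b.size());
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("ab", "ab", "ab", "ac");
        List<String> b = Arrays.asList("ab", "ab", "ad");
        System.out.println(intersect(a, b));  // prints [ab, ab]
        System.out.println(similarity(a, b)); // 2 * 2 / (4 + 3), about 0.571
    }
}
```

Copying b before removing matches keeps the sketch free of side effects on its arguments. The repeated linear search makes it quadratic in the worst case, which is acceptable for short token lists.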
This metric is very similar to Dice's coefficient. However, Simon White used the list intersection rather than the set intersection to prevent a list of duplicates from scoring a perfect match against a list with single elements. E.g. 'GGGGG' should not be identical to 'GG'.
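The effect of duplicates can be seen by computing both variants on the same token lists. The sketch below is illustrative only: the class DuplicateTokensSketch and its methods are hypothetical, the inputs are assumed to be non-empty, and 'GGGGG' and 'GG' are assumed to have been split into two-character tokens beforehand, giving [GG,GG,GG,GG] and [GG].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical comparison of set-based and list-based scoring.
public class DuplicateTokensSketch {

    // Dice's coefficient over token sets; duplicates collapse into one element.
    public static <T> float setDice(List<T> a, List<T> b) {
        Set<T> setA = new HashSet<>(a);
        Set<T> setB = new HashSet<>(b);
        Set<T> common = new HashSet<>(setA);
        common.retainAll(setB); // set intersection
        return 2.0f * common.size() / (setA.size() + setB.size());
    }

    // List-based variant; each element of b can be matched at most once.
    public static <T> float listDice(List<T> a, List<T> b) {
        List<T> remaining = new ArrayList<>(b);
        int common = 0;
        for (T token : a) {
            if (remaining.remove(token)) { // removes the first match only
                common++;
            }
        }
        return 2.0f * common / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // Assumed tokenization of 'GGGGG' and 'GG' into two-character tokens.
        List<String> many = Arrays.asList("GG", "GG", "GG", "GG");
        List<String> one = Arrays.asList("GG");
        System.out.println(setDice(many, one));  // prints 1.0
        System.out.println(listDice(many, one)); // prints 0.4
    }
}
```

With sets, both inputs reduce to {GG} and score a perfect 1.0. With lists, only one of the four tokens finds a partner, so the score drops to 2 * 1 / (4 + 1) = 0.4.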
This class is immutable and thread-safe.
See Also: DiceSimilarity
Constructor and Description |
---|
SimonWhite() |
Modifier and Type | Method and Description |
---|---|
float | compare(List<T> a, List<T> b) Measures the similarity between lists a and b. |
String | toString() |
public float compare(List<T> a, List<T> b)
ListMetric
Copyright © 2014–2018. All rights reserved.