https://en.wikipedia.org/wiki/Triangle_inequality
https://en.wikipedia.org/wiki/Levenshtein_distance
https://github.com/trekhleb/javascript-algorithms/tree/master/src/algorithms/string/levenshtein-distance
https://github.com/trekhleb/javascript-algorithms/tree/master/src/algorithms/string/levenshtein-distance
https://en.wikipedia.org/wiki/Triangle_inequality
http://www.catalysoft.com/articles/StrikeAMatch.html
https://www.geeksforgeeks.org/bk-tree-introduction-implementation/
http://www.martinbroadhurst.com/tag/levenshtein-distance
http://www.catalysoft.com/articles/StrikeAMatch.html
https://www.geeksforgeeks.org/bk-tree-introduction-implementation/
http://www.martinbroadhurst.com/tag/levenshtein-distance
http://www.catalysoft.com/articles/StrikeAMatch.html
VP trees: A data structure for finding stuff fast
http://ceur-ws.org/Vol-256/submission_12.pdfhttps://nullwords.wordpress.com/2013/03/13/the-bk-tree-a-data-structure-for-spell-checking/
http://stevehanov.ca/blog/?id=130
https://fribbels.github.io/vptree/writeup
https://towardsdatascience.com/symspell-vs-bk-tree-100x-faster-fuzzy-string-search-spell-checking-c4f10d80a078
2. Metric Spaces and Similarity Queries
queries.
In this section, we briefly give the definitions for metric distance functions and different types of similarity
A metric distance function d(x,y) for a metric space is defined as follows:
i) d(x,y) = d(y,x)ii) 0 < d(x,y) < ∞, x≠ yiii) d(x,x) = 0iv) d(x,y) ≤ d(x,z) + d(z,y) (triangle inequality)
A metric distance function d(x,y) for a metric space is defined as follows:
i) d(x,y) = d(y,x)ii) 0 < d(x,y) < ∞, x≠ yiii) d(x,x) = 0iv) d(x,y) ≤ d(x,z) + d(z,y) (triangle inequality)
The above conditions are the only ones we can assume when designing an index structure based on distances between objects in a metric space. Note that, for a metric space, no geometric information can be utilized unlike the case for a Euclidean space. Thus, we only have a set of objects from a metric space, and a distance function d() that can be used to compute the distance between any two objects.
Similarity based queries can be posed in a number of ways. The most common one asks for all data objects that are within some specified distance from a given query object. These queries require retrieval of near neighborsof the query object. The formal definition for this type of queries is as follows:
Near Neighbor Query: From a given set of data objects X = {X1, X2, ..., Xn} from a metric space with a metric distance function d(), retrieve all data objects that are within distance r of a given query point Y. The resulting set will be { Xi | Xi ∈ X and d(Xi,Y)≤ r }. Here, r is generally referred to as the similarity measure, or the tolerance factor.
Some variations of the near neighbor query are also possible. The nearest neighbor query asks for the closest object to a given query object. Similarly, k closest objects may be requested as well.
Similarity based queries can be posed in a number of ways. The most common one asks for all data objects that are within some specified distance from a given query object. These queries require retrieval of near neighborsof the query object. The formal definition for this type of queries is as follows:
Near Neighbor Query: From a given set of data objects X = {X1, X2, ..., Xn} from a metric space with a metric distance function d(), retrieve all data objects that are within distance r of a given query point Y. The resulting set will be { Xi | Xi ∈ X and d(Xi,Y)≤ r }. Here, r is generally referred to as the similarity measure, or the tolerance factor.
Some variations of the near neighbor query are also possible. The nearest neighbor query asks for the closest object to a given query object. Similarly, k closest objects may be requested as well.
http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees
https://karczmarczuk.users.greyc.fr/TEACH/TAL/Doc/BK_search/shasha_best.pdf
Burkhard and Keller in 1973, in their paper "Some approaches to best match file searching" pdf.