Friday, April 19, 2019

VP Trees

https://en.wikipedia.org/wiki/Triangle_inequality


https://en.wikipedia.org/wiki/Triangle_inequality



VP trees: A data structure for finding stuff fast

http://ceur-ws.org/Vol-256/submission_12.pdf

https://nullwords.wordpress.com/2013/03/13/the-bk-tree-a-data-structure-for-spell-checking/

http://stevehanov.ca/blog/?id=130

https://fribbels.github.io/vptree/writeup

https://towardsdatascience.com/symspell-vs-bk-tree-100x-faster-fuzzy-string-search-spell-checking-c4f10d80a078

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.36.7401&rep=rep1&type=pdf



2. Metric Spaces and Similarity Queries
queries.
In this section, we briefly give the definitions for metric distance functions and different types of similarity
A metric distance function d(x,y) for a metric space is defined as follows:
i) d(x,y) = d(y,x)ii) 0 < d(x,y) < , x≠ yiii) d(x,x) = 0iv) d(x,y) ≤ d(x,z) + d(z,y) (triangle inequality)
The above conditions are the only ones we can assume when designing an index structure based on distances between objects in a metric space. Note that, for a metric space, no geometric information can be utilized unlike the case for a Euclidean space. Thus, we only have a set of objects from a metric space, and a distance function d() that can be used to compute the distance between any two objects.
Similarity based queries can be posed in a number of ways. The most common one asks for all data objects that are within some specified distance from a given query object. These queries require retrieval of near neighborsof the query object. The formal definition for this type of queries is as follows:
Near Neighbor Query: From a given set of data objects X = {X1, X2, ..., Xn} from a metric space with a metric distance function d(), retrieve all data objects that are within distance of a given query point Y. The resulting set will be { X| X∈ X and d(Xi,Y)≤ }. Here, is generally referred to as the similarity measure, or the tolerance factor.
Some variations of the near neighbor query are also possible. The nearest neighbor query asks for the closest object to a given query object. Similarly, k closest objects may be requested as well.

http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees

https://karczmarczuk.users.greyc.fr/TEACH/TAL/Doc/BK_search/shasha_best.pdf

Burkhard and Keller in 1973, in their paper "Some approaches to best match file searching" pdf.



Thursday, April 18, 2019

ScalaTest