HDKV: High-Dimensional Similarity Query in Key-Value Stores

Reposted article, 2012-02-17 00:07. Paper reading notes.

The paper focuses on key-value stores.

Locality-sensitive hashing (LSH) is a method of performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items).

Stable distributions

The hash function ^[8] $h_{\mathbf{a},b} (\boldsymbol{\upsilon}) : \mathbb{R}^d \to \mathbb{Z}$ maps a $d$-dimensional vector $\boldsymbol{\upsilon}$ to an integer. Each hash function in the family is indexed by a choice of random $\mathbf{a}$ and $b$, where $\mathbf{a}$ is a $d$-dimensional vector with entries chosen independently from a stable distribution (for example, the Gaussian distribution, which is 2-stable and suits Euclidean distance) and $b$ is a real number chosen uniformly from the range $[0,r]$. Here $r$ is a fixed bucket width. For fixed $\mathbf{a}, b$, the hash function $h_{\mathbf{a},b}$ is given by $h_{\mathbf{a},b} (\boldsymbol{\upsilon}) = \left\lfloor \frac{\mathbf{a}\cdot \boldsymbol{\upsilon}+b}{r} \right\rfloor$.
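As a rough illustration of the formula above, the following sketch draws one hash function from the family using only the Python standard library. It assumes the Gaussian case (entries of $\mathbf{a}$ drawn from N(0, 1)); the names `hash_value` and `make_hash` and the chosen values of `d` and `r` are hypothetical and do not come from the paper.

```python
import math
import random


def hash_value(a, b, r, v):
    """Evaluate h_{a,b}(v) = floor((a . v + b) / r) for fixed a and b."""
    dot = sum(ai * vi for ai, vi in zip(a, v))
    return math.floor((dot + b) / r)


def make_hash(d, r, rng):
    """Draw one hash function: a ~ N(0, 1)^d (assumed Gaussian case), b ~ Uniform[0, r]."""
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, r)
    return lambda v: hash_value(a, b, r, v)


rng = random.Random(0)              # fixed seed so runs are repeatable
h = make_hash(d=4, r=4.0, rng=rng)  # d and r are illustrative values

v = [1.0, 2.0, 3.0, 4.0]
near = [1.01, 2.0, 3.0, 4.0]        # a small perturbation of v
print(h(v), h(near))                # likely, but not guaranteed, to be equal
```

A larger `r` makes buckets wider, so more points collide; a smaller `r` separates points more aggressively at the cost of splitting true neighbors across buckets.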

Other construction methods for hash functions have been proposed to better fit the data. ^[9] In particular, k-means hash functions perform better in practice than projection-based hash functions, but they come without any theoretical guarantee.

The key idea of locality-sensitive hashing (LSH) is to hash the points using several hash functions so as to ensure that, for each function, the probability of collision is much higher for objects which are close to each other than for those which are far apart. Then, one can determine near neighbors by hashing the query point and retrieving elements stored in buckets containing that point.
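The insert-and-query procedure described above can be sketched as a toy index. This is a minimal sketch under stated assumptions, not the scheme from the paper: it assumes Gaussian projection hashes, concatenates `k` hashes into one bucket key per table, and uses a plain in-memory dict as a stand-in for a key-value store. The class name `LSHIndex` and all parameter values are hypothetical.

```python
import math
import random
from collections import defaultdict


class LSHIndex:
    """Toy LSH index: num_tables tables, each keyed by k concatenated hashes."""

    def __init__(self, d, r, k, num_tables, seed=0):
        rng = random.Random(seed)
        self.r = r
        # One (a, b) pair per hash function; k functions per table.
        self.params = [
            [([rng.gauss(0.0, 1.0) for _ in range(d)], rng.uniform(0.0, r))
             for _ in range(k)]
            for _ in range(num_tables)
        ]
        # A dict stands in for the key-value store: bucket key -> point ids.
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = {}

    def _key(self, t, v):
        """Bucket key of v in table t: a tuple of k integer hash values."""
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.r)
            for a, b in self.params[t]
        )

    def insert(self, point_id, v):
        self.points[point_id] = v
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(point_id)

    def query(self, q):
        """Return ids of points sharing a bucket with q, nearest first."""
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), []))
        return sorted(candidates, key=lambda i: math.dist(self.points[i], q))


index = LSHIndex(d=2, r=4.0, k=2, num_tables=8)
index.insert("p1", [0.0, 0.0])
index.insert("p2", [0.5, 0.5])
index.insert("p3", [100.0, 100.0])
print(index.query([0.0, 0.0]))
```

Only the candidates found in matching buckets are compared against the query by exact distance, which is what keeps the lookup cheaper than a full scan. Using several tables raises the chance that a true neighbor shares at least one bucket with the query.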
