Original article. If you republish it, please credit LANCEYAN.COM.
Permalink: Consistent hashing and its application in a Solr-based distributed search engine with tens of millions of documents
Most Internet startups are bootstrapped: no powerful servers, and no budget for an expensive commercial database for massive data. Yet under these harsh conditions wave after wave of founders have succeeded, and that owes a great deal to today's open-source technologies and big-data architectures. With open-source software such as MySQL and Nginx, a sensible architecture and cheap servers, you can build a system that serves tens of millions of users. Large Internet companies such as Sina Weibo, Taobao and Tencent built their platforms largely on free, open-source components. So it does not really matter what you use, as long as it is a reasonable solution applied under reasonable conditions.
So how do you build a good system architecture? That topic is far too broad, so here I will focus on one aspect: how to distribute (shard) data. Suppose our database server can only hold 200 records, and an upcoming campaign is expected to bring in 600.
There are two ways to scale: horizontally (scale out) or vertically (scale up).
Vertical scaling means upgrading the hardware of a single server. But the better the machine's specification, the higher the price, and that cost is more than a typical small company can afford.
Horizontal scaling means using several cheap machines to provide the service. If one machine can handle 200 records, three machines can handle 600, and when the business grows further you can quickly add more. In most cases horizontal scaling is the way to go, as shown in the figure below:

Now there is a problem: how do we route these 600 records to the right machine? The distribution needs to be balanced. If all 600 records carry sequential auto-increment ids from 1 to 600, we can split them into 3 groups with id mod 3. In a real system the key is often a string rather than a numeric id; in that case we first turn the string into a hashcode and then take the modulus.
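A minimal sketch of this mod-based routing (the machine count of 3 and the sample string keys are just illustrative assumptions):

import java.util.Arrays;
import java.util.List;

public class ModRouting {
  // Route a key to one of `machines` buckets using hashcode mod N.
  static int route(String key, int machines) {
    // floorMod keeps the result non-negative even when hashCode() is negative.
    return Math.floorMod(key.hashCode(), machines);
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("user:1", "user:2", "weibo:42");
    for (String key : keys) {
      System.out.println(key + " -> machine " + route(key, 3));
    }
  }
}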
So far this seems to solve our problem: all the data is spread out nicely and no machine is overloaded. But once the data has to be stored and read back, it is not so easy. What happens when the business grows? Following the horizontal-scaling idea above, we simply add another server, yet adding that one server causes trouble. Consider this example: 9 numbers have to be placed on 2 machines (1 and 2). The layout is: machine 1 holds 1, 3, 5, 7, 9 and machine 2 holds 2, 4, 6, 8. Now add a machine 3, and the data has to be massively migrated: machine 1 holds 1, 4, 7, machine 2 holds 2, 5, 8 and machine 3 holds 3, 6, 9. As shown below:

As the figure shows, 3, 5 and 9 move off machine 1, 4 and 6 move off machine 2, and everything is redistributed under the new scheme. If the data set is small, redistributing it is cheap, but with hundreds of millions of records, or terabytes of data, the operation is extremely expensive, taking anywhere from several hours to days. During the migration the original database machines are also under heavy load. So the obvious question is: is this kind of horizontal-scaling architecture simply unreasonable?
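To see how bad plain modulo is here, this small sketch (purely illustrative) replays the example above and counts how many of the 9 keys end up on a different machine once the third machine is added:

public class RehashCost {
  // Machine holding a key under mod-based placement: remainder 0 maps to the last machine.
  static int machineOf(int key, int machines) {
    int m = key % machines;
    return m == 0 ? machines : m;
  }

  public static void main(String[] args) {
    int moved = 0;
    for (int key = 1; key <= 9; key++) {
      int before = machineOf(key, 2);
      int after = machineOf(key, 3);
      if (before != after) {
        System.out.println("key " + key + " moves from machine " + before + " to machine " + after);
        moved++;
      }
    }
    // Prints 5 of 9: the same five keys (3, 4, 5, 6, 9) that migrate in the figure.
    System.out.println(moved + " of 9 keys have to be migrated");
  }
}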
-----------------------------------------------------------------
Consistent hashing was proposed for exactly this kind of scenario, and it is now widely used in distributed caches such as memcached. Below is a brief introduction to how it works. The original paper is at http://dl.acm.org/citation.cfm?id=258660, and there are many good write-ups online, for example http://blog.csdn.net/x15594/article/details/6270242.
Here is a simple example to illustrate consistent hashing.
Given: three machines, 1, 2 and 3.
And nine data items waiting to be assigned: 1, 2, 3, 4, 5, 6, 7, 8, 9.
The consistent hashing architecture:
Steps
Step 1. Construct a ring of 2^32 virtual positions. Computers live in a world of 0s and 1s, so partitioning with a power of two keeps the split easy to balance. 2^32 is about 4.2 billion; even a huge fleet will never have more than 4.2 billion servers, so both scalability and balance are covered.

Step 2. Compute a hashcode for each of the three machines from its IP address (the hostname works just as well, anything that uniquely identifies the machine) and map it onto the 2^32 ring. Say machine 1's hashcode mod 2^32 comes out as 123 (a made-up number), machine 2's as 2300420 and machine 3's as 90203920. The three machines are now mapped onto nodes of this virtual ring of 4.2 billion positions.

Step 3. Hash the data items (1-9) the same way, take the result mod 2^32 and place them on the ring. Suppose the values come out as 1:10, 2:23564, 3:57, 4:6984, 5:5689632, 6:86546845, 7:122, 8:3300689, 9:135468. Then 1, 3 and 7 are below 123; 2, 4 and 9 are above 123 but below 2300420; and 5, 6 and 8 are above 2300420 but below 90203920. Starting from the position a data item maps to, walk clockwise and store it on the first machine node found; if you go past 2^32 without finding one, wrap around and store it on the first node. So 1, 3 and 7 go to machine 1, 2, 4 and 9 go to machine 2, and 5, 6 and 8 go to machine 3.

At this point you may well ask: so far consistent hashing has brought no benefit at all, only more complexity than plain modulo. Here is where it pays off. Suppose we add a machine; with plain modulo we would have to redistribute all the data across four machines. What does consistent hashing do? Machine 4 joins, and its hash value mod 2^32 is, say, 12302012. Items 5 and 8 are above 2300420 and below 12302012, while 6 is above 12302012 and below 90203920. So the only adjustment is that 5 and 8 are removed from machine 3 and added to machine 4; everything else stays where it is.
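A minimal sketch of this ring lookup using a sorted map; the hash values are the made-up numbers from the example, hard-coded for illustration (a real system would compute them with a hash function):

import java.util.Map;
import java.util.TreeMap;

public class RingExample {
  // Find the first machine clockwise from `hash`, wrapping around the ring if needed.
  static String lookup(TreeMap<Long, String> ring, long hash) {
    Map.Entry<Long, String> e = ring.ceilingEntry(hash);
    return e != null ? e.getValue() : ring.firstEntry().getValue();
  }

  public static void main(String[] args) {
    TreeMap<Long, String> ring = new TreeMap<>();
    ring.put(123L, "machine-1");
    ring.put(2300420L, "machine-2");
    ring.put(90203920L, "machine-3");

    long[] dataHashes = {10, 23564, 57, 6984, 5689632, 86546845, 122, 3300689, 135468};
    for (int i = 0; i < dataHashes.length; i++) {
      System.out.println("data " + (i + 1) + " -> " + lookup(ring, dataHashes[i]));
    }

    // Add machine 4 at 12302012: only data 5 and 8 change owner,
    // moving from machine-3 to machine-4.
    ring.put(12302012L, "machine-4");
    for (int i = 0; i < dataHashes.length; i++) {
      System.out.println("data " + (i + 1) + " -> " + lookup(ring, dataHashes[i]));
    }
  }
}

Running it prints the same assignment as in the example, and after the new node joins only data 5 and 8 switch to machine-4.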

Removing a machine works the same way. Suppose machine 2 dies: the only data affected is the data that was on machine 2, and it migrates to the next node clockwise from it, which in the figure above is machine 4.

You should now have a feel for how consistent hashing works. The algorithm still has a weakness, though: when there are only a few machine nodes and a lot of data, the distribution may be quite uneven, and one server can end up with far more data than the others. To fix this, virtual server nodes are introduced. Say we only have three machines, 1, 2 and 3, and cannot realistically get more. We virtualise each of them into 3 nodes, 1a 1b 1c, 2a 2b 2c, 3a 3b 3c, which gives us 9 nodes. 1a, 1b and 1c still map back to physical machine 1, but on the ring there are now 9 nodes, so the data spreads out more evenly. As shown below:
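A tiny sketch of the virtual-node idea, reusing the 1a/1b/1c naming from the example above (the labels and the use of String.hashCode are illustrative only; a production ring would use a better-mixing hash such as MD5, as in the implementation further down):

import java.util.TreeMap;

public class VirtualNodes {
  public static void main(String[] args) {
    TreeMap<Integer, String> ring = new TreeMap<>();
    String[] machines = {"1", "2", "3"};
    String[] suffixes = {"a", "b", "c"};

    // Place 3 virtual nodes per physical machine on the ring.
    for (String machine : machines) {
      for (String suffix : suffixes) {
        String virtualNode = machine + suffix;     // e.g. "1a"
        ring.put(virtualNode.hashCode(), machine); // still maps back to machine 1/2/3
      }
    }
    System.out.println("ring positions -> physical machine: " + ring);
  }
}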

After all this talk about consistent hashing, what does it have to do with distributed search? We built our distributed search on Solr 4. In our tests, submitting just 20 documents on the SolrCloud-based platform took tens of seconds, so we dropped SolrCloud and hacked the Solr platform ourselves: no ZooKeeper for distributed coordination, and we manage the data distribution on our own. Since we manage the distribution, we also have to take care of index creation and index updates, and this is exactly where consistent hashing comes in. The overall architecture looks like this:

The index build/update servers keep track of each machine's position on the ring, so that for any data key they can find the right shard to create or update. The main concern here is how to build and update the index efficiently and reliably.
Backup servers guard against a build server going down; the system can recover quickly from the backups.
Read servers implement read/write splitting, so that index writes do not slow down queries.
The cluster management server tracks the state of every server in the cluster and raises alerts.
As the business grows, the whole cluster can be further partitioned by data type, for example users, weibo posts and so on. Each type gets its own cluster built as in the figure above, which is enough for distributed search with ordinary performance requirements. I will come back to Solr and distributed search in a later post.
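To make the routing of index updates concrete, here is a hedged sketch. The shard URLs, the document key and the helper names are my own assumptions, not the actual code of this platform; the chosen shard's /update handler would then receive the document.

import java.util.Map;
import java.util.TreeMap;

public class ShardRouter {
  private final TreeMap<Integer, String> ring = new TreeMap<>();

  // Register a shard under several virtual nodes so load spreads evenly.
  void addShard(String shardUrl, int replicas) {
    for (int i = 0; i < replicas; i++) {
      ring.put((shardUrl + "#" + i).hashCode(), shardUrl);
    }
  }

  // Pick the shard responsible for a document key (clockwise lookup on the ring).
  String shardFor(String docKey) {
    Map.Entry<Integer, String> e = ring.ceilingEntry(docKey.hashCode());
    return e != null ? e.getValue() : ring.firstEntry().getValue();
  }

  public static void main(String[] args) {
    ShardRouter router = new ShardRouter();
    // Hypothetical shard endpoints.
    router.addShard("http://index1:8983/solr/core0", 100);
    router.addShard("http://index2:8983/solr/core0", 100);
    // The index update for this document would be sent to the shard printed here.
    System.out.println(router.shardFor("weibo:123456"));
  }
}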
Further reading:
Java's HashMap has the same issue: as the number of entries grows the map has to be resized and existing entries re-hashed, so when necessary initialise it with a large enough capacity to avoid that.
疫苗:Java HashMap的死循環 (the infinite-loop bug in Java HashMap) http://coolshell.cn/articles/9606.html
An optimisation of consistent hashing: how to keep the hit rate unaffected when new nodes are added to the ring (by scott, a former colleague at Paipai) http://scottina.iteye.com/blog/650380
Implementations in various languages:
http://weblogs.java.net/blog/2007/11/27/consistent-hashing (a Java example)
http://blog.csdn.net/mayongzhan/archive/2009/06/25/4298834.aspx (a PHP example)
http://www.codeproject.com/KB/recipes/lib-conhash.aspx (a C example)
I've bumped into consistent hashing a couple of times lately. The paper that introduced the idea (Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web by David Karger et al) appeared ten years ago, although recently it seems the idea has quietly been finding its way into more and more services, from Amazon's Dynamo to memcached (courtesy of Last.fm). So what is consistent hashing and why should you care?
The need for consistent hashing arose from limitations experienced while running collections of caching machines - web caches, for example. If you have a collection of n cache machines then a common way of load balancing across them is to put object o in cache machine number hash(o) mod n. This works well until you add or remove cache machines (for whatever reason), for then n changes and every object is hashed to a new location. This can be catastrophic since the originating content servers are swamped with requests from the cache machines. It's as if the cache suddenly disappeared. Which it has, in a sense. (This is why you should care - consistent hashing is needed to avoid swamping your servers!)
It would be nice if, when a cache machine was added, it took its fair share of objects from all the other cache machines. Equally, when a cache machine was removed, it would be nice if its objects were shared between the remaining machines. This is exactly what consistent hashing does - consistently maps objects to the same cache machine, as far as is possible, at least.
The basic idea behind the consistent hashing algorithm is to hash both objects and caches using the same hash function. The reason to do this is to map the cache to an interval, which will contain a number of object hashes. If the cache is removed then its interval is taken over by a cache with an adjacent interval. All the other caches remain unchanged.
Demonstration
Let's look at this in more detail. The hash function actually maps objects and caches to a number range. This should be familiar to every Java programmer - the hashCode method on Object returns an int, which lies in the range -2^31 to 2^31-1. Imagine mapping this range into a circle so the values wrap around. Here's a picture of the circle with a number of objects (1, 2, 3, 4) and caches (A, B, C) marked at the points that they hash to (based on a diagram from Web Caching with Consistent Hashing by David Karger et al):

To find which cache an object goes in, we move clockwise round the circle until we find a cache point. So in the diagram above, we see object 1 and 4 belong in cache A, object 2 belongs in cache B and object 3 belongs in cache C. Consider what happens if cache C is removed: object 3 now belongs in cache A, and all the other object mappings are unchanged. If then another cache D is added in the position marked it will take objects 3 and 4, leaving only object 1 belonging to A.

This works well, except the size of the intervals assigned to each cache is pretty hit and miss. Since it is essentially random it is possible to have a very non-uniform distribution of objects between caches. The solution to this problem is to introduce the idea of "virtual nodes", which are replicas of cache points in the circle. So whenever we add a cache we create a number of points in the circle for it.
You can see the effect of this in the following plot which I produced by simulating storing 10,000 objects in 10 caches using the code described below. On the x-axis is the number of replicas of cache points (with a logarithmic scale). When it is small, we see that the distribution of objects across caches is unbalanced, since the standard deviation as a percentage of the mean number of objects per cache (on the y-axis, also logarithmic) is high. As the number of replicas increases the distribution of objects becomes more balanced. This experiment shows that a figure of one or two hundred replicas achieves an acceptable balance (a standard deviation that is roughly between 5% and 10% of the mean).
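The plot itself is not reproduced here, but the experiment is easy to re-run. Below is a hedged, standalone re-creation of it (10,000 objects, 10 caches, a varying number of replicas, MD5-based hashing); the exact figures will differ somewhat from the original plot.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

public class ReplicaBalance {
  // Derive a 32-bit ring position from the MD5 digest of a string.
  static int md5Hash(String s) throws Exception {
    byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
    return ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16) | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
  }

  public static void main(String[] args) throws Exception {
    int objects = 10_000, caches = 10;
    for (int replicas : new int[] {1, 10, 100, 1000}) {
      // Build the ring with `replicas` points per cache.
      TreeMap<Integer, Integer> ring = new TreeMap<>();
      for (int c = 0; c < caches; c++) {
        for (int r = 0; r < replicas; r++) {
          ring.put(md5Hash("cache-" + c + "-" + r), c);
        }
      }
      // Assign each object to the first cache clockwise from its hash.
      int[] counts = new int[caches];
      for (int o = 0; o < objects; o++) {
        Map.Entry<Integer, Integer> e = ring.ceilingEntry(md5Hash("object-" + o));
        counts[e == null ? ring.firstEntry().getValue() : e.getValue()]++;
      }
      // Standard deviation as a percentage of the mean, as on the plot's y-axis.
      double mean = (double) objects / caches, var = 0;
      for (int c : counts) var += (c - mean) * (c - mean);
      double sdPct = Math.sqrt(var / caches) / mean * 100;
      System.out.printf("replicas=%4d  std dev = %.1f%% of mean%n", replicas, sdPct);
    }
  }
}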

Implementation
For completeness here is a simple implementation in Java. In order for consistent hashing to be effective it is important to have a hash function that mixes well. Most implementations of Object's hashCode do not mix well - for example, they typically produce a restricted number of small integer values - so we have a HashFunction interface to allow a custom hash function to be used. MD5 hashes are recommended here.
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash<T> {

  private final HashFunction hashFunction;
  private final int numberOfReplicas;
  private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();

  public ConsistentHash(HashFunction hashFunction, int numberOfReplicas,
      Collection<T> nodes) {
    this.hashFunction = hashFunction;
    this.numberOfReplicas = numberOfReplicas;
    for (T node : nodes) {
      add(node);
    }
  }

  public void add(T node) {
    // Place `numberOfReplicas` points on the circle for this node.
    for (int i = 0; i < numberOfReplicas; i++) {
      circle.put(hashFunction.hash(node.toString() + i), node);
    }
  }

  public void remove(T node) {
    // Remove all of this node's points from the circle.
    for (int i = 0; i < numberOfReplicas; i++) {
      circle.remove(hashFunction.hash(node.toString() + i));
    }
  }

  public T get(Object key) {
    if (circle.isEmpty()) {
      return null;
    }
    int hash = hashFunction.hash(key);
    if (!circle.containsKey(hash)) {
      // Walk clockwise: take the first point at or after `hash`,
      // wrapping around to the start of the circle if necessary.
      SortedMap<Integer, T> tailMap = circle.tailMap(hash);
      hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
    }
    return circle.get(hash);
  }

}
The circle is represented as a sorted map of integers, which represent the hash values, to caches (of type T here).
When a ConsistentHash object is created each node is added to the circle map a number of times (controlled by numberOfReplicas). The location of each replica is chosen by hashing the node's name along with a numerical suffix, and the node is stored at each of these points in the map.
To find a node for an object (the get method), the hash value of the object is used to look in the map. Most of the time there will not be a node stored at this hash value (since the hash value space is typically much larger than the number of nodes, even with replicas), so the next node is found by looking for the first key in the tail map. If the tail map is empty then we wrap around the circle by getting the first key in the circle.
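The HashFunction interface itself is not shown above. Here is one possible shape for it, together with an MD5-backed implementation as the text recommends; only the interface name comes from the code above, the rest is my own sketch.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

interface HashFunction {
  int hash(Object key);
}

// An MD5-based HashFunction: it mixes well, unlike most hashCode implementations.
class Md5HashFunction implements HashFunction {
  public int hash(Object key) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] digest = md5.digest(key.toString().getBytes(StandardCharsets.UTF_8));
      // Fold the first four bytes of the digest into an int.
      return ((digest[0] & 0xff) << 24) | ((digest[1] & 0xff) << 16)
          | ((digest[2] & 0xff) << 8) | (digest[3] & 0xff);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}

With that in place, something like new ConsistentHash<String>(new Md5HashFunction(), 100, nodes) builds a circle with 100 replica points per node.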
Usage
So how can you use consistent hashing? You are most likely to meet it in a library, rather than having to code it yourself. For example, as mentioned above, memcached, a distributed memory object caching system, now has clients that support consistent hashing. Last.fm's ketama by Richard Jones was the first, and there is now a Java implementation by Dustin Sallings (which inspired my simplified demonstration implementation above). It is interesting to note that it is only the client that needs to implement the consistent hashing algorithm - the memcached server is unchanged. Other systems that employ consistent hashing include Chord, which is a distributed hash table implementation, and Amazon's Dynamo, which is a key-value store (not available outside Amazon).

