Design and Implementation of a Hash Table

Reposted article, 2017-05-27 17:35. Tags: C/C++, interview questions, data structures

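A word up front: hash table questions kept showing up in the interview write-ups I read online, and a labmate was asked to implement an unordered_map during a Toutiao interview, so summarizing and implementing a hash table was already on my to-do list. But I was lazy and kept putting it off, day after day, until an Alibaba interviewer grilled me on hash tables over and over. All I could say to myself was: serves you right!!!

Mastery comes from diligence..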

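Introduction

A hash table is an efficient data structure for implementing dictionary operations. In the worst case, looking up an element in a hash table takes O(n) time, the same as searching a linked list. In practice, however, hash tables perform extremely well: under reasonable assumptions, the average time to look up an element is O(1).

The essence of a hash table lies in the word "hash", that is, the notion of a mapping familiar from mathematics. A hash function maps each key to a position in the table where it is stored, which is what makes insertion and lookup fast.

Why is a hash function needed? Put simply, because of storage space. Storing 100 keys in an array of size 100 needs no hash function: each key gets its own slot. But what if the data grows and 1000 keys must go into an array of size 100? Placing them one per slot leaves the rest with nowhere to go, so we need a rule that lets all 1000 keys be stored and, most importantly, retrieved again. That is the job of the hash function: it assigns every key a position from which it can be both stored and found. Note that a hash table usually holds dictionary-style data, that is, (key, value) pairs, and the value is stored and retrieved by its key.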

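Collision resolution

Computing positions with a hash function inevitably produces collisions, where different keys map to the same position. The common ways to resolve them are:

Open addressing: resolve collisions using free slots in the array itself
- Linear probing: check the next slot in the table (index plus 1); if that also collides, keep going.
- Quadratic probing: probe H + 1², H + 2², H + 3², ...
- Pseudo-random probing
Rehashing: use several hash functions; when the first one collides, try the second, and so on until there is no collision
Separate chaining: link all keys with the same hash address into one linked list.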
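
As a rough illustration of open addressing with linear probing, the sketch below stores non-negative int keys in a fixed-size array. The names `LinearProbeSet`, `kEmpty`, `insert`, `slot_of` and `contains` are made up for this example and are not part of the article's code; deletion is left out because it needs extra bookkeeping (such as tombstones) under open addressing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker for a free slot (assumption: real keys are never negative).
const int kEmpty = -1;

// Hypothetical fixed-capacity set of non-negative ints using linear probing.
class LinearProbeSet {
public:
    explicit LinearProbeSet(std::size_t capacity) : slots_(capacity, kEmpty) {
        assert(capacity > 0);
    }

    // Returns false if the key is already present or the table is full.
    bool insert(int key) {
        assert(key >= 0);
        std::size_t start = static_cast<std::size_t>(key) % slots_.size();
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            std::size_t pos = (start + step) % slots_.size();  // next slot, wrapping around
            if (slots_[pos] == key) return false;              // duplicate key
            if (slots_[pos] == kEmpty) {
                slots_[pos] = key;
                return true;
            }
        }
        return false;  // every slot is occupied
    }

    // Slot index holding key, or -1 if the key is absent.
    int slot_of(int key) const {
        if (key < 0) return -1;
        std::size_t start = static_cast<std::size_t>(key) % slots_.size();
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            std::size_t pos = (start + step) % slots_.size();
            if (slots_[pos] == key) return static_cast<int>(pos);
            if (slots_[pos] == kEmpty) return -1;  // an empty slot ends the probe sequence
        }
        return -1;
    }

    bool contains(int key) const { return slot_of(key) != -1; }

private:
    std::vector<int> slots_;
};
```

With a capacity of 7, the keys 10, 17 and 24 all hash to slot 3, so linear probing places them in slots 3, 4 and 5.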

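Hash function

With separate chaining, each slot of the hash table is called a bucket, the idea being that a slot holds not just a single node (element) but possibly a whole "bucket" of nodes.

Suppose the hash table has M slots (buckets), numbered 0, 1, ..., M-1. The job of the hash function is to map keys onto these M buckets as evenly as possible.

The most common approach is the division method: divide the key by a chosen prime number and take the remainder as the key's hash value.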
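
To make the division method concrete, here is a small sketch that also hints at why a prime divisor is preferred. The helper names `bucket_of` and `buckets_used` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Division method: the bucket number is the remainder of key / m.
std::size_t bucket_of(unsigned long key, std::size_t m) {
    assert(m > 0);
    return key % m;
}

// Counts how many distinct buckets a group of keys occupies for table size m.
std::size_t buckets_used(const std::vector<unsigned long> &keys, std::size_t m) {
    std::set<std::size_t> used;
    for (std::size_t i = 0; i < keys.size(); ++i)
        used.insert(bucket_of(keys[i], m));
    return used.size();
}
```

For the ten keys 0, 4, 8, ..., 36, a table of size 8 uses only 2 buckets (remainders 0 and 4), while a table of the prime size 7 uses all 7 buckets. Keys that share a common factor with the table size cluster badly, which a prime size avoids.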

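Hash table design

To close the article, here is a simplified hash table modeled on the STL hashtable.

Collisions are handled by separate chaining, the hashtable uses a vector as its underlying array, and the key and value types are simply template parameters.

Hash table node

The hash table node is defined as follows: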

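template<class Value>
struct hashtable_node{
    hashtable_node *next;
    Value val;

    // Needed because insert() creates nodes with `new node(kv)`
    explicit hashtable_node(const Value &v) : next(0), val(v) {}
};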

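The list inside each bucket is also hand-written rather than the list provided by the STL, which is a chance to practice singly linked lists.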
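
The chain operations a bucket needs are small. The sketch below shows head insertion, counting and head removal on a stand-in node type; `chain_node`, `push_front`, `chain_length` and `pop_front` are illustrative names and not part of the article's hashtable.

```cpp
#include <cassert>
#include <cstddef>

// Minimal stand-in for a bucket node; the constructor is an assumption of this sketch.
template<class Value>
struct chain_node {
    chain_node *next;
    Value val;
    explicit chain_node(const Value &v) : next(NULL), val(v) {}
};

// Inserts v at the head of the chain and returns the new head.
template<class Value>
chain_node<Value> *push_front(chain_node<Value> *head, const Value &v) {
    chain_node<Value> *n = new chain_node<Value>(v);
    n->next = head;
    return n;
}

// Number of nodes in the chain.
template<class Value>
std::size_t chain_length(const chain_node<Value> *head) {
    std::size_t n = 0;
    for (; head; head = head->next) ++n;
    return n;
}

// Deletes the head node and returns the rest of the chain; the chain must not be empty.
template<class Value>
chain_node<Value> *pop_front(chain_node<Value> *head) {
    assert(head != NULL);
    chain_node<Value> *rest = head->next;
    delete head;
    return rest;
}
```

Head insertion takes constant time, which is one reason a chained table can insert at the front of a bucket without walking the list.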

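Hash table

First, settle the template parameters the hash table needs: Key and Value.
Only the simplest (Key, Value) case is handled, with char, int, double and string considered as Key types.

The definition of the hash table follows. Only a few common operations are covered: insert, erase, find and size, plus a function that prints the table: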

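#include <algorithm>  // lower_bound
#include <cmath>      // ceil, log, exp
#include <cstdint>    // INT64_MAX
#include <iostream>
#include <string>
#include <utility>    // pair
#include <vector>
using namespace std;  // the code in this article uses unqualified std names

// Declared here so the class can name it; the specializations are defined below
template<class Key> struct hashFunc;

template<class Key, class Value>
class hashtable{
public:
	// key/value pair type stored in each node
	typedef pair<Key, Value> T;

	// node type
	typedef hashtable_node<T> node;
public:
	// constructors
	hashtable();
	// note: this is a shallow copy, the new table shares its nodes with ht
	hashtable(const hashtable<Key, Value> &ht)
		: buckets(ht.buckets), num_elements(ht.num_elements)
	{}

	// insert a key/value pair
	void insert(T kv);   

	// erase the entry with the given key
	void erase(Key key);

	// check whether the key is in the hash table
	bool find(Key key);  

	// return the number of entries in the hash table
	int size() const {
		return static_cast<int>(num_elements);
	}

	void printHashTable();
private:
	// rebuild the table if it is too small for the given number of elements
	void resize(int num_elements_hint);

	// return the bucket index for a key in a table with `size` buckets
	int buckets_index(Key key, int size){
		return hash(key) % size;
	}

	// return the key of a stored pair
	Key get_key(T  node){
		return node.first;
	}
private:
	// bucket array: each entry is the head of a hand-written singly linked chain
	vector<node*> buckets;    

	// number of elements in the hash table
	size_t num_elements;

	// hash function
	hashFunc<Key> hash;
};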

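Hash function

Since only four key types (char, int, double and string) are considered and a class template is used, each type can be given its own version through explicit (full) template specialization. The code is as follows: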

/*
 * Hash functions: only 4 key types are supported
 * char, int, double, string
 */
template<class Key> struct hashFunc{};

template<> struct hashFunc < char > {
	size_t operator()(char x) const { return x; }
};

template<> struct hashFunc < int > {
	size_t operator()(int x) const { return x; }
};

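template<> struct hashFunc < double > {
	size_t operator()(const double & dValue) const
	{
		// hash the absolute value; +0.0 and -0.0 both hash to 0
		double tmp = dValue < 0 ? -dValue : dValue;
		if (tmp == 0) return 0;  // log(0) is -infinity, so zero is handled separately
		// take the log of tmp, not dValue: the log of a negative number is NaN
		int e = (int)ceil(log(tmp));
		// tmp * exp(-e) lies roughly in (1/e, 1]; scale it up to the 64-bit range
		return size_t((INT64_MAX + 1.0) * tmp * exp(-e));
	}
};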

template<> struct hashFunc < string > {
	size_t operator()(const string & str) const
	{
		size_t h = 0;
		for (size_t i = 0; i<str.length(); ++i)
		{
			h = (h << 5) - h + str[i];  // same as h * 31 + str[i]
		}
		return h;
	}
};
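
The string hash above is a polynomial hash with multiplier 31, written with a shift and a subtraction. The sketch below places the two forms side by side; `hash_shift`, `hash_mul31` and `bucket_for` are hypothetical free functions used only for illustration, not part of the article's hashFunc.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Shift-and-subtract form: (h << 5) - h is h * 32 - h, that is h * 31.
std::size_t hash_shift(const std::string &s) {
    std::size_t h = 0;
    for (std::size_t i = 0; i < s.length(); ++i)
        h = (h << 5) - h + s[i];
    return h;
}

// The same polynomial written with an explicit multiplication by 31.
std::size_t hash_mul31(const std::string &s) {
    std::size_t h = 0;
    for (std::size_t i = 0; i < s.length(); ++i)
        h = h * 31 + s[i];
    return h;
}

// Bucket index of s in a table with m buckets (division method).
std::size_t bucket_for(const std::string &s, std::size_t m) {
    assert(m > 0);
    return hash_shift(s) % m;
}
```

For the string "ab" (character codes 97 and 98 in ASCII) both functions give 97 * 31 + 98 = 3105, which lands in bucket 3105 % 53 = 31 of a 53-bucket table.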

Hash table implementation

The full implementation of the hash table follows. Each function is commented, so it should be fairly easy to follow.

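// The table size is always a prime, so the division method can be applied directly.
// Following SGI STL, 28 primes are stored, each roughly twice the previous one,
// together with a function that returns the smallest of these primes that is not less than a given number.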
static const int num_primes = 28;
static const unsigned long prime_list[num_primes] =
{
	53, 97, 193, 389, 769,
	1543, 3079, 6151, 12289, 24593,
	49157, 98317, 196613, 393241, 786433,
	1572869, 3145739, 6291469, 12582917, 25165843,
	50331653, 100663319, 201326611, 402653189, 805306457,
	1610612741, 3221225473, 4294967291
};

// Return the smallest prime in the list that is >= n (or the largest prime if n exceeds them all)
inline unsigned long next_prime(unsigned long n){
	const unsigned long *first = prime_list;
	const unsigned long *last = prime_list + num_primes;
	const unsigned long *pos = lower_bound(first, last, n);

	return pos == last ? *(last - 1) : *pos;
}
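
The lookup rule of next_prime is easy to misread, so here is a small sketch of it on a shortened table. It uses only the first five primes of the list above; `kPrimes`, `kNumPrimes` and `next_prime_demo` are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// First five primes of the article's table; the short length is only for the demo.
static const std::size_t kNumPrimes = 5;
static const unsigned long kPrimes[kNumPrimes] = {53, 97, 193, 389, 769};

// Same rule as next_prime: smallest listed prime >= n, clamped to the last entry.
unsigned long next_prime_demo(unsigned long n) {
    assert(kNumPrimes > 0);  // the clamp below reads the last entry
    const unsigned long *first = kPrimes;
    const unsigned long *last = kPrimes + kNumPrimes;
    const unsigned long *pos = std::lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}
```

Because std::lower_bound finds the first element not less than n, asking for 53 returns 53 itself, asking for 54 returns 97, and any request beyond the last entry is clamped to that entry.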

// Constructor: initialize the hash table
template<class Key, class Value>
hashtable<Key, Value>::hashtable(){
	const int n_buckets = next_prime(1);
	buckets.reserve(n_buckets);
	buckets.insert(buckets.end(), n_buckets, (node*)0);
	num_elements = 0;
}

// Insert a key/value pair
template<class Key, class Value>
void hashtable<Key, Value>::insert(T kv){
	// Before inserting, call resize to check whether the table must be rebuilt
	resize(num_elements + 1);
	// Compute the bucket to insert into
	int pos = buckets_index(kv.first, buckets.size());
	node *head = buckets[pos];	

	// If the key already exists, return without inserting
	for (node *cur = head; cur; cur = cur->next){
		if (cur->val.first == kv.first)
			return;
	}

	// Allocate a node and insert it at the head of the chain
	node *tmp = new node(kv);
	tmp->next = head;
	buckets[pos] = tmp;
	num_elements++; // update the element count
}

// Erase the entry with the given key
template < class Key, class Value> 
void hashtable<Key, Value>::erase(Key key){
	// Locate the bucket
	int pos = buckets_index(key, buckets.size());
	node *head = buckets[pos];
	node *pre = NULL; 
	while (head){
		// Found the key: unlink the node and delete it
		if (head->val.first == key){
			if (pre == NULL)
				buckets[pos] = head->next;  // removing the first node of the chain
			else
				pre->next = head->next;
			delete head;
			num_elements--;
			return;
		}

		pre = head;
		head = head->next;
	}
}

// Check whether the given key is in the hash table
template<class Key, class Value>
bool hashtable<Key, Value>::find(Key key){
	int pos = buckets_index(key, buckets.size());
	node *head = buckets[pos];

	while (head){
		if (head->val.first == key)
			return true;
		head = head->next;
	}
	return false;
}


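// Grow and rehash the table if it cannot hold num_elements_hint elements
template<class Key, class Value>
void hashtable<Key, Value>::resize(int num_elements_hint){
	// Rebuild the table only when the element count would exceed the bucket count
	const int size = buckets.size();
	if (num_elements_hint <= size) return;

	// Find the next prime
	const int next_size = next_prime(num_elements_hint);

	// Create the new, empty bucket array
	vector<node*> tmp(next_size, (node*)0);

	for (int i = 0; i < size; ++i){
		node *head = buckets[i];
		while (head){
			// Unlink the node from the old bucket and relink it at the head of its new bucket
			int new_pos = buckets_index(head->val.first, next_size);
			buckets[i] = head->next;
			head->next = tmp[new_pos];
			tmp[new_pos] = head;
			head = buckets[i];
		}
	}

	// Swap the old and new bucket arrays
	buckets.swap(tmp);
}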
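
To see what rehashing does to bucket contents, the sketch below redistributes keys into a larger table. For brevity it models each bucket as a std::vector of unsigned keys instead of relinking list nodes, so it illustrates only the index recomputation; `Buckets` and `rehash` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<unsigned> > Buckets;

// Builds a table with new_size buckets and moves every key to bucket key % new_size.
Buckets rehash(const Buckets &old_table, std::size_t new_size) {
    assert(new_size > 0);
    Buckets fresh(new_size);
    for (std::size_t i = 0; i < old_table.size(); ++i) {
        for (std::size_t j = 0; j < old_table[i].size(); ++j) {
            unsigned key = old_table[i][j];
            fresh[key % new_size].push_back(key);
        }
    }
    return fresh;
}
```

In a 3-bucket table the keys 3, 6 and 9 all share bucket 0. After rehashing to 7 buckets they land in buckets 3, 6 and 2, so one long chain becomes three short ones.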

// Print every (key, value) pair in the table, bucket by bucket
template<class Key, class Value>
void hashtable<Key, Value>::printHashTable(){
	cout << "Hash table contents:" << endl;
	for (size_t i = 0; i < buckets.size(); ++i){
		node *head = buckets[i];
		while (head){
			cout << head->val.first << "  " << head->val.second << endl;
			head = head->next;
		}
	}
}

Testing

The test code is as follows:

#include"hashtable.h"

int main()
{
	//(int, string) test:
	hashtable<int, string> ht;
	
	ht.insert(pair<int, string>(12, "this"));
	ht.insert(pair<int, string>(3, "is"));
	ht.insert(pair<int, string>(58, "a"));
	ht.insert(pair<int, string>(10, "test"));
	ht.insert(pair<int, string>(23, "hashtable"));

	ht.printHashTable();
	cout << "After erasing (3, is), ";
	ht.erase(3);
	ht.printHashTable();

	cout << "After inserting (3, hahaha), ";
	ht.insert(pair<int, string>(3, "hahaha"));
	ht.printHashTable();

	cout << "===================================" << endl;
	
	//(string, string) test:
	hashtable<string, string> strHt;
	strHt.insert(pair<string, string>("hello", "world"));
	strHt.insert(pair<string, string>("other", "hash"));
	strHt.insert(pair<string, string>("test", "china"));
	strHt.insert(pair<string, string>("stl", "nimeiya"));

	strHt.printHashTable();

	cout << "Is test in the hash table: " << strHt.find("test") << endl;
	cout << "Number of elements in the hash table: " << strHt.size() << endl;
	cout << "After erasing test" << endl;
	strHt.erase("test");
	strHt.printHashTable();
	cout << "Is test in the hash table: " << strHt.find("test") << endl;
	cout << "Number of elements in the hash table: " << strHt.size() << endl;
}
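
For comparison, the (int, string) part of the test can be replayed with the standard std::unordered_map (C++11), the container mentioned at the start of the article. This is only a sketch; the names `IntTable` and `replay_int_test` are made up for it.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

typedef std::unordered_map<int, std::string> IntTable;

// Replays the (int, string) part of the test with std::unordered_map.
IntTable replay_int_test() {
    IntTable ht;
    ht.insert(std::pair<int, std::string>(12, "this"));
    ht.insert(std::pair<int, std::string>(3, "is"));
    ht.insert(std::pair<int, std::string>(58, "a"));
    ht.insert(std::pair<int, std::string>(10, "test"));
    ht.insert(std::pair<int, std::string>(23, "hashtable"));

    ht.erase(3);                                          // erase by key
    ht.insert(std::pair<int, std::string>(3, "hahaha"));  // key 3 is free again

    // Like the article's insert, this leaves an existing key untouched.
    bool inserted = ht.insert(std::pair<int, std::string>(12, "ignored")).second;
    assert(!inserted);
    (void)inserted;
    return ht;
}
```

The returned map holds five entries, key 3 maps to "hahaha", and key 12 still maps to "this" because insert does not overwrite an existing key.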

The test results were shown as a screenshot in the original post.

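Summary

This article gave a brief introduction to hash table concepts and implemented a simple hash table.
The code can be downloaded from hashTable.
The next article will hopefully summarize some hash table interview questions.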
