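In my earlier post "[Java Notes Summary 7] Java Containers, Part 2: Map" I mentioned hashCode but did not look at it closely. Today I dig into hashCode and the questions around it.
1. hashCode and equals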
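As we all know, hashCode() and equals() are methods of Object, the root class of every Java class. I consulted the Java 7 documentation for their descriptions.
Reading the documentation's description of hashCode, it is not hard to see that:
- First, hashCode must return an integer, that is, a primitive int (not an Integer object).
- Second, it must be consistent: within a single execution of a program, calling it any number of times on the same object returns the same integer, provided no information used in equals comparisons has been modified. (Note that this applies to one execution only. Different executions are not guaranteed to return the same result, because the default computation may involve the object's memory address, and an object can sit at a different location in memory on each run.)
- Third, it must cooperate with equals: if two objects are equal according to equals, they must have the same hashCode. The converse does not hold: two objects that are unequal according to equals do not need to have different hashCodes (although the documentation notes that producing distinct hashCodes for unequal objects helps hash table performance, since it reduces collisions).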
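The three points above can be checked with a few lines of code. The following is only a minimal sketch; the class name HashCodeContractDemo and the helper sameHash are made up for illustration. It relies on the fact that the strings "Aa" and "BB" happen to share the hash code 2112 under the documented String.hashCode formula.

```java
class HashCodeContractDemo {
    /** Hypothetical helper: true when both objects report the same hash code. */
    static boolean sameHash(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Two distinct String objects that are equal must share a hash code.
        String a = new String("hello");
        String b = new String("hello");
        System.out.println(a.equals(b) + " " + sameHash(a, b)); // prints "true true"

        // Unequal objects are still allowed to collide: both hash to 2112.
        System.out.println("Aa".equals("BB") + " " + sameHash("Aa", "BB")); // prints "false true"

        // Within one run, repeated calls on the same object agree.
        Object o = new Object();
        System.out.println(o.hashCode() == o.hashCode()); // prints "true"
    }
}
```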
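I also just realized that the description of equals in "Thinking in Java" comes straight from this documentation:
- Reflexive: for any non-null x, x.equals(x) must return true
- Symmetric: for any non-null x and y, if x.equals(y) returns true, then y.equals(x) returns true
- Transitive: for any non-null x, y and z, if x.equals(y) and y.equals(z) both return true, then x.equals(z) returns true
- Consistent: for any non-null x and y, x.equals(y) returns the same result no matter how many times it is called, provided no information used in the comparison has been modified
- In addition, for any non-null x, x.equals(null) returns false
- One more thing to note: once equals is overridden, hashCode must be overridden as well so that the rules above still hold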
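To make the last rule concrete, here is a small sketch of a key class that overrides equals and hashCode together. The class Point and its fields are hypothetical and exist only for this example; Objects.hash is the standard helper from java.util.Objects (available since Java 7).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable key class used only for illustration.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;               // reflexive fast path
        if (!(other instanceof Point)) return false;  // also covers other == null
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields equals compares, so equal points hash alike.
        return Objects.hash(x, y);
    }
}

class PointDemo {
    public static void main(String[] args) {
        Map<Point, String> labels = new HashMap<>();
        labels.put(new Point(1, 2), "A");
        // A different but equal Point finds the entry because hashCode agrees.
        System.out.println(labels.get(new Point(1, 2)));  // prints "A"
        System.out.println(new Point(1, 2).equals(null)); // prints "false"
    }
}
```

If hashCode were left at Object's default, the lookup with a second Point(1, 2) would most likely return null, because the two instances would usually land in different buckets.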
2. How Map uses hashCode
Here is part of the HashMap implementation in Java; I have excerpted the relevant code below:
Code Listing 1
  1 /* Partial HashMap implementation */
  2 public class HashMap<K,V>
  3     extends AbstractMap<K,V>
  4     implements Map<K,V>, Cloneable, Serializable
  5 {
  6     /**
  7      * The default initial capacity - MUST be a power of two.
  8      */
  9     static final int DEFAULT_INITIAL_CAPACITY = 16;
 10
 11     /**
 12      * The maximum capacity, used if a higher value is implicitly specified
 13      * by either of the constructors with arguments.
 14      * MUST be a power of two <= 1<<30.
 15      */
 16     static final int MAXIMUM_CAPACITY = 1 << 30;
 17
 18     /**
 19      * The load factor used when none specified in constructor.
 20      */
 21     static final float DEFAULT_LOAD_FACTOR = 0.75f;
 22
 23     /**
 24      * The table, resized as necessary. Length MUST Always be a power of two.
 25      */
 26     transient Entry<K,V>[] table;
 27
 28     /**
 29      * The number of key-value mappings contained in this map.
 30      */
 31     transient int size;
 32
 33     /**
 34      * The next size value at which to resize (capacity * load factor).
 35      * @serial
 36      */
 37     int threshold;
 38
 39     /**
 40      * The load factor for the hash table.
 41      *
 42      * @serial
 43      */
 44     final float loadFactor;
 45
 46     /**
 47      * Retrieve object hash code and applies a supplemental hash function to the
 48      * result hash, which defends against poor quality hash functions. This is
 49      * critical because HashMap uses power-of-two length hash tables, that
 50      * otherwise encounter collisions for hashCodes that do not differ
 51      * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 52      */
 53     final int hash(Object k) {
 54         int h = 0;
 55         if (useAltHashing) {
 56             if (k instanceof String) {
 57                 return sun.misc.Hashing.stringHash32((String) k);
 58             }
 59             h = hashSeed;
 60         }
 61
 62         h ^= k.hashCode();
 63
 64         // This function ensures that hashCodes that differ only by
 65         // constant multiples at each bit position have a bounded
 66         // number of collisions (approximately 8 at default load factor).
 67         h ^= (h >>> 20) ^ (h >>> 12);
 68         return h ^ (h >>> 7) ^ (h >>> 4);
 69     }
 70
 71     /**
 72      * Returns index for hash code h.
 73      */
 74     static int indexFor(int h, int length) {
 75         return h & (length-1);
 76     }
 77
 78     /**
 79      * Adds a new entry with the specified key, value and hash code to
 80      * the specified bucket. It is the responsibility of this
 81      * method to resize the table if appropriate.
 82      *
 83      * Subclass overrides this to alter the behavior of put method.
 84      */
 85     void addEntry(int hash, K key, V value, int bucketIndex) {
 86         if ((size >= threshold) && (null != table[bucketIndex])) {
 87             resize(2 * table.length);
 88             hash = (null != key) ? hash(key) : 0;
 89             bucketIndex = indexFor(hash, table.length);
 90         }
 91
 92         createEntry(hash, key, value, bucketIndex);
 93     }
 94
 95     /**
 96      * Like addEntry except that this version is used when creating entries
 97      * as part of Map construction or "pseudo-construction" (cloning,
 98      * deserialization). This version needn't worry about resizing the table.
 99      *
100      * Subclass overrides this to alter the behavior of HashMap(Map),
101      * clone, and readObject.
102      */
103     void createEntry(int hash, K key, V value, int bucketIndex) {
104         Entry<K,V> e = table[bucketIndex];
105         table[bucketIndex] = new Entry<>(hash, key, value, e);
106         size++;
107     }
108
109     /**
110      * Associates the specified value with the specified key in this map.
111      * If the map previously contained a mapping for the key, the old
112      * value is replaced.
113      *
114      * @param key key with which the specified value is to be associated
115      * @param value value to be associated with the specified key
116      * @return the previous value associated with <tt>key</tt>, or
117      *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
118      *         (A <tt>null</tt> return can also indicate that the map
119      *         previously associated <tt>null</tt> with <tt>key</tt>.)
120      */
121     public V put(K key, V value) {
122         if (key == null)
123             return putForNullKey(value);
124         int hash = hash(key);
125         int i = indexFor(hash, table.length);
126         for (Entry<K,V> e = table[i]; e != null; e = e.next) {
127             Object k;
128             if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
129                 V oldValue = e.value;
130                 e.value = value;
131                 e.recordAccess(this);
132                 return oldValue;
133             }
134         }
135
136         modCount++;
137         addEntry(hash, key, value, i);
138         return null;
139     }
140
141     /**
142      * Returns the entry associated with the specified key in the
143      * HashMap. Returns null if the HashMap contains no mapping
144      * for the key.
145      */
146     final Entry<K,V> getEntry(Object key) {
147         int hash = (key == null) ? 0 : hash(key);
148         for (Entry<K,V> e = table[indexFor(hash, table.length)];
149              e != null;
150              e = e.next) {
151             Object k;
152             if (e.hash == hash &&
153                 ((k = e.key) == key || (key != null && key.equals(k))))
154                 return e;
155         }
156         return null;
157     }
158
159     /**
160      * Removes and returns the entry associated with the specified key
161      * in the HashMap. Returns null if the HashMap contains no mapping
162      * for this key.
163      */
164     final Entry<K,V> removeEntryForKey(Object key) {
165         int hash = (key == null) ? 0 : hash(key);
166         int i = indexFor(hash, table.length);
167         Entry<K,V> prev = table[i];
168         Entry<K,V> e = prev;
169
170         while (e != null) {
171             Entry<K,V> next = e.next;
172             Object k;
173             if (e.hash == hash &&
174                 ((k = e.key) == key || (key != null && key.equals(k)))) {
175                 modCount++;
176                 size--;
177                 if (prev == e)
178                     table[i] = next;
179                 else
180                     prev.next = next;
181                 e.recordRemoval(this);
182                 return e;
183             }
184             prev = e;
185             e = next;
186         }
187
188         return e;
189     }
190
191     /**
192      * Rehashes the contents of this map into a new array with a
193      * larger capacity. This method is called automatically when the
194      * number of keys in this map reaches its threshold.
195      *
196      * If current capacity is MAXIMUM_CAPACITY, this method does not
197      * resize the map, but sets threshold to Integer.MAX_VALUE.
198      * This has the effect of preventing future calls.
199      *
200      * @param newCapacity the new capacity, MUST be a power of two;
201      *        must be greater than current capacity unless current
202      *        capacity is MAXIMUM_CAPACITY (in which case value
203      *        is irrelevant).
204      */
205     void resize(int newCapacity) {
206         Entry[] oldTable = table;
207         int oldCapacity = oldTable.length;
208         if (oldCapacity == MAXIMUM_CAPACITY) {
209             threshold = Integer.MAX_VALUE;
210             return;
211         }
212
213         Entry[] newTable = new Entry[newCapacity];
214         boolean oldAltHashing = useAltHashing;
215         useAltHashing |= sun.misc.VM.isBooted() &&
216                 (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
217         boolean rehash = oldAltHashing ^ useAltHashing;
218         transfer(newTable, rehash);
219         table = newTable;
220         threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
221     }
222
223     /**
224      * Transfers all entries from current table to newTable.
225      */
226     void transfer(Entry[] newTable, boolean rehash) {
227         int newCapacity = newTable.length;
228         for (Entry<K,V> e : table) {
229             while(null != e) {
230                 Entry<K,V> next = e.next;
231                 if (rehash) {
232                     e.hash = null == e.key ? 0 : hash(e.key);
233                 }
234                 int i = indexFor(e.hash, newCapacity);
235                 e.next = newTable[i];
236                 newTable[i] = e;
237                 e = next;
238             }
239         }
240     }
241 }
Code Listing 2
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
}
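I excerpted the insert, delete, update and lookup operations of HashMap, together with the helper functions they use, for analysis:
- Storage: HashMap is backed by Entry<K,V>[], an array of Entry objects, declared on line 26 of Code Listing 1. (The transient modifier excludes the field from default serialization; it has nothing to do with multithreading.)
- Hash function: the hash function on line 53 takes the key's hashCode and mixes it further to compute the hash value. We can regard this function as the map's hash function.
- Growing the storage: in addEntry on line 85, when the map is getting full, Java calls resize on line 205. It allocates a new Entry array twice the size of the old one and then moves every entry from the old table into the new one. (Note: Java trades off performance against space here. The array is grown only when size has reached the threshold and the target bucket is already occupied, i.e. a collision has occurred.)
- Storage position: when an element is added, the slot for the Entry is determined on line 89 in addEntry, which calls indexFor from line 74. indexFor ANDs the hash value with (array length - 1). Because the array length is always a power of two, the result always falls within the bounds of the array.
- Collision resolution: no discussion of hash tables can avoid collisions. It is practically impossible to find a perfect hash function that spreads all data over the storage without any collision (unless the storage is large enough), so collisions are inevitable. Code Listing 2 shows that every Entry holds a reference to the next Entry. On line 105 of Code Listing 1, createEntry places the newly inserted Entry in the table slot and makes it point to the existing chain. In other words, Java's HashMap resolves collisions in the most traditional way, by hanging a linked list off each slot (separate chaining).
- put: line 121 shows put, the method we use most often to insert into a HashMap. Given a key and a value, it locates the key's bucket and walks that bucket's chain (not the whole table) to check whether the key is already present. Deciding whether two keys are equal uses equals, so if the key is a custom class, remember to override equals, and hashCode along with it. If the key is absent, a new entry is added; if it is present, its value is overwritten.
- getEntry: on line 146, getEntry computes the hash of the given key again and uses indexFor from line 74 to find the slot in the array. The lookup is not strictly O(1): it needs a loop, because when a collision has occurred at a slot, the chain at that slot has to be traversed to find the key.
- removeEntryForKey: this also involves a lookup, and if the removed element is part of a collision chain, the links of its neighbors have to be updated.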
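The points above can be condensed into a toy implementation. The sketch below is not the JDK code: TinyChainedMap and all of its members are names invented for this example, it skips the supplemental hash, modCount and the capacity limit, and it grows whenever size reaches 3/4 of the capacity (the JDK 7 listing additionally requires the target bucket to be occupied). It keeps the essentials, though: a power-of-two array, index by bit mask, head insertion into a chain, and relinking on resize.

```java
import java.util.Objects;

// Hypothetical, simplified chained hash map for illustration only.
class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V>[] table = allocate(4); // capacity is always a power of two
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] allocate(int capacity) {
        return (Node<K, V>[]) new Node[capacity];
    }

    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode(); // null keys map to hash 0
    }

    private static int indexFor(int h, int length) {
        return h & (length - 1); // in range because length is a power of two
    }

    public V put(K key, V value) {
        int h = hash(key);
        int i = indexFor(h, table.length);
        // Walk only this bucket's chain looking for an existing key.
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        if (size >= table.length * 3 / 4) { // simplified load-factor check
            resize(table.length * 2);
            i = indexFor(h, table.length);
        }
        table[i] = new Node<>(h, key, value, table[i]); // insert at chain head
        size++;
        return null;
    }

    public V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[indexFor(h, table.length)]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public V remove(Object key) {
        int h = hash(key);
        int i = indexFor(h, table.length);
        Node<K, V> prev = null;
        for (Node<K, V> e = table[i]; e != null; prev = e, e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                if (prev == null) table[i] = e.next; // removing the chain head
                else prev.next = e.next;             // unlink from the middle
                size--;
                return e.value;
            }
        }
        return null;
    }

    private void resize(int newCapacity) {
        Node<K, V>[] bigger = allocate(newCapacity);
        for (Node<K, V> head : table) {
            for (Node<K, V> e = head; e != null; ) {
                Node<K, V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = bigger[i]; // relink the existing node, no copying
                bigger[i] = e;
                e = next;
            }
        }
        table = bigger;
    }

    public int size() {
        return size;
    }
}
```

For example, putting the colliding keys "Aa" and "BB" (both have hashCode 2112) stores them in the same chain, and get still tells them apart through equals.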
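3. Hash functions
The analysis above also shows how important it is to construct a good hash function. The basic principle is: minimize collisions, and "scatter" the elements over the storage as evenly as possible.
Below are some methods I found on Wikipedia. I will add to them later if I come across better ideas: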
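- Direct addressing: use the key itself, or a linear function of the key, as the hash address, i.e. $hash(k) = k$ or $hash(k) = a \cdot k + b$, where $a$ and $b$ are constants (this kind of hash function is called a "self function").
- Digit analysis: suppose the keys are numbers in base x and all keys that may appear in the hash table are known in advance; then a few digits of the key can be selected to form the hash address.
- Mid-square: take the middle digits of the square of the key as the hash address. When choosing a hash function we usually do not know everything about the keys, so picking particular digits may not work well. The middle digits of a squared number depend on every digit of the number, so randomly distributed keys yield randomly distributed hash addresses. How many digits to take is determined by the table length.
- Folding: split the key into several parts with the same number of digits (the last part may be shorter), then take the sum of these parts (discarding the carry) as the hash address.
- Random number method
- Division (remainder) method: take the remainder of the key divided by some number p no larger than the table length m as the hash address, i.e. $hash(k) = k \bmod p$, $p \le m$. The modulo can be applied to the key directly, or after folding, mid-square and similar operations. The choice of p matters: usually a prime or m itself is used; a poorly chosen p easily produces collisions.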
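A few of the methods listed above are easy to try out. The following is a rough sketch under my own assumptions: the class ClassicHashes and its method names are invented here, keys are treated as decimal integers, and folding splits the digits starting from the low end.

```java
// Hypothetical helper class; names and digit conventions are assumptions.
class ClassicHashes {
    /** Direct addressing: hash(k) = a * k + b. */
    static long direct(long key, long a, long b) {
        return a * key + b;
    }

    /** Division method: hash(k) = k mod p, kept non-negative. */
    static int division(int key, int p) {
        return (key % p + p) % p;
    }

    /** Mid-square: the middle `digits` decimal digits of key squared. */
    static int midSquare(int key, int digits) {
        String square = Long.toString((long) key * key);
        if (square.length() <= digits) {
            return Integer.parseInt(square);
        }
        int start = (square.length() - digits) / 2;
        return Integer.parseInt(square.substring(start, start + digits));
    }

    /** Folding: sum groups of `groupDigits` decimal digits, dropping the carry. */
    static int fold(long key, int groupDigits) {
        long modulus = 1;
        for (int i = 0; i < groupDigits; i++) {
            modulus *= 10;
        }
        long sum = 0;
        for (long rest = Math.abs(key); rest > 0; rest /= modulus) {
            sum += rest % modulus; // add the lowest remaining group
        }
        return (int) (sum % modulus); // discard the carry
    }

    public static void main(String[] args) {
        System.out.println(direct(7, 3, 1));        // 3 * 7 + 1 = 22
        System.out.println(division(27, 13));       // 27 mod 13 = 1
        System.out.println(midSquare(1234, 3));     // 1234^2 = 1522756, middle digits 227
        System.out.println(fold(123456789L, 3));    // 123 + 456 + 789 = 1368, carry dropped: 368
    }
}
```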
Now let us look back at the hash function in Java:
Code Listing 3
/**
 * A randomizing value associated with this instance that is applied to
 * hash code of keys to make hash collisions harder to find.
 */
transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);

/**
 * Retrieve object hash code and applies a supplemental hash function to the
 * result hash, which defends against poor quality hash functions. This is
 * critical because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
final int hash(Object k) {
    int h = 0;
    if (useAltHashing) {
        if (k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
        h = hashSeed;
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
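- Java uses the random number method to generate a hashSeed (it is only applied when useAltHashing is true; otherwise h starts at 0)
- This value is then XORed with the key's hashCode
- Finally, a series of shifts and XORs produces the hash value. Judging from the comment in the source, the point is to fold the higher bits into the lower bits, because a power-of-two table only looks at the low bits of the hash. I have not worked through why these particular shift amounts were chosen; I will read up on it and add more later.
It seems that Java combines several of the methods above to compute the hash value.
Some of the above is my own understanding. If you happen to read this and spot something wrong, corrections are welcome.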
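To see what the shifts and XORs buy, here is a small experiment. It is only a sketch: the class SpreadDemo and the method name spread are my own, while the body of spread is copied from the last two statements of hash in the listing and indexFor is copied as is. The inputs are hash codes that differ only in their high bits.

```java
// Hypothetical demo class; spread() reuses the shift-XOR steps from the listing.
class SpreadDemo {
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int[] hashCodes = {0x10000, 0x20000, 0x30000}; // identical low 16 bits
        int capacity = 16;
        for (int hc : hashCodes) {
            System.out.println(Integer.toHexString(hc)
                    + " raw index=" + indexFor(hc, capacity)
                    + " spread index=" + indexFor(spread(hc), capacity));
        }
        // Without spreading, all three land in bucket 0.
        // With spreading, they land in buckets 1, 2 and 3.
    }
}
```

Since indexFor keeps only the low bits, the raw hash codes would all chain up in a single bucket, while the mixed values are distributed over different buckets.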