A Closer Look at hashCode in Java Map


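In my earlier write-up, [Java Notes Summary 7] Java Containers, Part 2: Map, I mentioned hashCode but did not look into it. This post works through the hashCode-related questions in detail.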

1. hashCode and equals

  First, as we all know, hashCode() and equals() are both methods of Object, the root class in Java. I consulted the Java 7 documentation for its description of the two.

  Reading the description of hashCode there, a few points stand out:

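  • First, hashCode must return an integer, that is, a value of the primitive type int (not the wrapper class Integer).
  • Second, it must be consistent: within one execution of a program, it returns the same integer no matter how many times it is called on the same object, provided no information used in equals comparisons is modified. (Note that this covers a single execution only. Different executions are not guaranteed to return the same result, because the computation may involve the object's memory address, and an object's location in memory can differ from run to run.)
  • Third, it must agree with equals: if two objects are equal according to equals, they must have the same hash code. The converse is not required: two objects that are not equal do not need to have different hash codes (although the documentation notes that producing distinct hash codes for unequal objects can improve hash table performance, since it reduces collisions).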
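
To make the third point concrete, here is a small sketch. The class name HashCodeContractDemo and its helper method are my own, purely for illustration. It relies on the documented formula for String.hashCode(), under which the unequal strings "Aa" and "BB" happen to produce the same value.

```java
// Illustrative only: HashCodeContractDemo is a hypothetical name, not part of the JDK.
class HashCodeContractDemo {
    // True when the hashCode contract holds for this pair:
    // objects that are equal must report equal hash codes.
    static boolean contractHolds(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String s1 = new String("hello");
        String s2 = new String("hello");
        // Two distinct objects, but equals() is true, so the hash codes must match.
        System.out.println(contractHolds(s1, s2));               // true

        // Unequal objects are allowed to share a hash code.
        // String.hashCode() is s[0]*31^(n-1) + ... + s[n-1], so
        // "Aa" = 65*31 + 97 = 2112 and "BB" = 66*31 + 66 = 2112.
        System.out.println("Aa".equals("BB"));                   // false
        System.out.println("Aa".hashCode() == "BB".hashCode());  // true
    }
}
```

A hash table therefore cannot rely on hashCode alone to tell keys apart; it still has to call equals on candidates that share a hash code.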

  

  I also realized that the description of equals in Thinking in Java comes from this documentation:

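  • Reflexive: for any non-null x, x.equals(x) returns true.
  • Symmetric: for any non-null x and y, if x.equals(y) returns true, then y.equals(x) returns true.
  • Transitive: for any non-null x, y, and z, if x.equals(y) and y.equals(z) both return true, then x.equals(z) returns true.
  • Consistent: for any non-null x and y, repeated calls to x.equals(y) return the same result, provided no information used in the comparison is modified.
  • In addition, for any non-null x, x.equals(null) returns false.
  • Finally, whenever equals is overridden, hashCode must be overridden as well, to preserve the rule above that equal objects have equal hash codes.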
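
As a minimal sketch of the last rule, the following key class overrides equals and hashCode together and derives both from the same fields. GridPoint and GridPointDemo are hypothetical names chosen for this example; nothing here comes from the JDK source discussed below.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class for illustration; the name and fields are assumptions.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;                   // reflexive
        if (!(other instanceof GridPoint)) return false;  // also rejects null
        GridPoint p = (GridPoint) other;
        return x == p.x && y == p.y;                      // field-by-field comparison
    }

    @Override
    public int hashCode() {
        // Built from the same fields as equals(), so equal points hash alike.
        return Objects.hash(x, y);
    }
}

class GridPointDemo {
    // Stores a value under one GridPoint and looks it up with an equal copy.
    static String lookupWithEqualKey() {
        Map<GridPoint, String> labels = new HashMap<>();
        labels.put(new GridPoint(1, 2), "home");
        return labels.get(new GridPoint(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookupWithEqualKey()); // home
    }
}
```

If hashCode() were left out, the two GridPoint(1, 2) instances would fall back to Object's hashCode, would most likely report different values, and the lookup would then most likely return null even though equals says the keys match.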

2. How Map uses hashCode

  Below is an excerpt from the implementation of HashMap in Java:

Code segment 1

  1 /* Partial HashMap implementation */
  2 public class HashMap<K,V>
  3     extends AbstractMap<K,V>
  4     implements Map<K,V>, Cloneable, Serializable
  5 {
  6     /**
  7      * The default initial capacity - MUST be a power of two.
  8      */
  9     static final int DEFAULT_INITIAL_CAPACITY = 16;
 10 
 11     /**
 12      * The maximum capacity, used if a higher value is implicitly specified
 13      * by either of the constructors with arguments.
 14      * MUST be a power of two <= 1<<30.
 15      */
 16     static final int MAXIMUM_CAPACITY = 1 << 30;
 17 
 18     /**
 19      * The load factor used when none specified in constructor.
 20      */
 21     static final float DEFAULT_LOAD_FACTOR = 0.75f;
 22 
 23     /**
 24      * The table, resized as necessary. Length MUST Always be a power of two.
 25      */
 26     transient Entry<K,V>[] table;
 27 
 28     /**
 29      * The number of key-value mappings contained in this map.
 30      */
 31     transient int size;
 32 
 33     /**
 34      * The next size value at which to resize (capacity * load factor).
 35      * @serial
 36      */
 37     int threshold;
 38 
 39     /**
 40      * The load factor for the hash table.
 41      *
 42      * @serial
 43      */
 44     final float loadFactor;
 45 
 46     /**
 47      * Retrieve object hash code and applies a supplemental hash function to the
 48      * result hash, which defends against poor quality hash functions.  This is
 49      * critical because HashMap uses power-of-two length hash tables, that
 50      * otherwise encounter collisions for hashCodes that do not differ
 51      * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 52      */
 53     final int hash(Object k) {
 54         int h = 0;
 55         if (useAltHashing) {
 56             if (k instanceof String) {
 57                 return sun.misc.Hashing.stringHash32((String) k);
 58             }
 59             h = hashSeed;
 60         }
 61 
 62         h ^= k.hashCode();
 63 
 64         // This function ensures that hashCodes that differ only by
 65         // constant multiples at each bit position have a bounded
 66         // number of collisions (approximately 8 at default load factor).
 67         h ^= (h >>> 20) ^ (h >>> 12);
 68         return h ^ (h >>> 7) ^ (h >>> 4);
 69     }
 70     
 71     /**
 72      * Returns index for hash code h.
 73      */
 74     static int indexFor(int h, int length) {
 75         return h & (length-1);
 76     }
 77 
 78     /**
 79      * Adds a new entry with the specified key, value and hash code to
 80      * the specified bucket.  It is the responsibility of this
 81      * method to resize the table if appropriate.
 82      *
 83      * Subclass overrides this to alter the behavior of put method.
 84      */
 85     void addEntry(int hash, K key, V value, int bucketIndex) {
 86         if ((size >= threshold) && (null != table[bucketIndex])) {
 87             resize(2 * table.length);
 88             hash = (null != key) ? hash(key) : 0;
 89             bucketIndex = indexFor(hash, table.length);
 90         }
 91 
 92         createEntry(hash, key, value, bucketIndex);
 93     }
 94     
 95     /**
 96      * Like addEntry except that this version is used when creating entries
 97      * as part of Map construction or "pseudo-construction" (cloning,
 98      * deserialization).  This version needn't worry about resizing the table.
 99      *
100      * Subclass overrides this to alter the behavior of HashMap(Map),
101      * clone, and readObject.
102      */
103     void createEntry(int hash, K key, V value, int bucketIndex) {
104         Entry<K,V> e = table[bucketIndex];
105         table[bucketIndex] = new Entry<>(hash, key, value, e);
106         size++;
107     }
108     
109     /**
110      * Associates the specified value with the specified key in this map.
111      * If the map previously contained a mapping for the key, the old
112      * value is replaced.
113      *
114      * @param key key with which the specified value is to be associated
115      * @param value value to be associated with the specified key
116      * @return the previous value associated with <tt>key</tt>, or
117      *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
118      *         (A <tt>null</tt> return can also indicate that the map
119      *         previously associated <tt>null</tt> with <tt>key</tt>.)
120      */
121     public V put(K key, V value) {
122         if (key == null)
123             return putForNullKey(value);
124         int hash = hash(key);
125         int i = indexFor(hash, table.length);
126         for (Entry<K,V> e = table[i]; e != null; e = e.next) {
127             Object k;
128             if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
129                 V oldValue = e.value;
130                 e.value = value;
131                 e.recordAccess(this);
132                 return oldValue;
133             }
134         }
135 
136         modCount++;
137         addEntry(hash, key, value, i);
138         return null;
139     }
140     
141     /**
142      * Returns the entry associated with the specified key in the
143      * HashMap.  Returns null if the HashMap contains no mapping
144      * for the key.
145      */
146     final Entry<K,V> getEntry(Object key) {
147         int hash = (key == null) ? 0 : hash(key);
148         for (Entry<K,V> e = table[indexFor(hash, table.length)];
149              e != null;
150              e = e.next) {
151             Object k;
152             if (e.hash == hash &&
153                 ((k = e.key) == key || (key != null && key.equals(k))))
154                 return e;
155         }
156         return null;
157     }
158     
159     /**
160      * Removes and returns the entry associated with the specified key
161      * in the HashMap.  Returns null if the HashMap contains no mapping
162      * for this key.
163      */
164     final Entry<K,V> removeEntryForKey(Object key) {
165         int hash = (key == null) ? 0 : hash(key);
166         int i = indexFor(hash, table.length);
167         Entry<K,V> prev = table[i];
168         Entry<K,V> e = prev;
169 
170         while (e != null) {
171             Entry<K,V> next = e.next;
172             Object k;
173             if (e.hash == hash &&
174                 ((k = e.key) == key || (key != null && key.equals(k)))) {
175                 modCount++;
176                 size--;
177                 if (prev == e)
178                     table[i] = next;
179                 else
180                     prev.next = next;
181                 e.recordRemoval(this);
182                 return e;
183             }
184             prev = e;
185             e = next;
186         }
187 
188         return e;
189     }
190     
191     /**
192      * Rehashes the contents of this map into a new array with a
193      * larger capacity.  This method is called automatically when the
194      * number of keys in this map reaches its threshold.
195      *
196      * If current capacity is MAXIMUM_CAPACITY, this method does not
197      * resize the map, but sets threshold to Integer.MAX_VALUE.
198      * This has the effect of preventing future calls.
199      *
200      * @param newCapacity the new capacity, MUST be a power of two;
201      *        must be greater than current capacity unless current
202      *        capacity is MAXIMUM_CAPACITY (in which case value
203      *        is irrelevant).
204      */
205     void resize(int newCapacity) {
206         Entry[] oldTable = table;
207         int oldCapacity = oldTable.length;
208         if (oldCapacity == MAXIMUM_CAPACITY) {
209             threshold = Integer.MAX_VALUE;
210             return;
211         }
212 
213         Entry[] newTable = new Entry[newCapacity];
214         boolean oldAltHashing = useAltHashing;
215         useAltHashing |= sun.misc.VM.isBooted() &&
216                 (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
217         boolean rehash = oldAltHashing ^ useAltHashing;
218         transfer(newTable, rehash);
219         table = newTable;
220         threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
221     }
222     
223     /**
224      * Transfers all entries from current table to newTable.
225      */
226     void transfer(Entry[] newTable, boolean rehash) {
227         int newCapacity = newTable.length;
228         for (Entry<K,V> e : table) {
229             while(null != e) {
230                 Entry<K,V> next = e.next;
231                 if (rehash) {
232                     e.hash = null == e.key ? 0 : hash(e.key);
233                 }
234                 int i = indexFor(e.hash, newCapacity);
235                 e.next = newTable[i];
236                 newTable[i] = e;
237                 e = next;
238             }
239         }
240     }
241 }

Code segment 2

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
}

 

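  I extracted the insert, delete, update, and lookup operations of HashMap, together with the functions they rely on, so they can be analyzed (line numbers below refer to code segment 1):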

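  • Storage: the HashMap source is built on Entry<K,V>[], an array of Entry objects, declared on line 26 of the code. (The transient modifier on it relates to serialization, not multithreading: it excludes the array from default serialization.)
  • Hash function: the hash function on line 53 shows that the Java source computes the hash value from the hashCode of the key being stored in the HashMap. We can treat this function as the table's hash function.
  • Growing the storage: in the addEntry function on line 85, once size has reached threshold and the target bucket is already occupied (that is, a collision occurs), Java calls the resize function on line 205. It allocates a new Entry array twice the length of the old one and then moves every entry from the old table into the new array. (Note: Java trades off performance against space here, growing the array only when size has reached threshold and a collision has occurred at the same time.)
  • Storage position: when an element is added, addEntry (line 89) decides where in the array the Entry goes by calling the indexFor function on line 74, which ANDs the value computed by hash with the array length minus one. Because the length is always a power of two, the result never falls outside the array bounds.
  • Collision resolution: collisions cannot be left out of any discussion of hash tables. It is almost impossible to find a perfect hash function that spreads all data across the storage without any collision (unless the storage is large enough), so collisions are unavoidable. Code segment 2 shows that each Entry holds a reference to the next Entry, and on line 105 of code segment 1 createEntry puts the newly inserted Entry into the table slot and points it at the chain that was already there. In other words, Java's HashMap resolves collisions in the most traditional way, by hanging a linked list off the slot.
  • put: line 121 is the method we use most often to insert into a HashMap. Given a key and a value, it walks the chain in the bucket that the key maps to (not the whole table) to check whether the key is already present. Deciding whether two keys are equal uses equals, so if the key is a user-defined class, remember to override equals, and hashCode along with it. If the key is absent, a new entry is added; if it is present, the value stored for that key is overwritten.
  • getEntry: on line 146, getEntry recomputes the hash of the given key and again uses indexFor (line 74) to locate the slot in the array. The lookup is not strictly O(1): it needs a loop, because if collisions occurred at that slot, the collision chain has to be traversed to find the entry.
  • removeEntryForKey (line 164): this also involves a lookup, and if the removed element sits in a collision chain, the references of its neighbors have to be updated.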
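
The points above can be pulled together in a much smaller sketch. This is a teaching toy under my own assumptions, not the JDK code: TinyChainedMap and every member name are hypothetical, null keys are not supported, and the supplemental hash step is skipped (the key's hashCode is used directly). It keeps the same ideas: a power-of-two array, indexFor by bit mask, new entries inserted at the head of the chain, and doubling once the table is 3/4 full and a collision occurs.

```java
// A teaching sketch, not the JDK implementation; all names are hypothetical.
class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V>[] table = newTable(4); // length is always a power of two
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] newTable(int capacity) {
        return (Node<K, V>[]) new Node[capacity];
    }

    private static int indexFor(int hash, int length) {
        return hash & (length - 1); // keeps only the low bits, so always in bounds
    }

    V put(K key, V value) {
        int hash = key.hashCode();
        int i = indexFor(hash, table.length);
        // Walk only the chain in bucket i, looking for an equal key.
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Grow only when the table is 3/4 full and this bucket is occupied.
        if (size >= table.length * 3 / 4 && table[i] != null) {
            resize();
            i = indexFor(hash, table.length);
        }
        table[i] = new Node<>(hash, key, value, table[i]); // insert at the head
        size++;
        return null;
    }

    V get(K key) {
        int hash = key.hashCode();
        for (Node<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) return e.value;
        }
        return null;
    }

    V remove(K key) {
        int hash = key.hashCode();
        int i = indexFor(hash, table.length);
        Node<K, V> prev = null;
        for (Node<K, V> e = table[i]; e != null; prev = e, e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                if (prev == null) table[i] = e.next; // removed the head of the chain
                else prev.next = e.next;             // unlink from the middle or end
                size--;
                return e.value;
            }
        }
        return null;
    }

    private void resize() {
        Node<K, V>[] old = table;
        table = newTable(old.length * 2);
        for (Node<K, V> head : old) {
            for (Node<K, V> e = head; e != null; ) {
                Node<K, V> next = e.next;
                int i = indexFor(e.hash, table.length);
                e.next = table[i];
                table[i] = e;
                e = next;
            }
        }
    }

    int size() {
        return size;
    }
}
```

Putting the keys "Aa" and "BB" into such a map is a quick way to exercise the chain, since the two strings share the hash code 2112 and therefore always land in the same bucket.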

3. Hash functions

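   As the analysis above shows, constructing a good hash function matters a great deal. The basic principle is to minimize collisions and to scatter the elements across the storage as evenly as possible.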

  Below are some methods I found on Wikipedia. I will add more if good ideas come up later:

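  1. Direct addressing: use the key itself, or a linear function of the key, as the hash address, that is, $hash(k) = k$ or $hash(k) = a \cdot k + b$, where $a$ and $b$ are constants (this kind of hash function is called a "self function").
  2. Digit analysis: suppose the keys are numbers in base x and every key that can appear in the hash table is known in advance. Selected digits of the key can then be used to form the hash address.
  3. Mid-square: take the middle digits of the square of the key as the hash address. When choosing a hash function we usually do not know everything about the keys, and picking particular digits may not be suitable. The middle digits of a number's square depend on every digit of the number, so randomly distributed keys also yield randomly distributed hash addresses. How many digits to take is determined by the table length.
  4. Folding: split the key into parts with the same number of digits (the last part may have a different number), then add the parts together, discarding the carry, and use the sum as the hash address.
  5. Random number method
  6. Division (remainder) method: take the remainder of the key divided by some number p no greater than the table length m as the hash address, that is, $hash(k) = k \bmod p$ with $p \le m$. The modulus can be applied to the key directly, or after other operations such as folding or mid-square. The choice of p matters: it is usually a prime or m itself, and a poor choice of p easily produces collisions.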
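
A few of the methods above can be written down directly. The sketch below is my own illustration on decimal integer keys: ClassicHashes and its method names are hypothetical, and details such as padding the square to 8 digits or folding from the right are arbitrary choices, not something the descriptions above prescribe.

```java
// Illustrative sketches of the textbook methods; all names are hypothetical.
class ClassicHashes {
    // Direct addressing: a linear function of the key, a*k + b.
    static int direct(int key, int a, int b) {
        return a * key + b;
    }

    // Mid-square: square the key, view the square as an 8-digit decimal
    // number, and take its middle two digits. Intended for keys below 10000.
    static int midSquare(int key) {
        long square = (long) key * key;
        return (int) ((square / 1000) % 100);
    }

    // Folding: cut the decimal digits into groups of `width` digits starting
    // from the right, add the groups, and drop any carry beyond `width` digits.
    static int fold(long key, int width) {
        long modulus = 1;
        for (int i = 0; i < width; i++) {
            modulus *= 10;
        }
        long sum = 0;
        for (long rest = key; rest > 0; rest /= modulus) {
            sum += rest % modulus;
        }
        return (int) (sum % modulus);
    }

    // Division: the remainder of key divided by p, where p <= table length m.
    // The extra "+ p" keeps the result non-negative for negative keys.
    static int division(int key, int p) {
        return ((key % p) + p) % p;
    }

    public static void main(String[] args) {
        System.out.println(direct(5, 2, 3));        // 13
        System.out.println(midSquare(1234));        // 1234^2 = 1522756 -> 22
        System.out.println(fold(123456789L, 3));    // 789 + 456 + 123 = 1368 -> 368
        System.out.println(division(100, 13));      // 9
    }
}
```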

 Now compare this with the hash function in Java:

Code segment 3

    /**
     * A randomizing value associated with this instance that is applied to
     * hash code of keys to make hash collisions harder to find.
     */
    transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);

    /**
     * Retrieve object hash code and applies a supplemental hash function to the
     * result hash, which defends against poor quality hash functions.  This is
     * critical because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
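
  1. Java generates a hashSeed using a random value. It is only mixed in when useAltHashing is true; otherwise h starts at 0. (With useAltHashing on, String keys take a separate path through stringHash32.)
  2. It XORs this starting value with the key's hashCode.
  3. It then derives the final hash value through a series of shifts and XORs. According to the comment in the source, this supplemental step defends against poor hashCode implementations: the table length is a power of two, so indexFor looks only at the low bits, and without the shifts, hash codes that differ only in their high bits would collide. (I have not yet worked out why these particular shift amounts were chosen. I will read up on it and add that later.)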

Java appears to combine several of the methods above to compute its hash values.
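
To see what the shift-and-XOR step buys, here is a small experiment. HashSpreadDemo is a hypothetical name; spread() copies the two shift lines from the excerpt above and leaves out hashSeed (so it corresponds to the case where h starts at 0), and indexFor() is the same bit mask as in code segment 1. The two inputs differ only in bits above the low four, which is exactly the situation the source comment warns about.

```java
// Sketch only: HashSpreadDemo is a hypothetical name, and hashSeed is omitted.
class HashSpreadDemo {
    // The supplemental shift-and-XOR step, copied from the excerpt.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Same as indexFor in code segment 1: valid when length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int a = 0x10000; // low 4 bits are all zero
        int b = 0x20000; // differs from a only in higher bits

        // Without spreading, both values fall into bucket 0.
        System.out.println(indexFor(a, length) + " " + indexFor(b, length)); // 0 0

        // With spreading, the high bits reach the low bits and the buckets differ.
        System.out.println(indexFor(spread(a), length) + " "
                + indexFor(spread(b), length)); // 1 2
    }
}
```

For a non-negative h and a power-of-two length, h & (length - 1) gives the same result as h % length, which is why the mask can stand in for the division method here.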

 

Parts of the above are my own understanding. If you happen to spot something that is wrong, please point it out.

