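Preface
In the article "JDK 1.8 Source Code Analysis [Collections]: HashMap", we looked at what HashMap gained in JDK 1.8 (the introduction of the red-black tree). But why was this optimization needed? This article compares JDK 1.7 and 1.8 to explain the reason.
As is well known, HashMap is built on an array plus linked lists, but the implementation differs slightly between JDK 1.7 and 1.8.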
Contents
I. Comparative analysis
1. Version 1.7
2. Version 1.8
Summary
I. Comparative analysis
1. Version 1.7
Data structure diagram for 1.7:
Let's first look at the core member variables in 1.7:
/**
 * The default initial capacity - MUST be a power of two.
 * Initial number of buckets; the table is an array, so this is the array length.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 * Maximum number of buckets.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 * Default load factor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * An empty table instance to share when the table is not inflated.
 */
static final Entry<?,?>[] EMPTY_TABLE = {};

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 * The array that actually stores the data.
 */
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

/**
 * The number of key-value mappings contained in this map.
 * Number of entries stored in the map.
 */
transient int size;

/**
 * The next size value at which to resize (capacity * load factor).
 * Resize threshold. Before the table is inflated it holds the initial
 * capacity, which can be given explicitly at construction time.
 * @serial
 */
// If table == EMPTY_TABLE then this is the initial capacity at which the
// table will be created when inflated.
int threshold;

/**
 * The load factor for the hash table.
 * Load factor; can be given explicitly at construction time.
 *
 * @serial
 */
final float loadFactor;
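Among these fields, the load factor is the most interesting one. At any given moment a HashMap has a fixed capacity. Take the default initialization as an example: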
/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param initialCapacity the initial capacity
 * @param loadFactor the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}
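The default capacity is 16 and the default load factor is 0.75. As entries are added, once the count reaches 16 * 0.75 = 12 the current 16-slot table has to be expanded. Expansion involves rehashing and copying data, so it is expensive. It is therefore usually recommended to estimate the size of the HashMap in advance and pass it to the constructor, to keep the cost of resizing as low as possible.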
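The arithmetic above can be sketched in a few lines. This is only an illustration: `thresholdFor` and `capacityFor` are hypothetical helper names, not part of the JDK, and the `+ 1` is a simple safety margin rather than anything HashMap requires.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeSketch {
    // Resize threshold as described in the text: capacity * loadFactor.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Hypothetical helper: an initial capacity large enough that
    // expectedEntries mappings fit without triggering a resize.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        // Default settings: 16 buckets, load factor 0.75.
        System.out.println(thresholdFor(16, 0.75f));

        // Pre-size a map for roughly 1000 entries.
        int capacity = capacityFor(1000, 0.75f);
        Map<String, Integer> map = new HashMap<>(capacity);
        for (int i = 0; i < 1000; i++) {
            map.put("key-" + i, i);
        }
        System.out.println(capacity + " requested, " + map.size() + " stored");
    }
}
```

HashMap rounds the requested capacity up to the next power of two, so the table ends up somewhat larger than the number passed in.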
The code shows that the data actually lives in this array:
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
Next, let's see how its element type is implemented:
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    public final K getKey() {
        return key;
    }

    public final V getValue() {
        return value;
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry e = (Map.Entry)o;
        Object k1 = getKey();
        Object k2 = e.getKey();
        if (k1 == k2 || (k1 != null && k1.equals(k2))) {
            Object v1 = getValue();
            Object v2 = e.getValue();
            if (v1 == v2 || (v1 != null && v1.equals(v2)))
                return true;
        }
        return false;
    }

    public final int hashCode() {
        return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
    }

    public final String toString() {
        return getKey() + "=" + getValue();
    }

    /**
     * This method is invoked whenever the value in an entry is
     * overwritten by an invocation of put(k,v) for a key k that's already
     * in the HashMap.
     */
    void recordAccess(HashMap<K,V> m) {
    }

    /**
     * This method is invoked whenever the entry is
     * removed from the table.
     */
    void recordRemoval(HashMap<K,V> m) {
    }
}
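Entry is a nested class of HashMap. Its fields are easy to read:
- key is the key that was written;
- value is the value associated with the key;
- next implements the linked list and points to the next node in the chain;
- hash stores the hash computed for the key (the result of hash(key), derived from the key's hashCode).
With the basic structure clear, let's look at the put and get functions:
put function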
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    // Initialize the array if it has not been allocated yet
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // A null key is handled separately: the value is stored under the null key
    if (key == null)
        return putForNullKey(value);
    // Compute the hash from the key
    int hash = hash(key);
    // Locate the bucket from the computed hash
    int i = indexFor(hash, table.length);
    // Walk the chain in that bucket; if an entry has the same hash and an
    // equal key, overwrite its value and return the old one
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    // No existing entry for this key (the bucket is empty or holds only
    // other keys): add a new Entry to this bucket
    addEntry(hash, key, value, i);
    return null;
}
/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket. It is the responsibility of this
 * method to resize the table if appropriate.
 *
 * Subclass overrides this to alter the behavior of put method.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    // Check whether the table needs to grow
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // If so, double the table, then rehash the current key and locate its bucket again
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    // Insert at the head of the bucket: the new entry's next points to the
    // old head, so an occupied bucket becomes a linked list
    createEntry(hash, key, value, bucketIndex);
}

/**
 * Like addEntry except that this version is used when creating entries
 * as part of Map construction or "pseudo-construction" (cloning,
 * deserialization). This version needn't worry about resizing the table.
 *
 * Subclass overrides this to alter the behavior of HashMap(Map),
 * clone, and readObject.
 */
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
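To make the bucket lookup and the head insertion in createEntry more concrete, here is a minimal sketch. `Node`, `insertAtHead` and `describe` are simplified stand-ins invented for this illustration; `indexFor` uses the same bit mask that the 1.7 source relies on, which works because the table length is always a power of two.

```java
class HeadInsertSketch {
    // Simplified stand-in for the 1.7 Entry: only a key and a next pointer.
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Bucket index via bit mask; valid only when length is a power of two.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    // Head insertion: the new node points at the previous head of the bucket.
    static Node insertAtHead(Node head, String key) {
        return new Node(key, head);
    }

    // Render a chain from head to tail for inspection.
    static String describe(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node bucket = null;
        bucket = insertAtHead(bucket, "a");
        bucket = insertAtHead(bucket, "b");
        bucket = insertAtHead(bucket, "c");
        System.out.println(describe(bucket));  // c -> b -> a
        System.out.println(indexFor(21, 16));  // 5
        System.out.println(indexFor(37, 16));  // 5, same bucket as 21
    }
}
```

The most recently inserted key ends up first in the chain, which is why the order of a 1.7 bucket is the reverse of insertion order.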
get function
Now let's look at the get function:
/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}. (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

/**
 * Returns the entry associated with the specified key in the
 * HashMap. Returns null if the HashMap contains no mapping
 * for the key.
 */
final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    // Compute the hash from the key, then locate the bucket
    int hash = (key == null) ? 0 : hash(key);
    // Walk the chain in that bucket (it may hold one entry or several)
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        // Return the entry whose hash matches and whose key is equal
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    // Nothing found: return null
    return null;
}
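2. Version 1.8
Having read the 1.7 implementation, can you spot what needs optimizing?
One obvious problem: when hash collisions are severe, the linked list in a bucket grows longer and longer, so lookups get slower and slower, with a time complexity of O(N).
1.8 therefore focuses on improving lookup efficiency.
Data structure diagram for 1.8:
As before, let's start with the core member variables: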
/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The bin count threshold for using a tree rather than list for a
 * bin. Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 * Threshold used to decide whether a linked list is converted to a red-black tree.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

/**
 * The element type was Entry in JDK 1.7; 1.8 renames it to Node.
 */
transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

/**
 * The number of times this HashMap has been structurally modified
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash). This field is used to make iterators on Collection-views of
 * the HashMap fail-fast. (See ConcurrentModificationException).
 */
transient int modCount;

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;

/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;
The core of Node is the same as Entry in 1.7: it stores the key, the value, the hash, and the next pointer.
Now let's look at the put and get functions that store and retrieve data.
put function
/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is empty it must be initialized (resize handles initialization)
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Locate the bucket from the key's hash. An empty bucket means there is
    // no collision, so a new node is created directly in that slot
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The bucket is occupied (hash collision): if the first node has the
        // same hash and an equal key, remember it in e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the bucket is a red-black tree, insert the tree way
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise it is a linked list: wrap key and value in a new node
            // and append it to the tail of the chain
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the chain has reached the preset threshold,
                    // convert it to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Stop traversing as soon as an equal key is found
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // e != null means the key already exists, so its value is overwritten
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Check whether the table needs to grow
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
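The treeify decision in putVal interacts with MIN_TREEIFY_CAPACITY: as far as I recall from the 1.8 source, treeifyBin resizes the table instead of building a tree while the table is shorter than 64 slots. The sketch below models only that decision. `Action` and `afterAppend` are hypothetical names for illustration, and the two constants are copied from the fields listed earlier.

```java
class TreeifyDecisionSketch {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE, TREEIFY }

    // nodesBeforeInsert: how many nodes the bin already held when a new key
    // was appended to its tail. tableLength: current length of the table.
    static Action afterAppend(int nodesBeforeInsert, int tableLength) {
        if (nodesBeforeInsert < TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;
        }
        // A long chain in a small table is more likely a sign that the table
        // is too small, so growing it is tried before building a tree.
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(7, 64));  // KEEP_LIST
        System.out.println(afterAppend(8, 16));  // RESIZE
        System.out.println(afterAppend(8, 64));  // TREEIFY
    }
}
```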
get function
/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}. (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

/**
 * Implements Map.get and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Locate the bucket from the hash of the key
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // If the first node of the bucket (list head or tree root) holds the
        // requested key, return it directly
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // Otherwise check whether the bucket is a red-black tree or a linked list
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                // Red-black tree: use the tree lookup
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Linked list: traverse until a match is found
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
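Both putVal and getNode locate the bucket with `(n - 1) & hash`, where the hash argument comes from hash(key). The sketch below reproduces that step as I remember it from the 1.8 source (the high 16 bits are XORed into the low 16 bits); `spread` and `index` are names chosen for this illustration, so treat the details as an assumption to check against your JDK.

```java
class SpreadSketch {
    // Mixes the high 16 bits of hashCode into the low 16 bits, so that keys
    // differing only in their high bits do not all land in the same bucket.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, which keeps this easy to follow.
        Integer a = 0x10000;
        Integer b = 0x20000;

        // Without mixing, both keys fall into bucket 0 of a 16-slot table.
        System.out.println(index(a.hashCode(), 16) + " " + index(b.hashCode(), 16));

        // With mixing, they are separated into buckets 1 and 2.
        System.out.println(index(spread(a), 16) + " " + index(spread(b), 16));
    }
}
```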
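These two core methods (get and put) show that 1.8 optimizes long chains: once a bucket is converted to a red-black tree, lookup in that bucket improves from O(N) to O(log N).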
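One way to exercise this path is a key type whose hashCode always collides, so every entry lands in the same bucket. The sketch below is only an illustration: `BadKey` is a hypothetical class, and it implements Comparable because the HashMap documentation notes that comparison order may be used to break ties among colliding keys. It checks correctness only and makes no timing claims.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysSketch {
    // Hypothetical key type: every instance reports the same hashCode.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // deliberate worst case: all keys collide
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Builds a map in which all n keys share one bucket.
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(10_000);
        System.out.println(map.size());
        System.out.println(map.get(new BadKey(9_999)));
    }
}
```

On a 1.7 runtime the same program walks a 10,000-node list for each lookup, whereas on 1.8 and later the bucket should be stored as a tree.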
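HashMap's long-standing problems remain, however: it is still not thread-safe, and concurrent use can corrupt it. The best-known symptom, seen in JDK 1.7, is an infinite loop. Code like the following is enough to put the map at risk: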
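import java.util.HashMap;
import java.util.UUID;

public class UnsafeHashMapDemo {
    public static void main(String[] args) {
        // Deliberately unsafe: many threads write to one plain HashMap
        final HashMap<String, String> map = new HashMap<String, String>();
        for (int i = 0; i < 1000; i++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    map.put(UUID.randomUUID().toString(), "");
                }
            }).start();
        }
    }
}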
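Why does this happen? As seen above, HashMap calls resize() when it grows, and this is where concurrent operations cause trouble. In 1.7, entries are re-inserted at the head of their new bucket during a resize, so two threads resizing at the same time can link a bucket into a circular list. A later lookup for a key that does not exist, whose index happens to fall on that bucket, then loops forever. The 1.8 resize keeps the relative order of nodes, which avoids this particular cycle, but concurrent writes can still lose data or corrupt the structure. The next article will explain the cause of the HashMap infinite loop in detail.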
Another point worth noting is how to iterate over a HashMap. The common ways are:
// Option 1: iterate over entrySet
Iterator<Map.Entry<String, Integer>> entryIterator = map.entrySet().iterator();
while (entryIterator.hasNext()) {
    Map.Entry<String, Integer> next = entryIterator.next();
    System.out.println("key=" + next.getKey() + " value=" + next.getValue());
}

// Option 2: iterate over keySet, then look up each value
Iterator<String> iterator = map.keySet().iterator();
while (iterator.hasNext()) {
    String key = iterator.next();
    System.out.println("key=" + key + " value=" + map.get(key));
}
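Iterating with the first option, entrySet, is strongly recommended. It yields key and value together, whereas the second option needs an extra lookup by key for every value and is therefore less efficient.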
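Since this article covers JDK 1.8, it may be worth noting that Map.forEach, added in Java 8, also hands over key and value together without a second lookup. The sketch below sums the values both ways; the class and method names are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class IterationSketch {
    // entrySet traversal: key and value arrive together in each entry.
    static int sumWithEntrySet(Map<String, Integer> map) {
        int sum = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    // Map.forEach (Java 8+): the lambda receives key and value directly.
    static int sumWithForEach(Map<String, Integer> map) {
        int[] sum = {0}; // array cell so the lambda can update it
        map.forEach((key, value) -> sum[0] += value);
        return sum[0];
    }

    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        System.out.println(sumWithEntrySet(map)); // 6
        System.out.println(sumWithForEach(map));  // 6
    }
}
```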
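Summary
In both 1.7 and 1.8, the JDK applies no synchronization to HashMap, so concurrent use causes problems and can even lead to an infinite loop that makes the system unavailable. For this reason the JDK provides the purpose-built ConcurrentHashMap, located in the java.util.concurrent package, specifically to handle concurrent access.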
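As a minimal sketch of that alternative, the concurrent write scenario can be run against ConcurrentHashMap instead. The pool size, task count and 30-second timeout are arbitrary choices for the illustration, and `fill` is a hypothetical helper name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ConcurrentPutSketch {
    // Writes `tasks` distinct keys from a thread pool and returns the final size.
    static int fill(int tasks) {
        final Map<String, String> map = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < tasks; i++) {
            final String key = "key-" + i;
            pool.execute(() -> map.put(key, ""));
        }
        pool.shutdown();
        try {
            // Wait for all writers to finish before reading the size.
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(1000)); // every distinct key is present
    }
}
```

With a plain HashMap in place of ConcurrentHashMap, the returned size is not guaranteed to match the number of keys written.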