Interview question: How does HashSet guarantee that its values are never duplicated?

This article is a repost. 2020-02-08 15:48. Tags: collections / hashset / resizing / unique values

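Personal blog: https://wushaopei.github.io/ (everything you are looking for is here)

As is well known, a HashSet cannot hold duplicate values, and it is often used in business code to deduplicate data. So how does it actually guarantee, internally, that elements are never repeated?

This article walks through the HashSet source code step by step.

When we add a value to a HashSet instance, we call its add method. Its source is as follows: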

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

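As the body of add shows, HashSet maintains a HashMap internally and adds elements by delegating to it. HashMap is a key-value collection whose keys cannot be duplicated. PRESENT is only a placeholder value here; it plays no part in deciding whether an element is a duplicate, so it is not discussed further.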
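To make the return value of add concrete, here is a minimal sketch that repeats the same `map.put(...) == null` check against a plain HashMap and then compares it with a real HashSet. The class name `AddReturnDemo`, the helper `putIsNew`, and the constant `PLACEHOLDER` are illustrative names assumed for this example; they are not part of the JDK.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AddReturnDemo {
    // Hypothetical stand-in for HashSet's private PRESENT placeholder.
    static final Object PLACEHOLDER = new Object();

    // Same check as the body of HashSet.add: true only if the key was absent.
    static boolean putIsNew(Map<String, Object> map, String key) {
        return map.put(key, PLACEHOLDER) == null;
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        System.out.println(putIsNew(map, "a")); // first insert of "a"
        System.out.println(putIsNew(map, "a")); // same key a second time
        System.out.println(map.size());         // number of distinct keys

        Set<String> set = new HashSet<>();
        System.out.println(set.add("a"));
        System.out.println(set.add("a"));
        System.out.println(set.size());
    }
}
```

The second put returns the previous value (the placeholder) instead of null, which is why the second add reports false and the size stays at one.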

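At this point we already know that every HashSet value is stored as a key in the HashMap, so we can be sure that no duplicate values will exist.

What we want to understand, however, is why duplicates cannot occur, so let us dig further into the reason.

The following shows exactly where HashSet references HashMap: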

    public class HashSet<E>
        extends AbstractSet<E>
        implements Set<E>, Cloneable, java.io.Serializable
    {
        static final long serialVersionUID = -5024744406713321676L;

        private transient HashMap<E,Object> map;

        // Dummy value to associate with an Object in the backing Map
        private static final Object PRESENT = new Object();

        /**
         * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
         * default initial capacity (16) and load factor (0.75).
         */
        public HashSet() {
            map = new HashMap<>();
        }

        /**
         * Constructs a new set containing the elements in the specified
         * collection.  The <tt>HashMap</tt> is created with default load factor
         * (0.75) and an initial capacity sufficient to contain the elements in
         * the specified collection.
         *
         * @param c the collection whose elements are to be placed into this set
         * @throws NullPointerException if the specified collection is null
         */
        public HashSet(Collection<? extends E> c) {
            map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
            addAll(c);
        }

HashSet instantiates a HashMap object inside its constructors, so the uniqueness of HashSet values relies entirely on the uniqueness of keys in the underlying HashMap.
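The delegation described above can be sketched as a tiny set class backed by a HashMap. This is only an illustration under stated assumptions, not the JDK implementation: the class `MapBackedSet` and its three methods are hypothetical, and everything else the real HashSet offers (iteration, removal, serialization) is left out.

```java
import java.util.HashMap;

class MapBackedSet<E> {
    // Shared dummy value stored against every key, as PRESENT is in HashSet.
    private static final Object PRESENT = new Object();

    // Elements of the set live as keys of this map.
    private final HashMap<E, Object> map = new HashMap<>();

    // True if the element was not already present.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        set.add("x");
        set.add("x"); // duplicate: the map keeps a single key
        set.add("y");
        System.out.println(set.size());
        System.out.println(set.contains("x"));
    }
}
```

Because the elements are map keys, the set needs no deduplication logic of its own; whatever the map does for keys applies to the set's values.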

Next, let us step into HashMap's put() method:

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

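As the implementation shows, put hands the key and value on to putVal(), which does the real work.

That method is not what we need to look at right now; we will come back to it shortly.

Before putVal() is entered, a hash operation is applied to the incoming key to obtain a hash value: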

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

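The rule for obtaining the hash value is: when key == null, the hash is 0. Otherwise the key is valid, and the hash is computed by calling the key's hashCode() and XORing the result with itself shifted right by 16 bits (h ^ (h >>> 16)), as the code above shows.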
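A small sketch of this rule follows. HashMap.hash is not public, so the expression is copied into a hypothetical helper named `spread` in an illustrative class `HashSpreadDemo`; both names are assumptions made for this example.

```java
class HashSpreadDemo {
    // Same expression as HashMap.hash(Object), copied here because the
    // original method is not accessible from outside java.util.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // A null key always hashes to 0.
        System.out.println(spread(null));

        // Small Integer: the upper 16 bits are zero, so the XOR changes nothing.
        System.out.println(spread(1));

        // 0x10000 has a bit set only in the upper half; the shift folds
        // that bit down into the lower half of the result.
        System.out.println(Integer.toHexString(spread(0x10000)));
    }
}
```

Folding the upper 16 bits into the lower 16 matters because a small table only looks at the low bits of the hash when it picks a slot.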

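The role of hashCode() here:

In the code above, hashCode is used to locate a region, that is, to find the area of the collection in which an object belongs. Hash codes divide the collection into a number of regions: every object can compute its own hash code, hash codes are grouped, and each group corresponds to one storage region. An object's hash code therefore determines the region in which it is stored, which greatly reduces the number of elements that have to be compared during a lookup and makes queries faster.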
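The grouping into regions can be sketched with the index expression `(n - 1) & hash` that putVal uses, where n is the table length. The class `BucketIndexDemo`, the helper `bucketIndex`, and the fixed table length of 16 (the default initial capacity mentioned in the HashSet Javadoc quoted earlier) are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BucketIndexDemo {
    // Index of the slot a key maps to in a table of length n.
    // n is assumed to be a power of two, as HashMap tables are.
    static int bucketIndex(Object key, int n) {
        int h = (key == null) ? 0 : key.hashCode();
        int hash = h ^ (h >>> 16); // same spreading step as HashMap.hash
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // assumed table length for this sketch
        Map<Integer, List<Integer>> buckets = new TreeMap<>();
        for (int key : new int[] {1, 2, 17, 33, 18}) {
            buckets.computeIfAbsent(bucketIndex(key, n), i -> new ArrayList<>())
                   .add(key);
        }
        // Keys sharing a bucket are the only ones compared with equals().
        System.out.println(buckets);
    }
}
```

Keys 1, 17 and 33 share a slot in a 16-slot table, as do 2 and 18, so a lookup for one of them only needs to examine the few entries in that one slot.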

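This is where key uniqueness starts to take shape. Equal objects return the same hash code (provided hashCode() is implemented consistently with equals()), so a key equal to an existing one always lands in the same bucket as that existing key. There the existing entry is reused and only its value is overwritten; no second entry is added. The details are in the source that follows: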

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

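Note this statement in the method above:

    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        break;

From this we can see that when HashMap adds a key-value pair, it decides whether the key already exists by comparing the stored hash (derived from the object's hashCode) and then checking reference identity or the object's equals method.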
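The two-part check can be observed with keys whose hash codes collide. The strings "Aa" and "BB" are a well-known pair with identical String.hashCode() values; the class `CollisionDemo` and its helper `sizeAfterAdding` are hypothetical names used only for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

class CollisionDemo {
    // Adds every key to a fresh HashSet and reports how many were kept.
    static int sizeAfterAdding(String... keys) {
        Set<String> set = new HashSet<>();
        for (String key : keys) {
            set.add(key);
        }
        return set.size();
    }

    public static void main(String[] args) {
        // Same hash code, but equals() says they differ: both are kept.
        System.out.println("Aa".hashCode() == "BB".hashCode());
        System.out.println(sizeAfterAdding("Aa", "BB"));

        // Different objects, same hash and equals() is true: one is kept.
        System.out.println(sizeAfterAdding("Aa", new String("Aa")));
    }
}
```

A matching hash alone is therefore not enough to reject an element; equals has the final say, and the hash comparison only serves as a cheap first filter.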

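Note, however, that the default hashCode and equals inherited from Object work on object identity: equals compares references, so two distinct objects with identical field values are not treated as duplicates. To deduplicate by actual value or by the content of the object, both methods have to be overridden.

This gives one more conclusion: to store objects in a HashSet and guarantee that they are not duplicated, override the objects' hashCode and equals methods as the actual situation requires.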
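As a hedged illustration of that conclusion, the sketch below stores two objects with the same field values in a HashSet, once using a class that keeps the inherited identity-based methods and once using a class that overrides both. The classes `RawPoint` and `ValuePoint` and the enclosing `EqualsHashCodeDemo` are invented for this example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {
    // No overrides: equals and hashCode are inherited from Object.
    static final class RawPoint {
        final int x, y;
        RawPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // Overrides both methods so that equality is based on field values.
    static final class ValuePoint {
        final int x, y;
        ValuePoint(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValuePoint)) return false;
            ValuePoint other = (ValuePoint) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // equal fields give equal hash codes
        }
    }

    static int rawSetSize() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2)); // distinct reference, kept as a new element
        return set.size();
    }

    static int valueSetSize() {
        Set<ValuePoint> set = new HashSet<>();
        set.add(new ValuePoint(1, 2));
        set.add(new ValuePoint(1, 2)); // equal by value, rejected as duplicate
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(rawSetSize());
        System.out.println(valueSetSize());
    }
}
```

Overriding only one of the two methods is not sufficient: equal objects with different hash codes may land in different buckets and never be compared, so both overrides should agree on the same fields.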
