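At work the other day a colleague caught me off guard with a question about HashMap: when does a bucket's linked list get converted into a red-black tree? HashMap is one of the most frequently used key-value data types in our Java programs, so the question is worth answering properly.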
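The conclusion first. Two conditions must both hold for the conversion to trigger: the linked list in a bucket grows past 8 nodes (TREEIFY_THRESHOLD), and the table array has a length of at least 64 (MIN_TREEIFY_CAPACITY).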
Why convert at all, and why at 8? The official explanation in the JDK source reads:
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:
0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006
more: less than 1 in ten million
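In short: ideally, with random hash codes, the number of nodes per hash bucket follows a Poisson distribution (see http://en.wikipedia.org/wiki/Poisson_distribution). The table above lists, for each bucket size, the probability given by the Poisson formula. The probability of a list holding 8 elements is already tiny, and larger lists are rarer still. That is why the original author chose 8 as the list-length threshold: the choice is based on probability and statistics.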
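To make the table easier to check, here is a minimal sketch that evaluates the Poisson formula from the quoted comment with a parameter of 0.5. The class name `PoissonBins` and the method `probability` are made up for this illustration and are not part of the JDK.

```java
public class PoissonBins {
    // Average number of nodes per bin, as quoted in the HashMap source comment.
    static final double LAMBDA = 0.5;

    // Poisson probability of a bin holding exactly k nodes:
    // exp(-lambda) * lambda^k / k!
    static double probability(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-LAMBDA) * Math.pow(LAMBDA, k) / factorial;
    }

    public static void main(String[] args) {
        // Print the same rows as the table in the source comment.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k));
        }
    }
}
```

Running `main` should print values that agree with the table to within the last printed digit, which is a quick way to confirm that the formula includes the `exp(-0.5)` factor.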
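JDK 1.8 optimized the underlying implementation of HashMap, introducing the red-black tree data structure and improvements to resizing, among other changes. Let's look at the source first: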
    /**
     * Implements Map.put and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // If this bucket already holds a red-black tree, insert into the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // TREEIFY_THRESHOLD = 8: if the list in this bucket held 8 nodes
                        // before the new node was appended (so it now holds more than 8),
                        // try to turn the list into a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
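Here is what the method does. It first uses the hash to compute the slot the key should go into. If the slot is empty, the new node is placed there directly. If the slot is occupied and its first node has the same key, the existing value is overwritten or kept depending on the onlyIfAbsent parameter. Otherwise the method checks whether the slot holds a red-black tree or a linked list.
For a linked list it walks the list: if it finds the same key, it overwrites or keeps the value according to the parameter; if it does not, it appends the new node to the tail, and once the list holds more than 8 entries it calls treeifyBin to treeify it. For a red-black tree the node is added using the tree's own insertion routine.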
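The decision points described above can be condensed into a small sketch. It is an illustration under stated assumptions, not the JDK's code: the class `TreeifyDecision`, the enum `Action`, and the helper `onInsert` are hypothetical names, and the helper only models the case where a new key lands in a bucket that holds a plain linked list. The two constants carry the values from the JDK 8 source, `spread` mirrors the bit-mixing that `HashMap.hash` performs, and the table-length check corresponds to the array-length condition from the conclusion at the start.

```java
public class TreeifyDecision {
    // Values of the corresponding constants in JDK 8's HashMap.
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Hypothetical labels for what happens to the new node and its bucket.
    enum Action { PLACE_IN_EMPTY_SLOT, APPEND, APPEND_THEN_RESIZE, APPEND_THEN_TREEIFY }

    // XORs the high 16 bits of hashCode into the low 16 bits, as HashMap.hash does.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Slot index for a table whose length is a power of two.
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    // Hypothetical helper: outcome when a NEW key lands in a list bucket
    // that already holds nodesInBin nodes.
    static Action onInsert(int nodesInBin, int tableLength) {
        if (nodesInBin == 0) {
            return Action.PLACE_IN_EMPTY_SLOT;
        }
        // In putVal the new node is appended when binCount == nodesInBin - 1.
        int binCount = nodesInBin - 1;
        if (binCount < TREEIFY_THRESHOLD - 1) {
            return Action.APPEND;
        }
        // treeifyBin is reached; a small table is resized instead of treeified.
        return tableLength < MIN_TREEIFY_CAPACITY
                ? Action.APPEND_THEN_RESIZE
                : Action.APPEND_THEN_TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(spread("example"), 16));
        System.out.println(onInsert(7, 64));
        System.out.println(onInsert(8, 16));
        System.out.println(onInsert(8, 64));
    }
}
```

The point of the sketch is the off-by-one: a bucket with 7 nodes simply grows to 8, while the insert that would create a ninth node is the one that reaches `treeifyBin`.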
Treeification itself is carried out by the treeifyBin method, so let's look at it:
    /**
     * Replaces all linked nodes in bin at index for given hash unless
     * table is too small, in which case resizes instead.
     */
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // MIN_TREEIFY_CAPACITY = 64. This is the key point: if the table length is
        // below 64 the method resizes instead; only at 64 or above does it reach
        // the else branch that builds the TreeNodes
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            // Resize; resizing will be covered in a separate article
            resize();
        // Use the hash to locate the bucket
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
            do {
                // Wrap each node in a TreeNode
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    // Link the TreeNodes together; at this point they still form a list
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

This method wraps the 9 entries currently in the bucket as TreeNodes, links them together as a list, and then builds the red-black tree on top of that list.

Looking at the source of the TreeNode class, all of the red-black tree operations live inside TreeNode:

    static <K,V> TreeNode<K,V> rotateLeft(TreeNode<K,V> root, ...
    static <K,V> TreeNode<K,V> rotateRight(TreeNode<K,V> root, ...
    static <K,V> TreeNode<K,V> balanceInsertion(TreeNode<K,V> root, ...
    static <K,V> TreeNode<K,V> balanceDeletion(TreeNode<K,V> root, ...

    /**
     * Forms tree of the nodes linked from this node.
     */
    final void treeify(Node<K,V>[] tab) {
        TreeNode<K,V> root = null;
        // Walk the linked list that was passed in
        for (TreeNode<K,V> x = this, next; x != null; x = next) {
            next = (TreeNode<K,V>)x.next;
            x.left = x.right = null;
            // The first node becomes the black root
            if (root == null) {
                x.parent = null;
                x.red = false;
                root = x;
            }
            else {
                // x is the list entry currently being inserted
                K k = x.key;
                int h = x.hash;
                Class<?> kc = null;
                // The tree already has a root. With x's key and hash in hand, enter
                // the core loop, which descends from the root to find the insertion point.
                // The for loop has no termination condition; it exits via break
                for (TreeNode<K,V> p = root;;) {
                    // dir: direction to descend; ph: hash of the tree node p being visited
                    int dir, ph;
                    K pk = p.key;
                    // Compare x's hash with the visited node's hash to choose the path:
                    // -1 means the left subtree, +1 means the right subtree
                    if ((ph = p.hash) > h)
                        dir = -1;
                    else if (ph < h)
                        dir = 1;
                    else if ((kc == null &&
                              (kc = comparableClassFor(k)) == null) ||
                             (dir = compareComparables(kc, k, pk)) == 0)
                        dir = tieBreakOrder(k, pk);

                    // xp: the prospective parent of x
                    TreeNode<K,V> xp = p;
                    if ((p = (dir <= 0) ? p.left : p.right) == null) {
                        x.parent = xp;
                        // dir <= 0: attach x as xp's left child; otherwise as its right child
                        if (dir <= 0)
                            xp.left = x;
                        else
                            xp.right = x;
                        // Restore the red-black properties after the insertion
                        root = balanceInsertion(root, x);
                        // Leave the loop: this list entry is now part of the tree
                        break;
                    }
                }
            }
        }
        moveRootToFront(tab, root);
    }
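Overall, the first iteration makes the first node of the list the root of the red-black tree. Each later iteration compares the hash of the current list entry with the hashes of the tree nodes it visits and attaches the entry on the left or the right. Because an insertion can break the red-black properties, balanceInsertion is called after every insertion, not just once at the end.
balanceInsertion performs the rotations and recoloring, following the standard red-black tree rules. Once all entries are in the tree, moveRootToFront is called with the final root.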
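The left-or-right decision inside treeify can be sketched on its own. This is a simplified illustration, not the JDK's implementation: the class `TreeDirection` and the method `direction` are hypothetical, the `Comparable` check is a rough stand-in for `comparableClassFor` and `compareComparables`, and the final fallback uses only `System.identityHashCode` as an assumed stand-in for `tieBreakOrder`.

```java
public class TreeDirection {
    // Returns -1 to descend into the left subtree, +1 for the right subtree.
    // h, k: hash and key of the entry being inserted.
    // ph, pk: hash and key of the tree node currently being visited.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int direction(int h, Object k, int ph, Object pk) {
        // Step 1: order by hash.
        if (ph > h) {
            return -1;
        }
        if (ph < h) {
            return 1;
        }
        // Step 2 (simplified): equal hashes, so try compareTo when both keys
        // are Comparable instances of the same class.
        if (k instanceof Comparable && pk != null && k.getClass() == pk.getClass()) {
            int c = ((Comparable) k).compareTo(pk);
            if (c != 0) {
                return c < 0 ? -1 : 1;
            }
        }
        // Step 3 (assumed stand-in for the tie-break): fall back to an
        // arbitrary but consistent order based on identity hash codes.
        return System.identityHashCode(k) <= System.identityHashCode(pk) ? -1 : 1;
    }

    public static void main(String[] args) {
        System.out.println(direction(1, "a", 2, "b"));
        System.out.println(direction(3, "a", 2, "b"));
        System.out.println(direction(5, "a", 5, "b"));
    }
}
```

The sketch always returns -1 or +1, whereas the JDK code keeps whatever `compareTo` returns and treats any value `<= 0` as "go left". The effect on the tree shape is the same.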