ThreadLocal and the magic number 0x61c88647

This article is a repost. 2014-12-01 15:59

This article explains the internal structure of ThreadLocal and how it works, along with the magic number 0x61c88647.

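Before Java 1.4, ThreadLocal caused contention between threads, which made it impossible to write high-performance code with it. Later releases changed the implementation. The sections below describe the internal structure and working principle of ThreadLocal, and analyze the magic hash value 0x61c88647 that was introduced to deal with collisions in its hash table.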

1 `ThreadLocal` use cases

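Consider a scenario that comes up often in day-to-day work: a web application that logged-in users access through a browser. The back end reads each user's login information (such as the user name) and records every access. This is a concurrent scenario. Web containers generally keep a thread pool, and each request is assigned an idle thread from that pool. When the request has been handled, the thread returns to the pool and waits to be assigned to a later request.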

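The login information can of course be read from the current request object, but many places in the back end may need it. One solution is to pass the request as a parameter to every one of those places, which is clearly tedious. Another is to use ThreadLocal: after obtaining the login information, store it in a ThreadLocal variable on the current thread, then read it back from the current thread wherever it is needed. Convenient, isn't it?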

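ThreadLocal therefore fits situations where different threads need to store different context information. Web applications are probably the most common such situation. To quote an answer on Stack Overflow: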

ThreadLocal is most commonly used to implement per-request global variables in app servers or servlet containers where each user/client request is handled by a separate thread.
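The login scenario above can be sketched roughly as follows. This is a minimal illustration, not code from any real web container: the class `UserContextDemo`, the field `CURRENT_USER`, and the worker thread names are all hypothetical, and two plain threads stand in for pooled request threads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class: all names here are illustrative only.
class UserContextDemo {
    // One shared ThreadLocal instance; each thread sees only its own value.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Stands in for code deep in the call stack that needs the user name
    // without receiving the request as a parameter.
    static String describeCurrentUser() {
        return Thread.currentThread().getName() + " serves " + CURRENT_USER.get();
    }

    // Simulates handling one request on the calling thread.
    static String handleRequest(String userName) {
        CURRENT_USER.set(userName);
        try {
            return describeCurrentUser();
        } finally {
            // Pooled threads are reused, so clear the value when the request ends.
            CURRENT_USER.remove();
        }
    }

    // Runs two "requests" on separate threads and collects what each one saw.
    static Map<String, String> runDemo() {
        Map<String, String> seen = new ConcurrentHashMap<>();
        Thread t1 = new Thread(() -> seen.put("alice", handleRequest("alice")), "worker-1");
        Thread t2 = new Thread(() -> seen.put("bob", handleRequest("bob")), "worker-2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        runDemo().values().forEach(System.out::println);
    }
}
```

Calling `remove()` in a `finally` block is a design choice suited to pooled threads: without it, the next request handled by the same thread could observe the previous user's name.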

2 How `ThreadLocal` works

The Java API documentation says:

This class provides thread-local variables. These variables differ from their normal counterparts in that each thread that accesses one (via its get or set method) has its own, independently initialized copy of the variable. ThreadLocal instances are typically private static fields in classes that wish to associate state with a thread.

So ThreadLocal simply provides a thread-local variable: the variable is private to the current thread, and every thread has its own copy of it.

ThreadLocal exposes the following four methods:


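protected T initialValue(): returns the initial value of the variable for the current thread
void set(T value): stores value in the current thread's copy of the thread-local variable
T get(): returns the value stored by set(T value), or the initial value if nothing has been set
void remove(): removes the current thread's value for this thread-local variable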
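The four methods can be exercised together in a small sketch. The class name `ThreadLocalMethodsDemo` and the `COUNTER` field are hypothetical; only the `ThreadLocal` calls themselves come from the JDK.

```java
import java.util.Arrays;

// Hypothetical demo: a per-thread counter exercising the four methods.
class ThreadLocalMethodsDemo {
    private static final ThreadLocal<Integer> COUNTER = new ThreadLocal<Integer>() {
        @Override
        protected Integer initialValue() {
            return 0; // used whenever the current thread has no value yet
        }
    };

    static int[] exercise() {
        int first = COUNTER.get();        // nothing set yet, so initialValue() supplies 0
        COUNTER.set(first + 41);          // store a value for the current thread only
        int afterSet = COUNTER.get();     // reads back 41
        COUNTER.remove();                 // drop the current thread's value
        int afterRemove = COUNTER.get();  // initialValue() is consulted again
        return new int[] {first, afterSet, afterRemove};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(exercise()));
    }
}
```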

How is a thread-local variable associated with the current Thread? The Thread source code has this instance field:

/**
 * ThreadLocal values pertaining to this thread.
 * This map is maintained by the ThreadLocal class.
 */
ThreadLocal.ThreadLocalMap threadLocals = null;

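Indeed, it is a ThreadLocal.ThreadLocalMap object (Thread and ThreadLocal are in the same package, java.lang). So thread-locals are implemented with ThreadLocalMap, which this declaration shows to be a static nested class of ThreadLocal. It also shows that the thread-local storage discussed above is really this threadLocals object. Let's see what ThreadLocalMap looks like: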

static class ThreadLocalMap {

    static class Entry extends WeakReference<ThreadLocal> {
        /** The value associated with this ThreadLocal. */
        Object value;

        Entry(ThreadLocal k, Object v) {
            super(k);
            value = v;
        }
    }

    /**
     * The initial capacity -- MUST be a power of two.
     */
    private static final int INITIAL_CAPACITY = 16;

    /**
     * The table, resized as necessary.
     * table.length MUST always be a power of two.
     */
    private Entry[] table;

    /**
     * Get the entry associated with key.
     */
    private Entry getEntry(ThreadLocal key) {...}

    /**
     * Set the value associated with key.
     */
    private void set(ThreadLocal key, Object value) {...}

    // Constructors and some other methods omitted
}

ThreadLocalMap is indeed a map, implemented with its Entry[] table field. The key of each Entry is a ThreadLocal object and the value is the value being stored.

Note two points:

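The ThreadLocalMap instance is not held by the ThreadLocal. Each Thread holds its own, and different threads hold different ThreadLocalMap instances. Because it is not a global map, there is no contention between threads. Another benefit is that when a thread dies, the objects referenced only through its map become eligible for garbage collection along with the Thread.
The key of an Entry is a weak reference to the ThreadLocal. Once the ThreadLocal object is no longer referenced anywhere else, the garbage collector ignores this key reference and reclaims the ThreadLocal, which helps prevent memory leaks.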

In summary, the following diagram describes how ThreadLocal works:

[Figure: threadlocal_structure_principle]

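When a value is stored in the thread-local variable (set(T value)), the current Thread's ThreadLocalMap is fetched. If it is null, a new map is built from the ThreadLocal instance and the value and assigned to the thread's threadLocals field. Otherwise the value is put directly into the thread's existing map with the ThreadLocal object as the key (collisions are possible here and are covered later).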

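When a value is read from the ThreadLocal variable (get()), the current Thread's ThreadLocalMap is fetched. If it is null, initialValue() builds the initial value, a map containing that value is created on the current Thread, and the initial value is returned. Otherwise the matching Entry is looked up in the map and its value is returned.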
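The set and get paths described above can be modeled in a few lines. This is a deliberately simplified sketch and not the JDK code: `CarrierThread`, `SimpleThreadLocal`, and the `locals` field are hypothetical names, an `IdentityHashMap` replaces the real `ThreadLocalMap`, and weak references are left out entirely.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Simplified model, NOT the JDK implementation: all names are hypothetical.
class SimpleThreadLocalDemo {

    // Plays the role of Thread with its threadLocals field.
    static class CarrierThread extends Thread {
        Map<SimpleThreadLocal<?>, Object> locals; // stays null until first use

        CarrierThread(Runnable body) {
            super(body);
        }
    }

    // Plays the role of ThreadLocal: it stores nothing itself and only acts as a key.
    // It must be used from inside a CarrierThread, otherwise the cast below fails.
    static class SimpleThreadLocal<T> {
        protected T initialValue() {
            return null;
        }

        void set(T value) {
            CarrierThread t = (CarrierThread) Thread.currentThread();
            if (t.locals == null) {
                t.locals = new IdentityHashMap<>(); // create the per-thread map lazily
            }
            t.locals.put(this, value);
        }

        @SuppressWarnings("unchecked")
        T get() {
            CarrierThread t = (CarrierThread) Thread.currentThread();
            if (t.locals == null) {
                t.locals = new IdentityHashMap<>();
            }
            if (!t.locals.containsKey(this)) {
                t.locals.put(this, initialValue()); // first access stores the initial value
            }
            return (T) t.locals.get(this);
        }
    }

    // Thread a sets a value, thread b only reads; each sees its own map.
    static String[] runDemo() {
        SimpleThreadLocal<String> name = new SimpleThreadLocal<String>() {
            @Override
            protected String initialValue() {
                return "unset";
            }
        };
        String[] results = new String[2];
        CarrierThread a = new CarrierThread(() -> {
            name.set("A");
            results[0] = name.get();
        });
        CarrierThread b = new CarrierThread(() -> results[1] = name.get());
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", runDemo()));
    }
}
```

Because the map lives on the thread and is only touched by that thread, neither `set` nor `get` needs any locking in this model.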

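This analysis shows that a ThreadLocal should be declared static, so that all threads share one ThreadLocal instance instead of each thread creating a new ThreadLocal object when it arrives. The Javadoc quoted above likewise describes ThreadLocal instances as typically being private static fields.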

3 Collision resolution and the magic `0x61c88647`

Because ThreadLocal relies on a map, collisions are unavoidable.

3.1 Collision avoidance and resolution

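There are two cases to consider:

With a single ThreadLocal instance (the practice recommended above), a thread's map holds at most one entry for that instance, and setting a new value overwrites the old one. Whenever a key does land on a slot occupied by a different key, the collision is resolved by open addressing, specifically linear probing (linear-probe).
With multiple ThreadLocal instances, the extreme being a new ThreadLocal instance for every thread, the special hash value 0x61c88647 greatly reduces the probability of collisions, and open addressing handles the collisions that remain.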
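Open addressing with linear probing can be illustrated with a toy table. This is a sketch under simplifying assumptions, not the real ThreadLocalMap: the class `LinearProbeTable` is hypothetical, keys are plain `int` hash codes, the capacity is fixed, and there is no resizing, removal, or stale-entry cleanup.

```java
// Toy open-addressing table with linear probing; hypothetical and simplified.
class LinearProbeTable {
    private final Integer[] keys;   // null marks an empty slot
    private final String[] values;

    LinearProbeTable(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new Integer[capacity];
        values = new String[capacity];
    }

    // Next slot to probe, wrapping around at the end of the array.
    private int nextIndex(int i) {
        return (i + 1 < keys.length) ? i + 1 : 0;
    }

    // Stores the pair and returns the index of the slot that was used.
    int put(int key, String value) {
        int i = key & (keys.length - 1); // low bits of the hash pick the first slot
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null || keys[i] == key) {
                keys[i] = key;
                values[i] = value;
                return i;
            }
            i = nextIndex(i); // slot taken by another key: try the next one
        }
        throw new IllegalStateException("table is full");
    }

    // Returns the stored value, or null if the key is absent.
    String get(int key) {
        int i = key & (keys.length - 1);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) {
                return null; // reached an empty slot: the key was never stored
            }
            if (keys[i] == key) {
                return values[i];
            }
            i = nextIndex(i);
        }
        return null;
    }

    public static void main(String[] args) {
        LinearProbeTable table = new LinearProbeTable(8);
        System.out.println(table.put(1, "a"));  // first slot for hash 1
        System.out.println(table.put(9, "b"));  // same first slot, so it probes onward
        System.out.println(table.get(9));
    }
}
```

In an 8-slot table, keys 1 and 9 share the same low three bits, so the second insertion is pushed to the neighbouring slot. Clusters like this are what a well-spread hash sequence tries to avoid.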

3.2 The magic `0x61c88647`

Note that 0x61c88647 matters mainly when there are multiple ThreadLocal instances.

Here is where this hash value appears in the ThreadLocal source:

/**
 * ThreadLocals rely on per-thread linear-probe hash maps attached
 * to each thread (Thread.threadLocals and inheritableThreadLocals).
 * The ThreadLocal objects act as keys, searched via threadLocalHashCode.
 * This is a custom hash code (useful only within ThreadLocalMaps) that
 * eliminates collisions in the common case where consecutively
 * constructed ThreadLocals are used by the same threads,
 * while remaining well-behaved in less common cases.
 */
private final int threadLocalHashCode = nextHashCode();

/**
 * The next hash code to be given out. Updated atomically.
 * Starts at zero.
 */
private static AtomicInteger nextHashCode = new AtomicInteger();

/**
 * The difference between successively generated hash codes - turns
 * implicit sequential thread-local IDs into near-optimally spread
 * multiplicative hash values for power-of-two-sized tables.
 */
private static final int HASH_INCREMENT = 0x61c88647;

/**
 * Returns the next hash code.
 */
private static int nextHashCode() {
    return nextHashCode.getAndAdd(HASH_INCREMENT);
}

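Note the instance field threadLocalHashCode: each newly created ThreadLocal instance takes the current value of the static counter nextHashCode, which then advances by 0x61c88647. The purpose is stated clearly in the comments above: to spread the hash codes evenly over an array whose length is a power of two, namely Entry[] table.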

Now let's see how ThreadLocal uses threadLocalHashCode. Below is part of the set method of the static nested class ThreadLocalMap:

// Set the value associated with key.
private void set(ThreadLocal key, Object value) {
    Entry[] tab = table;
    int len = tab.length;
    int i = key.threadLocalHashCode & (len-1);

    for (Entry e = tab[i];
         e != null;
         e = tab[i = nextIndex(i, len)]) {...}
    ...

What does key.threadLocalHashCode & (len-1) mean? First look at the length of the table array:

/**
 * The table, resized as necessary.
 * table.length MUST always be a power of two.
 */
private Entry[] table;

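So the size of Entry[] table in ThreadLocalMap must be a power of two (len = 2^N). The binary representation of len-1 is then N consecutive 1 bits in the low positions, so key.threadLocalHashCode & (len-1) is simply the low N bits of threadLocalHashCode. Does this really produce an even distribution? Let's run an experiment in Python: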

>>> HASH_INCREMENT = 0x61c88647
>>> def magic_hash(n):
...     for i in range(n):
...         nextHashCode = i * HASH_INCREMENT + HASH_INCREMENT
...         print(nextHashCode & (n - 1), end=" ")
...     print()
...
>>> magic_hash(16)
7 14 5 12 3 10 1 8 15 6 13 4 11 2 9 0
>>> magic_hash(32)
7 14 21 28 3 10 17 24 31 6 13 20 27 2 9 16 23 30 5 12 19 26 1 8 15 22 29 4 11 18 25 0
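The same experiment can be repeated in Java, where int arithmetic wraps on overflow just as it does inside ThreadLocal. The sketch below uses a hypothetical class `MagicHashDemo` and a local `AtomicInteger` in place of the private static counter. Unlike the Python session, the sequence starts at zero, so the first slot is 0 rather than 7.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo reproducing the slot sequence with Java int overflow.
class MagicHashDemo {
    private static final int HASH_INCREMENT = 0x61c88647;

    // Slots chosen by the first `size` hash codes in a table of length `size`
    // (size is assumed to be a power of two).
    static int[] slots(int size) {
        AtomicInteger nextHashCode = new AtomicInteger(); // starts at zero
        int[] result = new int[size];
        for (int i = 0; i < size; i++) {
            int hash = nextHashCode.getAndAdd(HASH_INCREMENT); // wraps silently on overflow
            result[i] = hash & (size - 1);                      // keep the low N bits
        }
        return result;
    }

    // True if no slot index appears twice.
    static boolean allDistinct(int[] slots) {
        return Arrays.stream(slots).distinct().count() == slots.length;
    }

    public static void main(String[] args) {
        for (int size : new int[] {16, 32, 64}) {
            int[] s = slots(size);
            System.out.println(Arrays.toString(s) + " distinct=" + allDistinct(s));
        }
    }
}
```

Since 0x61c88647 is odd, consecutive multiples of it cannot repeat in the low N bits until 2^N of them have been generated, which is why each run fills every slot exactly once.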

The resulting hash codes are spread very evenly, with no collisions at all. An article on javaspecialists describes the number as follows:

This number represents the golden ratio (sqrt(5)-1) times two to the power of 31 ((sqrt(5)-1) * (2^31)). The result is then a golden number, either 2654435769 or -1640531527.

and

We established thus that the HASH_INCREMENT has something to do with fibonacci hashing, using the golden ratio. If we look carefully at the way that hashing is done in the ThreadLocalMap, we see why this is necessary. The standard java.util.HashMap uses linked lists to resolve clashes. The ThreadLocalMap simply looks for the next available space and inserts the element there. It finds the first space by bit masking, thus only the lower few bits are significant. If the first space is full, it simply puts the element in the next available space. The HASH_INCREMENT spaces the keys out in the sparce hash table, so that the possibility of finding a value next to ours is reduced.

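This is related to Fibonacci hashing and the golden ratio. For details, see the Hashing material in Section 6.4 of the reference linked from the original article.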
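The arithmetic in the first quotation can be checked directly. The sketch below uses a hypothetical class `GoldenRatioDemo`. It evaluates (sqrt(5)-1) * 2^31 in double precision, truncates it to a long, and then narrows it to an int to show how the constant relates to 0x61c88647.

```java
// Hypothetical demo checking the golden-ratio arithmetic from the quotation.
class GoldenRatioDemo {
    // (sqrt(5) - 1) * 2^31, truncated to a long.
    static long goldenLong() {
        return (long) ((Math.sqrt(5) - 1) * (1L << 31));
    }

    // The same value narrowed to 32 bits, which makes it negative.
    static int goldenInt() {
        return (int) goldenLong();
    }

    public static void main(String[] args) {
        System.out.println(goldenLong());                      // the unsigned 32-bit form
        System.out.println(goldenInt());                       // the signed 32-bit form
        System.out.println(Integer.toHexString(-goldenInt())); // negated signed form, in hex
    }
}
```

The negated signed value equals 0x61c88647, so stepping by HASH_INCREMENT is the same as stepping backwards by the golden-ratio constant modulo 2^32.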