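More on the memory model: gcc's __sync_synchronize is a real trap!
Yes, this is about visibility again. Because both the CPU and the compiler may reorder operations, we often have to insert memory barriers into our code by hand. If you are not yet sure what a memory barrier is, please read this first: http://en.wikipedia.org/wiki/Memory_barrier
Assuming you already understand it, how do you actually use one? How do you insert a memory barrier in code, and which function do you call?
The gcc manual has a section called "Built-in functions for atomic memory access", which lists this function: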
__sync_synchronize (…)
This builtin issues a full memory barrier.
Let's write some code and try it:
int main(){
    __sync_synchronize();
    return 0;
}
Then compile it with gcc 4.2,
# gcc -S -c test.c
and look at the generated assembly:
main:
    pushq %rbp
    movq %rsp, %rbp
    movl $0, %eax
    leave
    ret
Huh? Nothing at all!!! Try it yourself if you don't believe me. My build environment is FreeBSD 9.0 release, gcc (GCC) 4.2.1 20070831 patched [FreeBSD].
Fine, let me try a newer gcc: gcc46 (FreeBSD Ports Collection) 4.6.3 20120113 (prerelease)
main:
    pushq %rbp
    movq %rsp, %rbp
    mfence
    movl $0, %eax
    popq %rbp
    ret
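Look, there is one more line: mfence.
What happened? This was a bug in older gcc: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793 . It was found in 2008 and then fixed. The real reason it was a bug at all is that the gcc developers were confused: does mfence do anything on x86 CPUs? What is it good for?
So can mfence actually give us the result we want?
There is a passage in the Intel manual that I have never fully understood: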
“Processors are free to fetch and cache data speculatively from regions of system memory that use the WB, WC, and WT memory types. This speculative fetching can occur at any time and is not tied to instruction execution. Thus, it is not ordered with respect to executions of the MFENCE instruction; data can be brought into the caches speculatively just before, during, or after the execution of an MFENCE instruction.”
But the chapter on Memory Ordering also says:
“Reads cannot pass earlier MFENCE instructions
Writes cannot pass earlier MFENCE instructions.
MFENCE instructions cannot pass earlier reads or writes”
Taken together, this means that if the code is
READ A
MFENCE
READ B
it may become
READ A
Speculative READ B
MFENCE
but it will not become
Speculative READ B
READ A
MFENCE
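In other words, a speculative read can cross an mfence, but it cannot go any further: it cannot also cross the reads and writes that come before the mfence. So either way, the order in which A and B are read is strictly preserved. I am not sure my understanding is right.
But all of this is only about a single CPU. With multiple CPUs (including hyper-threading and multi-core), everything is fine as long as you still use locked instructions. Otherwise, the rules in the manual do not mention mfence specifically; they only say: "Memory ordering obeys causality (memory ordering respects transitive visibility)." Causality is also a relaxed model, and I do not understand it very well. The only plain-language explanation I have seen is: "If I see it and tell you about it, then you will see it too."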
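To make the "see it and tell you about it" reading concrete, here is a minimal three-thread sketch. It assumes a C++11 compiler and uses std::atomic with release/acquire ordering, neither of which appears in the Intel manual text; the function and variable names are hypothetical. It illustrates transitive visibility as the C++ memory model defines it, not what any particular x86 instruction does.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus test for transitive visibility (names are illustrative).
// Returns the value of x that the observer reads after it has seen y == 1.
int run_causality_test() {
    std::atomic<int> x(0), y(0);
    int seen = -1;

    std::thread writer([&] {
        x.store(1, std::memory_order_release);             // the original write
    });
    std::thread relay([&] {
        while (x.load(std::memory_order_acquire) != 1) {}  // "I see it ..."
        y.store(1, std::memory_order_release);             // "... and tell you about it"
    });
    std::thread observer([&] {
        while (y.load(std::memory_order_acquire) != 1) {}  // "you hear about it ..."
        seen = x.load(std::memory_order_relaxed);          // "... so you see it too"
    });

    writer.join();
    relay.join();
    observer.join();
    return seen;
}
```

Because the two release/acquire pairs chain together, the write to x happens-before the observer's final load, so run_causality_test() cannot return 0.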
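This change has had a huge impact on the Java community. The JVM inserts mfence instructions so that writes to volatile variables get sequential consistency. So the Java community is now left with only two options: change the language specification, or replace mfence with the more expensive xchg instruction. This blog post by David Dice (inventor of the TL/TL2 transactional memory algorithms), http://blogs.oracle.com/dave/entry/java_memory_model_concerns_on , reads as thoroughly miserable.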
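The property at stake for volatile writes is the classic store-buffering case: a store followed by a load of a different location must not appear reordered. The sketch below expresses it with C++11 std::atomic and memory_order_seq_cst as a stand-in for Java volatile; that mapping and the function name are my assumptions, not something taken from the JVM sources. On x86 a compiler typically implements such a store with xchg or with mov followed by mfence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical store-buffering litmus test. Returns how many rounds ended with
// both threads reading 0, an outcome that sequential consistency forbids.
int count_forbidden_outcomes(int rounds) {
    int forbidden = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x(0), y(0);
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);   // plays the role of a volatile write
            r1 = y.load(std::memory_order_seq_cst);  // must not be ordered before the store
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) ++forbidden;  // both loads missed both stores
    }
    return forbidden;
}
```

With seq_cst the count is always 0. If both the stores and the loads were weakened to memory_order_relaxed, the forbidden outcome would be allowed, and it can show up on real x86 hardware because of store buffering.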
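If you really want to use membars in your code, it is best to commit to one specific compiler and one specific platform. Otherwise, I strongly recommend Intel TBB: tbb::atomic_fence().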
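For readers whose compiler supports C++11, the standard library offers a comparable portable fence, std::atomic_thread_fence. The sketch below is only an illustration under that assumption (it is not TBB, and the function name is hypothetical): a plain int is handed from one thread to another, and the ordering comes solely from the two fences around a relaxed flag.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical message-passing example using explicit fences.
int pass_message_with_fences() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready(false);
    int received = -1;

    std::thread producer([&] {
        payload = 42;
        // Keeps the payload write ordered before the flag store below.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {}
        // Keeps the payload read ordered after the flag load above.
        std::atomic_thread_fence(std::memory_order_acquire);
        received = payload;
    });

    producer.join();
    consumer.join();
    return received;
}
```

Release and acquire fences are enough for this hand-off. Passing std::memory_order_seq_cst instead requests a full barrier, which is the closest standard counterpart to what __sync_synchronize is documented to do.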
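Finally, a word on volatile in C/C++. If you insist on reading the standard to the letter, this keyword is of no help at all for multithreaded synchronization. But what about in practice? We see plenty of people using it in C/C++, and they are not writing drivers, just ordinary user-space code. Are they really all wrong?
Take a look at this remarkable piece of code: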
http://software.intel.com/en-us/articles/single-producer-single-consumer-queue/
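In that code, every membar can be removed; all they do is tell the compiler not to mess around. In a specific compiler, the volatile keyword means far more than what the ISO standard says.
Here is a Singleton I wrote today with TBB: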
#include <cstddef>       // NULL
#include <tbb/atomic.h>  // tbb::atomic

class StatusCodeManager{
public:
    static StatusCodeManager& instance(){
        if( !value ) {
            StatusCodeManager* tmp = new StatusCodeManager();
            if( value.compare_and_swap(tmp,NULL)!=NULL )
                // Another thread installed the value, so throw away mine.
                delete tmp;
        }
        return *value;
    }
protected:
    StatusCodeManager();
private:
    static tbb::atomic<StatusCodeManager*> value;
};
It is lock-free, but it does not guarantee that the constructor runs only once. So I keep wondering: where should value->init(); go?
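One possible answer, if giving up lock-freedom is acceptable, is to let std::call_once run both the construction and init() exactly once. This is a different technique from the TBB compare-and-swap version above and assumes a C++11 compiler. The empty init(), the construction counter, and construct_from_many_threads are hypothetical additions for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class StatusCodeManager {
public:
    static StatusCodeManager& instance() {
        // call_once blocks concurrent callers until the first one finishes,
        // so the constructor and init() each run exactly once.
        std::call_once(once_, [] {
            value_ = new StatusCodeManager();
            value_->init();
        });
        return *value_;
    }
    static int constructions() { return constructions_.load(); }

private:
    StatusCodeManager() { ++constructions_; }
    void init() {}  // hypothetical one-time setup

    static std::once_flag once_;
    static StatusCodeManager* value_;
    static std::atomic<int> constructions_;  // counts constructor runs, for the demo only
};

std::once_flag StatusCodeManager::once_;
StatusCodeManager* StatusCodeManager::value_ = nullptr;
std::atomic<int> StatusCodeManager::constructions_(0);

// Calls instance() from n threads and reports how many objects were constructed.
int construct_from_many_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] { StatusCodeManager::instance(); });
    for (auto& t : threads) t.join();
    return StatusCodeManager::constructions();
}
```

The trade-off is that threads arriving during the first call wait instead of racing, which is usually acceptable for a one-time initialization.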
(P.S. While hunting for papers today I came across an amazing one that made my head spin: "Mathematizing C++ Concurrency" http://www.cl.cam.ac.uk/~pes20/cpp/popl085ap-sewell.pdf . The first author is a PhD student at Cambridge.)