The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way of ensuring load and store
memory ordering between routines that produce weakly-ordered results and routines that consume that data. The
functions of these instructions are as follows:
• SFENCE — Serializes all store (write) operations that occurred prior to the SFENCE instruction in the program
instruction stream, but does not affect load operations.
• LFENCE — Serializes all load (read) operations that occurred prior to the LFENCE instruction in the program
instruction stream, but does not affect store operations.
• MFENCE — Serializes all store and load operations that occurred prior to the MFENCE instruction in the
program instruction stream.
Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient method of controlling memory
ordering than the CPUID instruction.
https://www.intel.co.kr/content/www/kr/ko/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html
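As a rough illustration of the "produce weakly-ordered results, then consume them" pattern from the excerpt above, here is a minimal C sketch. The names `payload`, `ready`, `produce` and `consume_sum` are made up for this note, and the non-x86 fallback path is an assumption added only so the sketch builds elsewhere. The producer writes with non-temporal (weakly-ordered) stores and issues SFENCE before publishing a flag.

```c
#include <stdatomic.h>
#include <stddef.h>

#if defined(__x86_64__)
#include <emmintrin.h>          /* _mm_stream_si32, _mm_sfence */
#define USE_NT_STORES 1
#endif

enum { N = 1024 };
static int payload[N];          /* hypothetical shared buffer */
static atomic_int ready;        /* hypothetical "data is published" flag */

/* Producer: fill the buffer with weakly-ordered stores, then publish. */
void produce(int seed) {
    for (size_t i = 0; i < N; i++) {
#ifdef USE_NT_STORES
        _mm_stream_si32(&payload[i], seed + (int)i);  /* non-temporal store */
#else
        payload[i] = seed + (int)i;                   /* fallback: plain store */
#endif
    }
#ifdef USE_NT_STORES
    _mm_sfence();   /* order the NT stores before the flag store below */
#endif
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: wait for the flag, then read the data. */
long consume_sum(void) {
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin until the producer has published */
    }
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        sum += payload[i];
    return sum;
}
```

In real use `produce` and `consume_sum` would run on different threads; calling them one after the other on a single thread only checks the data path.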
========================== 1 ===================
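Generally speaking, what was originally one instruction may become two, and each of them may then execute out of order independently. The sfence+lfence combination only addresses reordering, not global visibility; a simple way to think about it is that sfence and lfence can themselves be reordered with respect to each other.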
The real concern is StoreLoad reordering between a store and a load, not between a store and barriers, so you should look at a case with a store, then a barrier, then a load.
mov [var1], eax
sfence
lfence
mov eax, [var2]
can become globally visible (i.e. commit to L1d cache) in this order:
lfence
mov eax, [var2] ; load stays after LFENCE
mov [var1], eax ; store becomes globally visible before SFENCE
sfence ; can reorder with LFENCE
https://zhuanlan.zhihu.com/p/43526907
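The store, barrier, load case above can be tried as a small litmus test. The following is a sketch in the style of Preshing's experiment, not code from any of the quoted sources; all names (`count_storeload_reorders`, `start_gate`, and so on) are invented for this note, and the non-x86 branch is an assumed stand-in. Each thread stores to its own variable, runs the chosen barrier, then loads the other variable. If both loads return 0 in the same round, a StoreLoad reordering was observed.

```c
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stdatomic.h>

enum { ITERS = 10000 };

static atomic_int X, Y;
static int r1, r2;
static int use_mfence;                        /* 1: mfence, 0: sfence + lfence */
static pthread_barrier_t start_gate, end_gate;

static void fence(void) {
#if defined(__x86_64__)
    if (use_mfence)
        __asm__ __volatile__("mfence" ::: "memory");
    else
        __asm__ __volatile__("sfence\n\tlfence" ::: "memory");
#else
    /* Assumed stand-ins on other architectures. */
    if (use_mfence)
        atomic_thread_fence(memory_order_seq_cst);
    else
        atomic_thread_fence(memory_order_acq_rel);
#endif
}

static void *writer_x(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_barrier_wait(&start_gate);
        atomic_store_explicit(&X, 1, memory_order_relaxed);  /* store */
        fence();                                             /* barrier */
        r1 = atomic_load_explicit(&Y, memory_order_relaxed); /* load */
        pthread_barrier_wait(&end_gate);
    }
    return NULL;
}

static void *writer_y(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_barrier_wait(&start_gate);
        atomic_store_explicit(&Y, 1, memory_order_relaxed);
        fence();
        r2 = atomic_load_explicit(&X, memory_order_relaxed);
        pthread_barrier_wait(&end_gate);
    }
    return NULL;
}

/* Returns how many rounds ended with r1 == 0 && r2 == 0. */
int count_storeload_reorders(int with_mfence) {
    pthread_t t1, t2;
    int seen = 0;
    use_mfence = with_mfence;
    pthread_barrier_init(&start_gate, NULL, 3);
    pthread_barrier_init(&end_gate, NULL, 3);
    pthread_create(&t1, NULL, writer_x, NULL);
    pthread_create(&t2, NULL, writer_y, NULL);
    for (int i = 0; i < ITERS; i++) {
        atomic_store(&X, 0);
        atomic_store(&Y, 0);
        pthread_barrier_wait(&start_gate);   /* release both workers */
        pthread_barrier_wait(&end_gate);     /* wait for both to finish */
        if (r1 == 0 && r2 == 0)
            seen++;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_barrier_destroy(&start_gate);
    pthread_barrier_destroy(&end_gate);
    return seen;
}
```

With `count_storeload_reorders(1)` the count should stay at zero. With `count_storeload_reorders(0)` a nonzero count is possible on x86 but not guaranteed on any given run, since it depends on timing.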
============================================== 2 ===========================
The main reason why SFENCE + LFENCE is not equal to MFENCE is because SFENCE + LFENCE doesn't block StoreLoad reordering, so it's not sufficient for sequential consistency. Only mfence (or a locked operation, or a real serializing instruction like cpuid) will do that. See Jeff Preshing's Memory Reordering Caught in the Act for a case where only a full barrier is sufficient.
SFENCE only orders stores against other stores, i.e. prevents NT stores from committing from the store buffer ahead of SFENCE itself. It does not necessarily force the store buffer to be drained before it retires from the ROB, so putting LFENCE after it doesn't add up to MFENCE.
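In other words, sfence may return as soon as it has added all the operations to the store buffer, without requiring the invalidate queue to be empty. This assumes one possible implementation of sfence, in which each execution does not drain the store buffer but only appends to it, which is enough to preserve store order.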
MFENCE does have to prevent NT stores from reordering with other stores, so it has to include whatever SFENCE does, as well as draining the store buffer. And also reordering of weakly-ordered SSE4.1 NT loads from WC memory, which is harder because the normal rules that get load ordering for free no longer apply to those. Guaranteeing this is why a Skylake microcode update strengthened (and slowed) MFENCE to also drain the ROB like LFENCE. It might still be possible for MFENCE to be lighter weight than that with HW support for optionally enforcing ordering of NT loads in the pipeline.
https://stackoverflow.com/questions/27627969/why-is-or-isnt-sfence-lfence-equivalent-to-mfence
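The "locked operation" alternative mentioned above can be sketched as a store done with XCHG, which is implicitly locked when one operand is in memory and therefore also acts as a full barrier. The function name `store_with_full_barrier` is made up for this note, and the non-x86 branch using the GCC/Clang `__atomic_exchange_n` builtin is an assumed fallback.

```c
/* Store v to *p with full-barrier semantics; returns the previous value. */
int store_with_full_barrier(int *p, int v) {
#if defined(__x86_64__) || defined(__i386__)
    /* xchg with a memory operand carries an implicit lock prefix. */
    __asm__ __volatile__("xchgl %0, %1"
                         : "+r"(v), "+m"(*p)
                         :
                         : "memory");
    return v;   /* the register now holds the old contents of *p */
#else
    return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
#endif
}
```

Used in place of "mov; mfence", this gives a store that later loads cannot pass, which is the StoreLoad ordering that SFENCE + LFENCE does not provide.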
======================= 3 ========================
https://zhuanlan.zhihu.com/p/66085562
======================= 4 ========================
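SFENCE + LFENCE does not block StoreLoad reordering, so it is not sufficient for sequential consistency. Only mfence (or a locked operation, or a truly serializing instruction such as cpuid) will do that. See Jeff Preshing's "Memory Reordering Caught in the Act" for a case where only a full barrier is sufficient.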
https://www.it1352.com/1949292.html
======================= 5 ========================
// src/backend/utils/rac/lock_free_queu.array_spsc_queue.c
#define mb() asm volatile("mfence" ::: "memory")
#define rmb() asm volatile("lfence" ::: "memory")
#define wmb() asm volatile("sfence" ::: "memory")
https://blog.csdn.net/liuhhaiffeng/article/details/106493224
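To show where macros like these usually sit, here is a minimal single-producer/single-consumer ring buffer sketch. It is not the code from the file named above; `spsc_queue`, `spsc_push`, `spsc_pop` and `QUEUE_CAP` are hypothetical names, and the non-x86 fallbacks are assumptions so the sketch builds elsewhere.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__)
#define mb()  __asm__ __volatile__("mfence" ::: "memory")
#define rmb() __asm__ __volatile__("lfence" ::: "memory")
#define wmb() __asm__ __volatile__("sfence" ::: "memory")
#else   /* assumed fallbacks using GCC/Clang builtins */
#define mb()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#define wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

#define QUEUE_CAP 8

typedef struct {
    int slots[QUEUE_CAP];
    volatile size_t head;   /* next slot to read, written by the consumer */
    volatile size_t tail;   /* next slot to write, written by the producer */
} spsc_queue;

/* Producer side: returns false when the queue is full. */
bool spsc_push(spsc_queue *q, int v) {
    size_t tail = q->tail;
    if (tail - q->head == QUEUE_CAP)
        return false;
    q->slots[tail % QUEUE_CAP] = v;
    wmb();                  /* the slot must be written before tail moves */
    q->tail = tail + 1;
    return true;
}

/* Consumer side: returns false when the queue is empty. */
bool spsc_pop(spsc_queue *q, int *out) {
    size_t head = q->head;
    if (head == q->tail)
        return false;
    rmb();                  /* read the slot only after observing tail */
    *out = q->slots[head % QUEUE_CAP];
    mb();                   /* conservative: finish the read before freeing the slot */
    q->head = head + 1;
    return true;
}
```

The full `mb()` in `spsc_pop` is the cautious choice for a sketch; on x86 a compiler barrier would likely be enough at that point, since loads are not reordered with later stores.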
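On the origins of MESI and memory barriers: https://zhuanlan.zhihu.com/p/125549632
Linux wraps the write-barrier instruction in the smp_wmb() function. When the CPU executes smp_wmb(), the idea is that it first flushes the data currently in the store buffer to the cache and only then performs the writes that follow the barrier. There are two ways to implement this. The first is to simply flush the store buffer, but if a remote cache line has not yet come back, the CPU has to wait. The second is to mark the entries currently in the store buffer, then put the writes that follow the barrier into the store buffer as well, and let the CPU carry on with other work; once all the marked entries have been flushed to their cache lines, the later entries are flushed (the exact implementation is outside the scope of this text). Taking the second implementation as an example, let's look at how the following code executes: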
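The listing the note refers to is not part of these notes. As a stand-in, the second strategy (mark what is already buffered, keep accepting stores, commit post-barrier stores only after the marked ones) can be modeled with a toy store buffer in C. Everything here is hypothetical: the types, the function names, and the choice to represent the "mark" as an epoch number so that several barriers can be in flight. It also assumes each pending store targets a distinct address.

```c
#include <stdbool.h>
#include <stddef.h>

#define SB_CAP 8

typedef struct {
    int *addr;          /* destination, standing in for a cache line */
    int val;
    unsigned epoch;     /* number of barriers issued before this store */
    bool line_ready;    /* cache line is available, so the store may commit */
    bool pending;
} sb_entry;

typedef struct {
    sb_entry e[SB_CAP];
    size_t n;
    unsigned epoch;     /* incremented by every write barrier */
} store_buffer;

/* Queue a store; returns its index, or -1 if the buffer is full. */
int sb_store(store_buffer *sb, int *addr, int val, bool line_ready) {
    if (sb->n == SB_CAP)
        return -1;
    sb->e[sb->n] = (sb_entry){addr, val, sb->epoch, line_ready, true};
    return (int)sb->n++;
}

/* Write barrier: everything already buffered belongs to an older epoch. */
void sb_wmb(store_buffer *sb) {
    sb->epoch++;
}

/* The awaited cache line for entry idx has arrived. */
void sb_line_arrived(store_buffer *sb, int idx) {
    sb->e[idx].line_ready = true;
}

/* Commit what is allowed to commit; returns the number of stores committed.
 * An epoch may only start committing once all older epochs are empty. */
int sb_drain(store_buffer *sb) {
    int committed = 0;
    for (unsigned ep = 0; ep <= sb->epoch; ep++) {
        bool blocked = false;
        for (size_t i = 0; i < sb->n; i++) {
            sb_entry *s = &sb->e[i];
            if (!s->pending || s->epoch != ep)
                continue;
            if (s->line_ready) {
                *s->addr = s->val;
                s->pending = false;
                committed++;
            } else {
                blocked = true;   /* an older store is still waiting */
            }
        }
        if (blocked)
            break;                /* later epochs must not pass it */
    }
    return committed;
}
```

In this model a store issued after `sb_wmb` stays in the buffer, even when its own cache line is ready, until every earlier store has committed, and the CPU is never stalled at the barrier itself.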
======================= 6 ========================
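On x86 processors (Linux), the following assembly instructions have memory-barrier semantics:
1) All instructions that operate on I/O ports.
2) All instructions with the lock prefix.
3) All instructions that write to control registers, system registers, or debug registers (for example, cli and sti, which change the state of the IF flag in the eflags register).
4) The lfence, sfence, and mfence assembly instructions introduced in the Pentium 4 microprocessor, which efficiently implement read memory barriers, write memory barriers, and read-write memory barriers, respectively.
5) A few special assembly instructions; the iret instruction, which terminates an interrupt or exception handler, is one of them.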
https://xie.infoq.cn/article/680fd531df57856ddcb532914
======================= 7 ========================