On x86 barrier instructions, and why lfence + sfence != mfence

Reprinted article, 2021-10-10 17:18. Category: Java concurrency

The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way of ensuring load and store
memory ordering between routines that produce weakly-ordered results and routines that consume that data. The
functions of these instructions are as follows:
• SFENCE — Serializes all store (write) operations that occurred prior to the SFENCE instruction in the program
instruction stream, but does not affect load operations.
• LFENCE — Serializes all load (read) operations that occurred prior to the LFENCE instruction in the program
instruction stream, but does not affect store operations.
• MFENCE — Serializes all store and load operations that occurred prior to the MFENCE instruction in the
program instruction stream.
Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient method of controlling memory
ordering than the CPUID instruction.
https://www.intel.co.kr/content/www/kr/ko/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html
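
The producer/consumer pattern the manual describes can be sketched with compiler intrinsics. This is a minimal sketch, assuming GCC or Clang targeting x86 with SSE2; the names publish, consume, payload and ready are hypothetical and do not come from the manual.

```c
#include <emmintrin.h>  /* _mm_stream_si32, _mm_lfence (SSE2); pulls in _mm_sfence */

static int payload[4];       /* hypothetical data, written with weakly-ordered NT stores */
static volatile int ready;   /* hypothetical flag the consumer polls */

/* Producer: non-temporal stores are weakly ordered, so fence before the flag. */
void publish(int base) {
    for (int i = 0; i < 4; i++)
        _mm_stream_si32(&payload[i], base + i);   /* MOVNTI */
    _mm_sfence();            /* NT stores above are ordered before the flag store */
    ready = 1;
}

/* Consumer: returns the payload sum once the flag is set, or -1 if not ready. */
int consume(void) {
    if (!ready)
        return -1;
    _mm_lfence();            /* later loads do not begin until the flag load completes */
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += payload[i];
    return sum;
}
```

On ordinary write-back memory x86 already keeps loads ordered with loads and stores ordered with stores, so the fences matter mainly for weakly-ordered accesses such as the non-temporal stores used here.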
========================== 1 ===================

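Generally speaking, what would have been a single instruction (mfence) becomes two, and each of the two can be reordered independently. The sfence + lfence combination only addresses some reordering; it does not solve the global visibility problem. A simple way to think about it is that sfence and lfence can themselves be reordered with respect to each other.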

The real concern is StoreLoad reordering between a store and a load, not between a store and barriers, so you should look at a case with a store, then a barrier, then a load.

mov  [var1], eax
sfence
lfence
mov   eax, [var2]
can become globally visible (i.e. commit to L1d cache) in this order:

lfence
mov   eax, [var2]     ; load stays after LFENCE

mov  [var1], eax      ; store becomes globally visible before SFENCE
sfence                ; can reorder with LFENCE

https://zhuanlan.zhihu.com/p/43526907
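
The reordering in the listing above can be reproduced in a toy model. This is only a sketch under stated assumptions: each CPU is modelled as a one-entry store buffer in front of shared memory, a full barrier is modelled as draining that buffer, and every name here (StoreBuffer, buffered_store, drain, both_loads_see_zero) is hypothetical. It says nothing about how real hardware is built.

```c
#include <stdbool.h>

/* Toy model: one pending store per CPU, sitting in front of shared memory. */
typedef struct { int *addr; int value; bool pending; } StoreBuffer;

static void buffered_store(StoreBuffer *sb, int *addr, int value) {
    sb->addr = addr;          /* the store is recorded locally ...        */
    sb->value = value;
    sb->pending = true;       /* ... but is not yet globally visible      */
}

static void drain(StoreBuffer *sb) {
    if (sb->pending) {        /* commit the pending store to memory       */
        *sb->addr = sb->value;
        sb->pending = false;
    }
}

/* Runs "x = 1; r1 = y" on CPU 0 and "y = 1; r2 = x" on CPU 1.
 * Returns true if both loads observed 0, the StoreLoad-reordered outcome. */
bool both_loads_see_zero(bool full_barrier) {
    int x = 0, y = 0;
    StoreBuffer sb0 = {0}, sb1 = {0};

    buffered_store(&sb0, &x, 1);
    buffered_store(&sb1, &y, 1);
    if (full_barrier) {       /* models MFENCE: wait for the buffer to drain */
        drain(&sb0);
        drain(&sb1);
    }
    /* SFENCE + LFENCE would sit here instead; in this model neither drains. */
    int r1 = y;               /* CPU 0 loads while its own store may be buffered */
    int r2 = x;               /* CPU 1 likewise */
    drain(&sb0);              /* the stores commit eventually */
    drain(&sb1);
    return r1 == 0 && r2 == 0;
}
```

Calling both_loads_see_zero(false) yields the outcome in which both loads read 0, while both_loads_see_zero(true) rules it out, which is the difference between the fence pair and a full barrier that the text describes.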

============================================== 2 ===========================
The main reason why SFENCE + LFENCE is not equal to MFENCE is because SFENCE + LFENCE doesn't block StoreLoad reordering, so it's not sufficient for sequential consistency. Only mfence (or a locked operation, or a real serializing instruction like cpuid) will do that. See Jeff Preshing's Memory Reordering Caught in the Act for a case where only a full barrier is sufficient.

SFENCE only orders stores against other stores, i.e. prevents NT stores from committing from the store buffer ahead of SFENCE itself. It does not necessarily force the store buffer to be drained before it retires from the ROB, so putting LFENCE after it doesn't add up to MFENCE.

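In other words, sfence may simply return once all earlier stores have been placed in the store buffer, without requiring the invalidate queue to be empty. The case being considered is one possible implementation of sfence: rather than draining the store buffer each time it executes, it keeps appending entries and guarantees store ordering within the buffer.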

MFENCE does have to prevent NT stores from reordering with other stores, so it has to include whatever SFENCE does, as well as draining the store buffer. And also reordering of weakly-ordered SSE4.1 NT loads from WC memory, which is harder because the normal rules that get load ordering for free no longer apply to those. Guaranteeing this is why a Skylake microcode update strengthened (and slowed) MFENCE to also drain the ROB like LFENCE. It might still be possible for MFENCE to be lighter weight than that with HW support for optionally enforcing ordering of NT loads in the pipeline.

https://stackoverflow.com/questions/27627969/why-is-or-isnt-sfence-lfence-equivalent-to-mfence

======================= 3 ========================

https://zhuanlan.zhihu.com/p/66085562

======================= 4 ========================
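SFENCE + LFENCE does not block StoreLoad reordering, so it is not sufficient for sequential consistency. Only mfence (or a locked operation, or a truly serializing instruction such as cpuid) does that. See Jeff Preshing's "Memory Reordering Caught in the Act" for a case where only a full barrier is sufficient.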
https://www.it1352.com/1949292.html

======================= 5 ========================

// src/backend/utils/rac/lock_free_queu.array_spsc_queue.c
#define mb()  asm volatile("mfence" ::: "memory")   /* full barrier */
#define rmb() asm volatile("lfence" ::: "memory")   /* read barrier */
#define wmb() asm volatile("sfence" ::: "memory")   /* write barrier */
https://blog.csdn.net/liuhhaiffeng/article/details/106493224
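
The file name in the comment above suggests a single-producer, single-consumer queue. As an illustration of where such macros are typically placed, here is a minimal sketch assuming GCC or Clang on x86; the SpscQueue type, QUEUE_SIZE and the spsc_push and spsc_pop functions are hypothetical and are not taken from that source file.

```c
#include <stdbool.h>

/* Same instructions as the macros above (GCC/Clang inline asm, x86 only). */
#define mb()  __asm__ __volatile__("mfence" ::: "memory")
#define rmb() __asm__ __volatile__("lfence" ::: "memory")
#define wmb() __asm__ __volatile__("sfence" ::: "memory")

#define QUEUE_SIZE 8u   /* hypothetical capacity; must be a power of two */

typedef struct {
    int slots[QUEUE_SIZE];
    volatile unsigned head;   /* advanced only by the consumer */
    volatile unsigned tail;   /* advanced only by the producer */
} SpscQueue;

/* Producer side: fill the slot first, then publish the new tail. */
bool spsc_push(SpscQueue *q, int value) {
    unsigned tail = q->tail;
    if (tail - q->head == QUEUE_SIZE)
        return false;                     /* queue is full */
    q->slots[tail % QUEUE_SIZE] = value;
    wmb();                                /* slot write ordered before tail update */
    q->tail = tail + 1;
    return true;
}

/* Consumer side: observe the tail first, then read the slot. */
bool spsc_pop(SpscQueue *q, int *out) {
    unsigned head = q->head;
    if (head == q->tail)
        return false;                     /* queue is empty */
    rmb();                                /* tail read ordered before slot read */
    *out = q->slots[head % QUEUE_SIZE];
    mb();                                 /* slot read done before the slot is released */
    q->head = head + 1;
    return true;
}
```

For ordinary cacheable memory on x86 a compiler barrier would likely be enough in these positions; the hardware fences are kept only to mirror the macros.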

On the origins of MESI and memory barriers: https://zhuanlan.zhihu.com/p/125549632

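Linux wraps the write barrier instruction in the smp_wmb() function. The idea when a CPU executes smp_wmb() is that the data currently in the store buffer is flushed to the cache before any write after the barrier is performed. There are two ways to implement this. The first simply flushes the store buffer, but if a remote cache line has not come back yet, the CPU has to wait. The second marks the entries currently in the store buffer, then puts the writes that follow the barrier into the store buffer as well and lets the CPU carry on with other work; only once all marked entries have been flushed to their cache lines are the later entries flushed (the exact implementation is outside the scope of this article). Taking the second implementation as an example, let us look at how the following code executes: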
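
The code that the source goes on to trace is not part of this excerpt. As a stand-in, the second approach can be sketched as a toy model. Everything here is an assumption made for illustration: the names (StoreBuffer, sb_store, sb_wmb, sb_line_arrived), the fixed capacity, and the simplification that each variable sits on its own cache line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SB_CAP 8   /* hypothetical store buffer capacity */

typedef struct { int *addr; int value; bool marked; bool line_owned; } Entry;
typedef struct { Entry e[SB_CAP]; size_t n; } StoreBuffer;

static bool has_marked(const StoreBuffer *sb) {
    for (size_t i = 0; i < sb->n; i++)
        if (sb->e[i].marked) return true;
    return false;
}

/* Commit what is allowed, oldest first: marked entries once their line is owned,
 * unmarked entries only when no marked entry is still waiting. */
static void sb_commit(StoreBuffer *sb) {
    size_t kept = 0;
    bool marked_waiting = false;
    for (size_t i = 0; i < sb->n; i++) {
        Entry *en = &sb->e[i];
        if (en->line_owned && (en->marked || !marked_waiting)) {
            *en->addr = en->value;              /* write reaches the cache */
        } else {
            if (en->marked) marked_waiting = true;
            sb->e[kept++] = *en;                /* keep it buffered, in order */
        }
    }
    sb->n = kept;
}

/* A store bypasses the buffer only if its line is owned and no marked entry waits. */
void sb_store(StoreBuffer *sb, int *addr, int value, bool line_owned) {
    if (line_owned && !has_marked(sb)) { *addr = value; return; }
    assert(sb->n < SB_CAP);
    sb->e[sb->n++] = (Entry){ addr, value, false, line_owned };
}

/* smp_wmb() in this model: mark everything currently buffered, do not wait. */
void sb_wmb(StoreBuffer *sb) {
    for (size_t i = 0; i < sb->n; i++) sb->e[i].marked = true;
}

/* The remote CPU has handed over the cache line that holds addr. */
void sb_line_arrived(StoreBuffer *sb, const int *addr) {
    for (size_t i = 0; i < sb->n; i++)
        if (sb->e[i].addr == addr) sb->e[i].line_owned = true;
    sb_commit(sb);
}
```

In this model, storing to a variable whose line is not yet owned, calling sb_wmb(), and then storing to a locally owned variable leaves both writes buffered; the second write becomes visible only after sb_line_arrived() releases the first.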

======================= 6 ========================

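On x86 processors (under Linux), the following assembly instructions have memory barrier semantics:

   1) All instructions that operate on I/O ports.

   2) All instructions with the lock prefix.

   3) All instructions that write to control registers, system registers, or debug registers (for example, cli and sti, which change the state of the IF flag in the eflags register).

   4) The lfence, sfence, and mfence instructions available since the Pentium 4 (sfence itself dates back to the Pentium III), which efficiently implement a read memory barrier, a write memory barrier, and a read-write memory barrier, respectively.

   5) A few special-purpose instructions; iret, which terminates an interrupt or exception handler, is one of them.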

https://xie.infoq.cn/article/680fd531df57856ddcb532914
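
Item 2 of the list above is often used in place of mfence. The following is a minimal sketch of that idiom, assuming x86-64 and GCC or Clang inline asm; the names lock_barrier, try_enter, flag_self and flag_other are hypothetical.

```c
#if !defined(__x86_64__)
#error "this sketch assumes x86-64 with GCC or Clang inline asm"
#endif

/* Full barrier built from a lock-prefixed instruction: adding 0 to the word at
 * the top of the stack changes no data, but the lock prefix orders memory. */
static inline void lock_barrier(void) {
    __asm__ __volatile__("lock; addl $0, (%%rsp)" ::: "memory", "cc");
}

static volatile int flag_self, flag_other;   /* hypothetical shared flags */

/* Dekker-style entry check: announce ourselves, then look at the other side.
 * Returns 1 if the other side has not announced itself, otherwise 0. */
int try_enter(void) {
    flag_self = 1;
    lock_barrier();            /* store above becomes visible before the load below */
    return flag_other == 0;
}
```

Without the barrier between the store and the load, this is exactly the StoreLoad pattern that sfence followed by lfence does not protect.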

======================= 7 ========================

https://www.felixcloutier.com/x86/lfence
