On x86 barrier instructions, and why lfence + sfence != mfence

Reposted article, 2021-10-10 17:18. Tags: Java, concurrency

The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way of ensuring load and store
memory ordering between routines that produce weakly-ordered results and routines that consume that data. The
functions of these instructions are as follows:
• SFENCE — Serializes all store (write) operations that occurred prior to the SFENCE instruction in the program
instruction stream, but does not affect load operations.
• LFENCE — Serializes all load (read) operations that occurred prior to the LFENCE instruction in the program
instruction stream, but does not affect store operations.
• MFENCE — Serializes all store and load operations that occurred prior to the MFENCE instruction in the
program instruction stream.
Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient method of controlling memory
ordering than the CPUID instruction.
https://www.intel.co.kr/content/www/kr/ko/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html
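
The producer/consumer situation the manual describes can be sketched in C with the SSE2 intrinsics from <emmintrin.h>. This is only an illustration under stated assumptions: the names produce, consume, NT_STORE, STORE_FENCE and LOAD_FENCE are hypothetical, and the non-x86 fallback exists only so the sketch compiles elsewhere. The producer writes with weakly-ordered non-temporal stores, so it needs SFENCE before it publishes the flag.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32, _mm_sfence, _mm_lfence */
#define NT_STORE(p, v) _mm_stream_si32((p), (v))   /* weakly-ordered NT store */
#define STORE_FENCE()  _mm_sfence()
#define LOAD_FENCE()   _mm_lfence()
#else  /* fallback for non-x86 builds (an assumption, not from the manual) */
#define NT_STORE(p, v) (*(p) = (v))
#define STORE_FENCE()  __atomic_thread_fence(__ATOMIC_RELEASE)
#define LOAD_FENCE()   __atomic_thread_fence(__ATOMIC_ACQUIRE)
#endif

/* Producer: fill buf with non-temporal stores, then publish through *ready. */
void produce(int *buf, size_t n, volatile int *ready)
{
    for (size_t i = 0; i < n; i++)
        NT_STORE(&buf[i], (int)i * 2);
    STORE_FENCE();   /* all NT stores above are ordered before the flag store */
    *ready = 1;
}

/* Consumer: wait for the flag, then read the data and return its sum. */
long consume(const int *buf, size_t n, const volatile int *ready)
{
    long sum = 0;
    while (!*ready)
        ;            /* spin until the producer publishes */
    LOAD_FENCE();    /* conservative: keeps the data loads after the flag load */
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```

In a real program produce() and consume() would run on different threads. Neither fence here orders the producer's stores against its own later loads, which is the gap the rest of these notes are about.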
========================== 1 ===================

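Broadly speaking, what would have been a single instruction (mfence) is now two, and each of the two can be reordered on its own. The sfence + lfence combination only addresses StoreStore and LoadLoad reordering; it does not address global visibility. A simple way to think about it is that sfence and lfence can themselves be reordered relative to each other.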

The real concern is StoreLoad reordering between a store and a load, not between a store and barriers, so you should look at a case with a store, then a barrier, then a load.

mov  [var1], eax
sfence
lfence
mov   eax, [var2]
can become globally visible (i.e. commit to L1d cache) in this order:

lfence
mov   eax, [var2]     ; load stays after LFENCE

mov  [var1], eax      ; store becomes globally visible before SFENCE
sfence                ; can reorder with LFENCE

https://zhuanlan.zhihu.com/p/43526907

============================================== 2 ===========================
The main reason why SFENCE + LFENCE is not equal to MFENCE is because SFENCE + LFENCE doesn't block StoreLoad reordering, so it's not sufficient for sequential consistency. Only mfence (or a locked operation, or a real serializing instruction like cpuid) will do that. See Jeff Preshing's Memory Reordering Caught in the Act for a case where only a full barrier is sufficient.

SFENCE only orders stores against other stores, i.e. prevents NT stores from committing from the store buffer ahead of SFENCE itself. It does not necessarily force the store buffer to be drained before it retires from the ROB, so putting LFENCE after it doesn't add up to MFENCE.

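In other words, sfence may retire as soon as the earlier stores have been placed in the store buffer, and it does not require the invalidate queue to be empty either. The implementation being considered is one where sfence does not drain the store buffer each time it executes; instead, entries are simply appended in order, and that alone preserves the ordering of the stores.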

MFENCE does have to prevent NT stores from reordering with other stores, so it has to include whatever SFENCE does, as well as draining the store buffer. And also reordering of weakly-ordered SSE4.1 NT loads from WC memory, which is harder because the normal rules that get load ordering for free no longer apply to those. Guaranteeing this is why a Skylake microcode update strengthened (and slowed) MFENCE to also drain the ROB like LFENCE. It might still be possible for MFENCE to be lighter weight than that with HW support for optionally enforcing ordering of NT loads in the pipeline.

https://stackoverflow.com/questions/27627969/why-is-or-isnt-sfence-lfence-equivalent-to-mfence
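
The StoreLoad case can be tried out with a small litmus test in the spirit of Preshing's article. What follows is a minimal sketch, not his code: the names count_both_zero, run_x, run_y, FULL_FENCE and SPLIT_FENCE are hypothetical, and the non-x86 fallback is an assumption added so the file builds elsewhere. Each thread stores 1 to its own flag, executes a fence, then loads the other flag. If both loads return 0, a store was reordered after the following load.

```c
#include <pthread.h>
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#define FULL_FENCE()  __asm__ __volatile__("mfence" ::: "memory")
#define SPLIT_FENCE() __asm__ __volatile__("sfence\n\tlfence" ::: "memory")
#else  /* fallback so the sketch builds elsewhere (assumption) */
#define FULL_FENCE()  atomic_thread_fence(memory_order_seq_cst)
#define SPLIT_FENCE() atomic_thread_fence(memory_order_acq_rel)
#endif

static atomic_int X, Y;       /* shared flags, reset to 0 every round */
static int r1, r2;            /* value each thread loaded */
static int use_full_fence;    /* 1: mfence, 0: sfence + lfence */

static void chosen_fence(void)
{
    if (use_full_fence)
        FULL_FENCE();
    else
        SPLIT_FENCE();
}

static void *run_x(void *arg)
{
    (void)arg;
    atomic_store_explicit(&X, 1, memory_order_relaxed);   /* store X */
    chosen_fence();
    r1 = atomic_load_explicit(&Y, memory_order_relaxed);  /* then load Y */
    return NULL;
}

static void *run_y(void *arg)
{
    (void)arg;
    atomic_store_explicit(&Y, 1, memory_order_relaxed);   /* store Y */
    chosen_fence();
    r2 = atomic_load_explicit(&X, memory_order_relaxed);  /* then load X */
    return NULL;
}

/* Runs `rounds` trials and returns how many ended with r1 == 0 && r2 == 0. */
int count_both_zero(int full_fence, int rounds)
{
    int both_zero = 0;
    use_full_fence = full_fence;
    for (int i = 0; i < rounds; i++) {
        pthread_t t1, t2;
        atomic_store(&X, 0);
        atomic_store(&Y, 0);
        pthread_create(&t1, NULL, run_x, NULL);
        pthread_create(&t2, NULL, run_y, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r1 == 0 && r2 == 0)
            both_zero++;
    }
    return both_zero;
}
```

With full_fence set to 1 the count has to be 0, because mfence forbids StoreLoad reordering. With full_fence set to 0 a nonzero count is permitted but not guaranteed; creating the threads anew in every round keeps the code short but makes the two bodies overlap rarely, so a tighter harness (for example semaphores, as in Preshing's article) may be needed to actually observe it. On older glibc versions the file has to be linked with -pthread.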

======================= 3 ========================

https://zhuanlan.zhihu.com/p/66085562

======================= 4 ========================
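SFENCE + LFENCE does not block StoreLoad reordering, so it is not sufficient for sequential consistency. Only mfence (or a locked operation, or a truly serializing instruction such as cpuid) does that. See Jeff Preshing's "Memory Reordering Caught in the Act" for a case where only a full barrier is sufficient.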
https://www.it1352.com/1949292.html

======================= 5 ========================

// src/backend/utils/rac/lock_free_queu.array_spsc_queue.c
#define mb()  asm volatile("mfence" ::: "memory")  /* full barrier: loads and stores */
#define rmb() asm volatile("lfence" ::: "memory")  /* read (load) barrier */
#define wmb() asm volatile("sfence" ::: "memory")  /* write (store) barrier */
https://blog.csdn.net/liuhhaiffeng/article/details/106493224
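
Macros like these are typically used in a single-producer/single-consumer ring. The sketch below is not the code from the file named above; struct spsc_ring, ring_push, ring_pop and RING_CAP are hypothetical names, and the non-x86 branch is an assumption added so the example compiles on other targets.

```c
#if defined(__x86_64__) || defined(__i386__)
#define mb()  __asm__ __volatile__("mfence" ::: "memory")
#define rmb() __asm__ __volatile__("lfence" ::: "memory")
#define wmb() __asm__ __volatile__("sfence" ::: "memory")
#else  /* fallback so the sketch builds elsewhere (assumption) */
#define mb()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#define wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

#define RING_CAP 8u   /* hypothetical capacity, a power of two */

struct spsc_ring {
    int slots[RING_CAP];
    volatile unsigned head;   /* advanced only by the consumer */
    volatile unsigned tail;   /* advanced only by the producer */
};

/* Producer side. Returns 1 on success, 0 if the ring is full. */
int ring_push(struct spsc_ring *r, int value)
{
    unsigned tail = r->tail;
    if (tail - r->head == RING_CAP)
        return 0;
    r->slots[tail % RING_CAP] = value;
    wmb();                    /* slot contents are ordered before the new tail */
    r->tail = tail + 1;
    return 1;
}

/* Consumer side. Returns 1 and fills *out on success, 0 if the ring is empty. */
int ring_pop(struct spsc_ring *r, int *out)
{
    unsigned head = r->head;
    if (head == r->tail)
        return 0;
    rmb();                    /* the tail is read before the slot it covers */
    *out = r->slots[head % RING_CAP];
    mb();                     /* conservative: slot is read before it is released */
    r->head = head + 1;
    return 1;
}
```

On x86 with ordinary write-back memory, the hardware already keeps stores ordered with other stores and loads with other loads, so the fences in this sketch mostly serve as compiler barriers. They become essential with non-temporal stores or write-combining memory.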

On the origins of MESI and memory barriers: https://zhuanlan.zhihu.com/p/125549632

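Linux wraps the write-barrier instruction in the smp_wmb() function. When a CPU executes smp_wmb(), the idea is that the data currently in the store buffer is flushed to the cache before any write that follows the barrier is performed. There are two ways to implement this. The first is to simply flush the store buffer, which means stalling if a remote cache line has not yet responded. The second is to mark the entries currently in the store buffer, place the writes that follow the barrier into the store buffer as well, and let the CPU carry on with other work; the later entries are flushed only after every marked entry has been flushed to its cache line (the exact implementation is outside the scope of this article). The source article goes on to trace a code example under the second implementation.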
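
The traced example itself is not included here. As a stand-in, the marking scheme can be modelled with a toy store buffer in C. Everything in this sketch is hypothetical (struct store_buffer, sb_store, sb_write_barrier, sb_drain_step, the epoch counter), and it models only the ordering rule, not real hardware: a barrier bumps an epoch, which effectively marks every entry already queued, and an entry may be written to the cache only when its line is owned and no entry from an older epoch is still waiting.

```c
#include <stdbool.h>
#include <stddef.h>

#define SB_CAP 8   /* hypothetical store buffer size */

struct sb_entry {
    int addr;        /* index into the toy cache array */
    int value;
    unsigned epoch;  /* number of barriers executed before this store */
};

struct store_buffer {
    struct sb_entry e[SB_CAP];   /* kept in program order */
    size_t n;
    unsigned epoch;
};

/* Queue a store. Returns false if the buffer is full. */
bool sb_store(struct store_buffer *sb, int addr, int value)
{
    if (sb->n == SB_CAP)
        return false;
    sb->e[sb->n++] = (struct sb_entry){ addr, value, sb->epoch };
    return true;
}

/* Write barrier: everything queued so far now belongs to an older epoch. */
void sb_write_barrier(struct store_buffer *sb)
{
    sb->epoch++;
}

/* One drain pass: commit entries of the oldest epoch whose cache line is
 * owned, keep the rest in order. Returns the number of entries committed. */
size_t sb_drain_step(struct store_buffer *sb, int *cache, const bool *line_owned)
{
    size_t kept = 0, committed = 0;
    if (sb->n == 0)
        return 0;
    unsigned oldest = sb->e[0].epoch;
    for (size_t i = 0; i < sb->n; i++) {
        struct sb_entry s = sb->e[i];
        if (s.epoch == oldest && line_owned[s.addr]) {
            cache[s.addr] = s.value;
            committed++;
        } else {
            sb->e[kept++] = s;
        }
    }
    sb->n = kept;
    return committed;
}
```

With stores to a (line not yet owned) and b (line owned) separated by sb_write_barrier(), b stays in the buffer until a has been committed. Without the barrier, the first drain pass commits b ahead of a, which is exactly the reordering the write barrier is meant to prevent.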

======================= 6 ========================

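On x86 processors (as used by Linux), the following assembly instructions have memory-barrier semantics:

   1) All instructions that operate on I/O ports.

   2) All instructions that carry the lock prefix.

   3) All instructions that write to control registers, system registers, or debug registers (for example, cli and sti, which change the state of the IF flag in the eflags register).

   4) The lfence, sfence, and mfence instructions (sfence arrived with SSE on the Pentium III; lfence and mfence with SSE2 on the Pentium 4), which efficiently implement a read memory barrier, a write memory barrier, and a read-write memory barrier, respectively.

   5) A few special-purpose instructions, one of which is iret, the instruction that terminates an interrupt or exception handler.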

https://xie.infoq.cn/article/680fd531df57856ddcb532914
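
Item 2 is the one most often used in practice: a lock-prefixed read-modify-write acts as a full barrier, and xchg with a memory operand is implicitly locked. A minimal sketch follows, assuming GCC or Clang; locked_full_barrier and publish_with_xchg are hypothetical names, and the non-x86 branch is a fallback added only so the code builds elsewhere.

```c
/* Full barrier built from a locked no-op add on the stack, a common
 * alternative to mfence on x86. Adding 0 leaves the memory unchanged. */
static inline void locked_full_barrier(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0, -4(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0, -4(%%esp)" ::: "memory", "cc");
#else  /* fallback so the sketch builds elsewhere (assumption) */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Store `value` to *flag and return the previous value. On x86 the compiler
 * builtin typically becomes xchg, so the store doubles as a full barrier. */
int publish_with_xchg(int *flag, int value)
{
    return __atomic_exchange_n(flag, value, __ATOMIC_SEQ_CST);
}
```

A store followed by locked_full_barrier(), or a store done through publish_with_xchg(), cannot be reordered with the loads that come after it, which is the StoreLoad ordering that sfence + lfence does not provide.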

======================= 7 ========================

https://www.felixcloutier.com/x86/lfence
