Reposted from: http://tinylab.org/arm-wfe/
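1 Background
Hello everyone, my name is Zhang Binghua (張昺華); the middle character is pronounced like "餅" (bing). I am a technology enthusiast.
Today I want to share something related to how multi-core locks work. Since I mostly work on ARM, I have so far only studied the WFE instruction on the ARM architecture. If anything here is imprecise or wrong, please point it out. My motivation is simply to understand the why and the how, so that I can write code with more confidence.
2 My first encounter with WFE
While reading the ARM SMP implementation of spin_lock, I came across the wfe() interface: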
    static inline void arch_spin_lock(arch_spinlock_t *lock)
    {
        unsigned long tmp;
        u32 newval;
        arch_spinlock_t lockval;

        prefetchw(&lock->slock);
        __asm__ __volatile__(
    "1: ldrex %0, [%3]\n"
    " add %1, %0, %4\n"
    " strex %2, %1, [%3]\n"
    " teq %2, #0\n"
    " bne 1b"
        : "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
        : "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
        : "cc");

        while (lockval.tickets.next != lockval.tickets.owner) {
            wfe();
            lockval.tickets.owner = READ_ONCE(lock->tickets.owner);
        }

        smp_mb();
    }
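As I recall, earlier kernels did not call wfe() here. Once cpu0 holds the lock, cpu1 is blocked when it tries to take the same lock and busy-waits. WFE instead puts the waiting CPU into low-power standby, which reduces power consumption. Under contention the other CPUs have to wait for the lock to be released anyway, so with this instruction the wait becomes a blessing in disguise: it saves power. This only holds under certain conditions, so I went on to trace and study what WFE actually does.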
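The ticket scheme used by arch_spin_lock() can be sketched in portable C11. This is only a minimal user-space sketch, not kernel code: `ticket_lock_t`, `ticket_lock_init()`, `ticket_lock()`, `ticket_unlock()` and `wait_hint()` are hypothetical names, and `wait_hint()` is an empty stand-in for the place where the kernel executes wfe().

```c
#include <stdatomic.h>

/* Hypothetical user-space model of a ticket lock; not the kernel's type. */
typedef struct {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket that currently owns the lock */
} ticket_lock_t;

static inline void ticket_lock_init(ticket_lock_t *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Assumption: an empty stand-in for wfe(); it does not suspend anything. */
static inline void wait_hint(void)
{
}

/* Take a ticket, then wait until that ticket is being served. */
static inline unsigned ticket_lock(ticket_lock_t *l)
{
    unsigned my = atomic_fetch_add_explicit(&l->next, 1,
                                            memory_order_relaxed);

    while (atomic_load_explicit(&l->owner, memory_order_acquire) != my)
        wait_hint();    /* the kernel waits in wfe() at this point */

    return my;          /* returned only so the caller can inspect it */
}

/* Serve the next ticket; the kernel follows this store with dsb + sev. */
static inline void ticket_unlock(ticket_lock_t *l)
{
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

A caller initializes the lock once with ticket_lock_init() and then brackets the critical section with ticket_lock() and ticket_unlock(). Tickets are served strictly in the order they were taken, which is what makes the lock fair.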
3 spinlock and WFE, SEV, WFI
Everyone who works on the kernel knows spin_lock. The code from linux-stable is pasted below:
    static __always_inline void spin_lock(spinlock_t *lock)
    {
        raw_spin_lock(&lock->rlock);
    }

    #define raw_spin_lock(lock) _raw_spin_lock(lock)

    #ifndef CONFIG_INLINE_SPIN_LOCK
    void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
    {
        __raw_spin_lock(lock);
    }
    EXPORT_SYMBOL(_raw_spin_lock);
    #endif

    static inline void __raw_spin_lock(raw_spinlock_t *lock)
    {
        preempt_disable();
        spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
        LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
    }

    /*
     * We are now relying on the NMI watchdog to detect lockup instead of doing
     * the detection here with an unfair lock which can cause problem of its own.
     */
    void do_raw_spin_lock(raw_spinlock_t *lock)
    {
        debug_spin_lock_before(lock);
        arch_spin_lock(&lock->raw_lock);
        mmiowb_spin_lock();
        debug_spin_lock_after(lock);
    }

    /*
     * ARMv6 ticket-based spin-locking.
     *
     * A memory barrier is required after we get a lock, and before we
     * release it, because V6 CPUs are assumed to have weakly ordered
     * memory.
     */
    static inline void arch_spin_lock(arch_spinlock_t *lock)
    {
        unsigned long tmp;
        u32 newval;
        arch_spinlock_t lockval;

        prefetchw(&lock->slock);
        __asm__ __volatile__(
    "1: ldrex %0, [%3]\n"
    " add %1, %0, %4\n"
    " strex %2, %1, [%3]\n"
    " teq %2, #0\n"
    " bne 1b"
        : "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
        : "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
        : "cc");

        while (lockval.tickets.next != lockval.tickets.owner) {
            wfe();
            lockval.tickets.owner = READ_ONCE(lock->tickets.owner);
        }

        smp_mb();
    }
For arm32:
    #if __LINUX_ARM_ARCH__ >= 7 || \
        (__LINUX_ARM_ARCH__ == 6 && defined(CONFIG_CPU_32v6K))
    #define sev() __asm__ __volatile__ ("sev" : : : "memory")
    #define wfe() __asm__ __volatile__ ("wfe" : : : "memory")
    #define wfi() __asm__ __volatile__ ("wfi" : : : "memory")
    #else
    #define wfe() do { } while (0)
    #endif
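For arm64, the three definitions come first; the arch_spin_unlock(), dsb_sev() and __ALT_SMP_ASM code after them comes from the arm32 tree: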
    #define sev() asm volatile("sev" : : : "memory")
    #define wfe() asm volatile("wfe" : : : "memory")
    #define wfi() asm volatile("wfi" : : : "memory")

    static inline void arch_spin_unlock(arch_spinlock_t *lock)
    {
        smp_mb();
        lock->tickets.owner++;
        dsb_sev();
    }

    #define SEV __ALT_SMP_ASM(WASM(sev), WASM(nop))

    static inline void dsb_sev(void)
    {
        dsb(ishst);
        __asm__(SEV);
    }

    #ifdef CONFIG_SMP
    #define __ALT_SMP_ASM(smp, up) \
        "9998: " smp "\n" \
        " .pushsection \".alt.smp.init\", \"a\"\n" \
        " .long 9998b\n" \
        " " up "\n" \
        " .popsection\n"
    #else
    #define __ALT_SMP_ASM(smp, up) up
    #endif
As the code above shows, the lock path uses WFE and the unlock path uses SEV. The two must be used as a pair; the reason is explained below.
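Why the pairing matters can be seen in a small simulation. This is a simplified model under my own assumptions, not hardware behavior in full: `struct model_cpu`, `model_wfe()`, `model_sev()`, `model_unlock()` and `model_count_suspended()` are hypothetical names, interrupts are left out, and a core woken by SEV is assumed to simply resume.

```c
#include <stdbool.h>

#define NCPU 4

/* Hypothetical model of one core; not a kernel structure. */
struct model_cpu {
    bool event_reg;   /* the one-bit Event Register */
    bool suspended;   /* true while the core is parked in WFE */
};

/* WFE: consume a pending event, otherwise park the core. */
static void model_wfe(struct model_cpu *c)
{
    if (c->event_reg)
        c->event_reg = false;
    else
        c->suspended = true;
}

/* SEV: signal every core. A parked core resumes; a running core
 * latches the event in its Event Register (simplifying assumption). */
static void model_sev(struct model_cpu cpus[], int n)
{
    for (int i = 0; i < n; i++) {
        if (cpus[i].suspended)
            cpus[i].suspended = false;
        else
            cpus[i].event_reg = true;
    }
}

/* Unlock: advance the owner ticket and optionally broadcast SEV. */
static void model_unlock(unsigned *owner, struct model_cpu cpus[], int n,
                         bool send_sev)
{
    (*owner)++;
    if (send_sev)
        model_sev(cpus, n);
}

/* Count the cores that are still parked in WFE. */
static int model_count_suspended(const struct model_cpu cpus[], int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        count += cpus[i].suspended ? 1 : 0;
    return count;
}
```

In this model, calling model_unlock() with send_sev set to false changes the owner ticket but leaves every waiter parked, so nobody notices that the lock is free until some unrelated event arrives. With send_sev set to true, all parked cores resume and can re-read the owner field.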
For kernel version Linux 3.0.56:
    static inline void arch_spin_lock(arch_spinlock_t *lock)
    {
        unsigned long tmp;

        __asm__ __volatile__(
    "1: ldrex %0, [%1]\n"
    " teq %0, #0\n"
        WFE("ne")
    " strexeq %0, %2, [%1]\n"
    " teqeq %0, #0\n"
    " bne 1b"
        : "=&r" (tmp)
        : "r" (&lock->lock), "r" (1)
        : "cc");

        smp_mb();
    }
For Linux 2.6.18:
    #define _raw_spin_lock(lock) __raw_spin_lock(&(lock)->raw_lock)

    static inline void __raw_spin_lock(raw_spinlock_t *lock)
    {
        unsigned long tmp;

        __asm__ __volatile__(
    "1: ldrex %0, [%1]\n"
    " teq %0, #0\n"
    " strexeq %0, %2, [%1]\n"
    " teqeq %0, #0\n"
    " bne 1b"
        : "=&r" (tmp)
        : "r" (&lock->lock), "r" (1)
        : "cc");

        smp_mb();
    }
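As shown above, the earliest kernel versions did not use the WFE instruction; it only appears in later versions.
4 What WFE, SEV and WFI do and how they work
So what does this instruction do? Its description can be found on the ARM website: ARM Software development tools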
- SEV
SEV causes an event to be signaled to all cores within a multiprocessor system. If SEV is implemented, WFE must also be implemented.
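The SEV instruction generates an event signal and sends it to every CPU to wake them up. If SEV is implemented, WFE must be implemented as well. The event signal is recorded in the Event Register, a one-bit register whose bit is set when an event has occurred.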
- WFE
If the Event Register is not set, WFE suspends execution until one of the following events occurs:
• an IRQ interrupt, unless masked by the CPSR I-bit
• an FIQ interrupt, unless masked by the CPSR F-bit
• an Imprecise Data abort, unless masked by the CPSR A-bit
• a Debug Entry request, if Debug is enabled
• an Event signaled by another processor using the SEV instruction.
If the Event Register is set, WFE clears it and returns immediately. If WFE is implemented, SEV must also be implemented.
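For WFE, if the Event Register is not set, WFE puts the CPU into a low-power state until one of the five events listed above occurs. An interrupt, for example, wakes a CPU that was suspended by WFE.
If the Event Register is set, WFE returns immediately and the CPU does not enter the low-power state. The design rationale is that a pending event means the CPU has work to do, so it must not be suspended.
This made me curious about what exactly the Event Register is, which requires reading the manual "ARM Architecture Reference Manual.pdf". I explain it below.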
- WFI
WFI suspends execution until one of the following events occurs:
• an IRQ interrupt, regardless of the CPSR I-bit
• an FIQ interrupt, regardless of the CPSR F-bit
• an Imprecise Data abort, unless masked by the CPSR A-bit
• a Debug Entry request, regardless of whether Debug is enabled.
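WFI, by contrast, does not check the Event Register. It unconditionally puts the CPU into a low-power state, and the CPU wakes only when one of the four events above occurs.
Note that WFE has one wake-up condition that WFI lacks: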
an Event signaled by another processor using the SEV instruction.
In other words, SEV does not wake a CPU that was put to sleep by WFI. This deserves special attention.
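The two wake-up lists quoted above can be written down as a pair of predicates, which makes the differences easy to compare side by side. This is only a sketch that follows those two lists; `enum wake_source`, `struct core_state`, `wakes_wfe()` and `wakes_wfi()` are hypothetical names, not an ARM or kernel API.

```c
#include <stdbool.h>

/* Wake-up sources named in the two quoted lists. */
enum wake_source {
    SRC_IRQ,
    SRC_FIQ,
    SRC_IMPRECISE_ABORT,
    SRC_DEBUG_REQUEST,
    SRC_SEV
};

/* Hypothetical snapshot of the state that the lists refer to. */
struct core_state {
    bool irq_masked;     /* CPSR I-bit */
    bool fiq_masked;     /* CPSR F-bit */
    bool abort_masked;   /* CPSR A-bit */
    bool debug_enabled;
};

/* Does this source wake a core suspended by WFE? */
static bool wakes_wfe(enum wake_source s, const struct core_state *c)
{
    switch (s) {
    case SRC_IRQ:             return !c->irq_masked;
    case SRC_FIQ:             return !c->fiq_masked;
    case SRC_IMPRECISE_ABORT: return !c->abort_masked;
    case SRC_DEBUG_REQUEST:   return c->debug_enabled;
    case SRC_SEV:             return true;
    }
    return false;
}

/* Does this source wake a core suspended by WFI? */
static bool wakes_wfi(enum wake_source s, const struct core_state *c)
{
    switch (s) {
    case SRC_IRQ:             return true;   /* regardless of the I-bit */
    case SRC_FIQ:             return true;   /* regardless of the F-bit */
    case SRC_IMPRECISE_ABORT: return !c->abort_masked;
    case SRC_DEBUG_REQUEST:   return true;   /* regardless of Debug enable */
    case SRC_SEV:             return false;  /* SEV never wakes WFI */
    }
    return false;
}
```

For a core with IRQs masked, wakes_wfe(SRC_IRQ, ...) is false while wakes_wfi(SRC_IRQ, ...) is true, and for SRC_SEV the result is the other way round.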
Next, let me explain the Event Register, based on the document "ARM Architecture Reference Manual.pdf":
- The Event Register
The Event Register is a single bit register for each processor. When set, an event register indicates that an event has occurred, since the register was last cleared, that might require some action by the processor. Therefore, the processor must not suspend operation on issuing a WFE instruction.
The reset value of the Event Register is UNKNOWN.
The Event Register is set by:
• an SEV instruction
• an event sent by some IMPLEMENTATION DEFINED mechanism
• a debug event that causes entry into Debug state
• an exception return.
As shown in this list, the Event Register might be set by IMPLEMENTATION DEFINED mechanisms. The Event Register is cleared only by a Wait For Event instruction. Software cannot read or write the value of the Event Register directly.
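That is the description of the Event Register, and it is quite clear. The Event Register is a single bit, and it can be set by four broad categories of source. Whenever any one of them occurs, the Event Register is set. When WFE is then executed, it checks the Event Register and returns immediately if the bit is set.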
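Because the register is a single bit, events are latched rather than counted. The following sketch models just that property; `enum event_source`, `struct event_register`, `event_register_set()` and `wfe_completes_immediately()` are hypothetical names, and on real hardware software cannot read or write the bit directly as this model does.

```c
#include <stdbool.h>

/* The four sources that the manual lists as setting the Event Register. */
enum event_source {
    EV_SEV,
    EV_IMPLEMENTATION_DEFINED,
    EV_DEBUG_ENTRY,
    EV_EXCEPTION_RETURN
};

/* Hypothetical model; the real register is not visible to software. */
struct event_register {
    bool bit;
};

/* Any listed source sets the same single bit. */
static void event_register_set(struct event_register *r,
                               enum event_source src)
{
    (void)src;      /* every source has the same effect in this model */
    r->bit = true;
}

/* Models the check at the start of WFE: returns true when WFE consumes
 * a pending event and completes at once, false when the core would
 * go on to suspend. Only this path clears the bit. */
static bool wfe_completes_immediately(struct event_register *r)
{
    if (r->bit) {
        r->bit = false;
        return true;
    }
    return false;
}
```

Setting the register twice, for example by an SEV followed by an exception return, still lets only one WFE complete immediately; the second WFE finds the bit clear.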
Now let us look at the description of WFE again:
Wait For Event is a hint instruction that permits the processor to enter a low-power state until one of a number of events occurs, including events signaled by executing the SEV instruction on any processor in the multiprocessor system. For more information, see Wait For Event and Send Event on page B1-1197.
In an implementation that includes the Virtualization Extensions, if HCR.TWE is set to 1, execution of a WFE instruction in a Non-secure mode other than Hyp mode generates a Hyp Trap exception if, ignoring the value of the HCR.TWE bit, conditions permit the processor to suspend execution. For more information see Trapping use of the WFI and WFE instructions on page B1-1249.
Here is the pseudocode for the WFE instruction:
Assembler syntax
    WFE{<c>}{<q>}
where:
    <c>, <q> See Standard assembler syntax fields on page A8-285.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations();
        if EventRegistered() then
            ClearEventRegister();
        else
            if HaveVirtExt() && !IsSecure() && !CurrentModeIsHyp() && HCR.TWE == '1' then
                HSRString = Zeros(25);
                HSRString<0> = '1';
                WriteHSR('000001', HSRString);
                TakeHypTrapException();
            else
                WaitForEvent();
Exceptions
    Hyp Trap.
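The WFE pseudocode makes the flow clear. It first checks ConditionPassed() (the detailed meaning of these functions can be looked up in the ARM manual). If EventRegistered() is true, that is, the one-bit register is set, the bit is cleared and the instruction returns without putting the CPU into a low-power state.
Otherwise, unless the instruction is trapped to Hyp mode through TakeHypTrapException(), WaitForEvent() runs: the CPU waits until a wake-up event arrives and then resumes.
Why return immediately when an event is pending? WFE is designed on the assumption that a pending event means the CPU has work to do, so there is no point in entering a low-power state.
If no event is pending, the CPU can enter the low-power state. It is only waiting for the lock and cannot do anything else, so it may as well sleep and save power.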
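The three-way decision in the WFE pseudocode can be restated as a small C function. This is a sketch under stated assumptions: it assumes ConditionPassed() is true, it only reports which branch would be taken, and `enum wfe_outcome`, `struct wfe_ctx` and `wfe_operation()` are hypothetical names that mirror the pseudocode rather than any real API.

```c
#include <stdbool.h>

/* Which branch of the WFE pseudocode is taken. */
enum wfe_outcome {
    WFE_EVENT_CONSUMED,   /* ClearEventRegister(), return at once */
    WFE_HYP_TRAP,         /* TakeHypTrapException() */
    WFE_SUSPENDED         /* WaitForEvent() */
};

/* Hypothetical inputs, one per condition tested by the pseudocode. */
struct wfe_ctx {
    bool event_registered;  /* EventRegistered() */
    bool have_virt_ext;     /* HaveVirtExt() */
    bool is_secure;         /* IsSecure() */
    bool mode_is_hyp;       /* CurrentModeIsHyp() */
    bool hcr_twe;           /* HCR.TWE == '1' */
};

/* Assumes ConditionPassed() already returned true. */
static enum wfe_outcome wfe_operation(struct wfe_ctx *c)
{
    if (c->event_registered) {
        c->event_registered = false;   /* only WFE clears the bit */
        return WFE_EVENT_CONSUMED;
    }
    if (c->have_virt_ext && !c->is_secure && !c->mode_is_hyp && c->hcr_twe)
        return WFE_HYP_TRAP;
    return WFE_SUSPENDED;
}
```

Note the order of the checks: a pending event is consumed before the trap condition is even considered, so a guest with HCR.TWE set only traps when it would otherwise have suspended.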
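Of course, IRQ, FIQ and many other interrupts also wake a CPU suspended by WFE, so what is the point? The timer interrupt, for example, fires continuously, so the CPU wakes up soon anyway without waiting for an SEV. Waking early costs little: the waiter re-reads the lock and, if it is still held, executes WFE again. Moreover, spin_lock can be used in an interrupt top half as well as in process context, and being a spinlock it is meant to protect code that is short and should not be interrupted by anything else. If the protected code is very long, the whole system may respond slowly whenever someone else needs the same lock, which suggests a design problem in the protected code. So a function protected by spin_lock should be as short as possible. If it has to be long, consider a different kind of lock, or reconsider whether such a long protected region is really needed.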
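The loop around wfe() in arch_spin_lock() is what makes early wake-ups harmless, and that can be traced with a small model. This is a sketch under my own assumptions: `enum wake_kind` and `wakeups_until_acquired()` are hypothetical names, and each array entry stands for one event that makes WFE return.

```c
#include <stddef.h>

/* What made WFE return: an unrelated interrupt, or the lock holder
 * releasing the lock (owner++) and then sending SEV. */
enum wake_kind {
    WAKE_IRQ,
    WAKE_UNLOCK_SEV
};

/* Replays a sequence of wake-ups for a waiter holding my_ticket.
 * Returns how many times WFE returned before the lock was acquired,
 * or -1 if the events ran out while the waiter was still parked. */
static int wakeups_until_acquired(unsigned owner, unsigned my_ticket,
                                  const enum wake_kind *events, size_t n)
{
    int wakeups = 0;
    size_t i = 0;

    while (owner != my_ticket) {        /* same test as the kernel loop */
        if (i == n)
            return -1;                  /* still waiting in WFE */
        if (events[i++] == WAKE_UNLOCK_SEV)
            owner++;                    /* holder released the lock */
        wakeups++;                      /* WFE returned; re-read owner */
    }
    return wakeups;
}
```

With owner 0, ticket 1 and the events IRQ, IRQ, unlock, the waiter wakes three times but acquires the lock only after the third wake-up, because the first two re-reads still see the old owner.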
TakeHypTrapException() enters exception handling:
Pseudocode description of taking the Hyp Trap exception
The TakeHypTrapException() pseudocode procedure describes how the processor takes the exception:

    // TakeHypTrapException()
    // ======================
    TakeHypTrapException()
        // HypTrapException is caused by executing an instruction that is trapped to Hyp mode as a
        // result of a trap set by a bit in the HCR, HCPTR, HSTR or HDCR. By definition, it can only
        // be generated in a Non-secure mode other than Hyp mode.
        // Note that, when a Supervisor Call exception is taken to Hyp mode because HCR.TGE==1, this
        // is not a trap of the SVC instruction. See the TakeSVCException() pseudocode for this case.
        preferred_exceptn_return = if CPSR.T == '1' then PC-4 else PC-8;
        new_spsr_value = CPSR;
        EnterHypMode(new_spsr_value, preferred_exceptn_return, 20);

Additional pseudocode functions for exception handling on page B1-1221 defines the EnterHypMode() pseudocode procedure.
ClearEventRegister()
Clear the Event Register of the current processor.
EventRegistered()
Determine whether the Event Register of the current processor is set.
WaitForEvent()
Wait until WFE instruction completes.
In other words, the CPU waits for an event. Any wake-up event resumes the CPU that was suspended by WFE, and an SEV resumes every CPU that was suspended by WFE.
The WFE wake-up events are described and listed as follows:
WFE wake-up events
The following events are WFE wake-up events:
• the execution of an SEV instruction on any processor in the multiprocessor system
• a physical IRQ interrupt, unless masked by the CPSR.I bit
• a physical FIQ interrupt, unless masked by the CPSR.F bit
• a physical asynchronous abort, unless masked by the CPSR.A bit
• in Non-secure state in any mode other than Hyp mode:
  - when HCR.IMO is set to 1, a virtual IRQ interrupt, unless masked by the CPSR.I bit
  - when HCR.FMO is set to 1, a virtual FIQ interrupt, unless masked by the CPSR.F bit
  - when HCR.AMO is set to 1, a virtual asynchronous abort, unless masked by the CPSR.A bit
• an asynchronous debug event, if invasive debug is enabled and the debug event is permitted
• an event sent by the timer event stream, see Event streams on page B8-1934
• an event sent by some IMPLEMENTATION DEFINED mechanism.
In addition to the possible masking of WFE wake-up events shown in this list, when invasive debug is enabled and DBGDSCR[15:14] is not set to 0b00, DBGDSCR.INTdis can mask interrupts, including masking them acting as WFE wake-up events. For more information, see DBGDSCR, Debug Status and Control Register on page C11-2206.
As shown in the list of wake-up events, an implementation can include IMPLEMENTATION DEFINED hardware mechanisms to generate wake-up events.
Note: For more information about CPSR masking see Asynchronous exception masking on page B1-1181. If the configuration of the masking controls provided by the Security Extensions, or Virtualization Extensions, mean that a CPSR mask bit cannot mask the corresponding exception, then the physical exception is a WFE wake-up event, regardless of the value of the CPSR mask bit.
Next, let us look at the pseudocode for WFI:
Assembler syntax
    WFI{<c>}{<q>}
where:
    <c>, <q> See Standard assembler syntax fields on page A8-285.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations();
        if HaveVirtExt() && !IsSecure() && !CurrentModeIsHyp() && HCR.TWI == '1' then
            HSRString = Zeros(25);
            HSRString<0> = '1';
            WriteHSR('000001', HSRString);
            TakeHypTrapException();
        else
            WaitForInterrupt();
Exceptions
    Hyp Trap.
Related explanation:
WFI
Wait For Interrupt is a hint instruction that permits the processor to enter a low-power state until one of a number of asynchronous events occurs. For more information, see Wait For Interrupt on page B1-1200. In an implementation that includes the Virtualization Extensions, if HCR.TWI is set to 1, execution of a WFE instruction in a Non-secure mode other than Hyp mode generates a Hyp Trap exception if, ignoring the value of the HCR.TWI bit, conditions permit the processor to suspend execution. For more information see Trapping use of the WFI and WFE instructions on page B1-1249.
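As shown above, WFI does not check the Event Register, so the pseudocode makes the difference between WFI and WFE easy to see.
The above is my own understanding. If anything is imprecise or inaccurate, please point it out; discussion is welcome.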
This work was created by Zhang Binghua and is licensed under CC BY-NC-ND 4.0. Commercial use without authorization is prohibited.