The "Affinity broken due to vector space exhaustion" problem


Abnormal messages in dmesg:

 kernel: irq 632: Affinity broken due to vector space exhaustion.
 kernel: irq 633: Affinity broken due to vector space exhaustion.

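This message does not mean that an interrupt number could not be allocated. The interrupt
number was allocated, but when the interrupt routing was configured, the CPU affinity that
actually took effect did not match the requested one. The commit that added the message is: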

commit 743dac494d61d991967ebcfab92e4f80dc7583b3
Author: Neil Horman <nhorman@tuxdriver.com>
Date:   Thu Aug 22 10:34:21 2019 -0400

    x86/apic/vector: Warn when vector space exhaustion breaks affinity

    On x86, CPUs are limited in the number of interrupts they can have affined
    to them as they only **support 256 interrupt** vectors per CPU. 32 vectors are
    reserved for the CPU and the kernel reserves another 22 for internal
    purposes. That leaves 202 vectors for assignement to devices.

    When an interrupt is set up or the affinity is changed by the kernel or the
    administrator, the vector assignment code attempts to honor the requested
    affinity mask. If the vector space on the CPUs in that affinity mask is
    exhausted the code falls back to a wider set of CPUs and assigns a vector
    on a CPU outside of the requested affinity mask silently.

    While the effective affinity is reflected in the corresponding
    /proc/irq/$N/effective_affinity* files the silent breakage of the requested
    affinity can lead to unexpected behaviour for administrators.

    Add a pr_warn() when this happens so that adminstrators get at least
    informed about it in the syslog.

    [ tglx: Massaged changelog and made the pr_warn() more informative ]

    Reported-by: djuran@redhat.com
    Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: djuran@redhat.com
    Link: https://lkml.kernel.org/r/20190822143421.9535-1-nhorman@tuxdriver.com

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index fdacb864c3dd..2c5676b0a6e7 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -398,6 +398,17 @@ static int activate_reserved(struct irq_data *irqd)
                if (!irqd_can_reserve(irqd))
                        apicd->can_reserve = false;
        }
+
+       /*
+        * Check to ensure that the effective affinity mask is a subset
+        * the user supplied affinity mask, and warn the user if it is not
+        */
+       if (!cpumask_subset(irq_data_get_effective_affinity_mask(irqd),
+                           irq_data_get_affinity_mask(irqd))) {
+               pr_warn("irq %u: Affinity broken due to vector space exhaustion.\n",
+                       irqd->irq);
+       }
+
        return ret;
 }

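The author explains the cause clearly: on x86, the number of interrupts each CPU can receive is limited. On CentOS 7 we often ran into
interrupt routing configuration failures with no abnormal message at all, so this warning was added to the kernel for exactly that case. CentOS 8.3 later backported the message as well.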
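The check added by the patch can be reproduced from user space. The following is a minimal sketch, assuming that /proc/irq/N/smp_affinity reflects the requested mask and /proc/irq/N/effective_affinity the effective one, both in the usual comma-separated hexadecimal form. The function names are hypothetical and only illustrate the same subset test that cpumask_subset() performs in the diff above.

```python
def parse_cpumask(text):
    """Parse a hex CPU mask such as '00000000,0000000f' into an int."""
    return int(text.strip().replace(",", ""), 16)


def affinity_broken(requested, effective):
    """Return True if the effective mask is NOT a subset of the requested mask.

    This mirrors the !cpumask_subset(effective, requested) condition
    that triggers the pr_warn() in the quoted patch.
    """
    req = parse_cpumask(requested)
    eff = parse_cpumask(effective)
    return (eff & ~req) != 0


def check_irq(irq, proc_root="/proc/irq"):
    """Read both masks for one IRQ and apply the subset test.

    Assumption: both files exist for this IRQ on the running kernel.
    """
    with open(f"{proc_root}/{irq}/smp_affinity") as f:
        requested = f.read()
    with open(f"{proc_root}/{irq}/effective_affinity") as f:
        effective = f.read()
    return affinity_broken(requested, effective)
```

For example, check_irq(632) would return True on a machine where irq 632 ended up on a CPU outside its requested mask.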

When you hit this problem: CPU 0 is usually the hardest-hit core in our environment, so try to avoid routing interrupts to CPU 0.
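One way to keep an interrupt off CPU 0 is to write a mask that excludes it to /proc/irq/N/smp_affinity. The sketch below only builds and formats such a mask. The helper names are hypothetical, the CPU count is passed in as a parameter, and the write itself needs root and may be rejected by the kernel for some interrupts.

```python
def mask_excluding(ncpus, excluded):
    """Build a bitmask covering CPUs 0..ncpus-1 minus the excluded CPUs."""
    mask = (1 << ncpus) - 1
    for cpu in excluded:
        mask &= ~(1 << cpu)
    if mask == 0:
        raise ValueError("affinity mask would be empty")
    return mask


def format_cpumask(mask):
    """Format an int as comma-separated 32-bit hex groups, highest group first."""
    groups = []
    while True:
        groups.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if mask == 0:
            break
    groups.reverse()
    head = format(groups[0], "x")
    tail = [format(g, "08x") for g in groups[1:]]
    return ",".join([head] + tail)


def set_irq_affinity(irq, mask, proc_root="/proc/irq"):
    """Write the mask to smp_affinity (requires root; the write may fail)."""
    with open(f"{proc_root}/{irq}/smp_affinity", "w") as f:
        f.write(format_cpumask(mask))
```

On an 8-CPU machine, format_cpumask(mask_excluding(8, {0})) yields "fe", which covers CPUs 1 to 7.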

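If you have confirmed that every CPU still has enough vector capacity and the message is reported anyway, check whether the kernel version you are running includes this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/apic?id=190113b4c6531c8e09b31d5235f9b5175cbb0f72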
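To judge the vector capacity of each CPU, kernels built with CONFIG_GENERIC_IRQ_DEBUGFS expose a per-CPU table in /sys/kernel/debug/irq/domains/VECTOR (debugfs must be mounted). The sketch below parses such a table. The layout differs between kernel versions, so the parser reads the column names from the header instead of hard-coding positions. The sample text is illustrative, not real output, and the function names are hypothetical.

```python
SAMPLE = """\
Online bitmaps:        2
Global available:    190
 | CPU | avl | man | mac | act | vectors
     0     0     4     4  198  33-230
     1   190     4     4    8  33-40
"""


def parse_vector_matrix(text):
    """Return {cpu: {column: value}} for the per-CPU rows of the table."""
    columns = None
    per_cpu = {}
    for line in text.splitlines():
        if columns is None:
            # Skip everything until the header row that names the columns.
            if "| CPU |" in line:
                columns = [c.strip() for c in line.split("|") if c.strip()]
            continue
        fields = line.split()
        numeric = len(columns) - 1  # all columns except the trailing vector list
        if len(fields) < numeric:
            continue
        if not all(f.isdigit() for f in fields[:numeric]):
            continue
        row = dict(zip(columns[:numeric], map(int, fields[:numeric])))
        per_cpu[row["CPU"]] = row
    return per_cpu


def exhausted_cpus(text):
    """List CPUs whose 'avl' (available vectors) column is zero."""
    table = parse_vector_matrix(text)
    return sorted(cpu for cpu, row in table.items() if row["avl"] == 0)
```

With the illustrative sample, exhausted_cpus(SAMPLE) reports CPU 0 only. If no CPU in the requested mask shows up as exhausted and the warning still appears, the missing patch above is the more likely explanation.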

