CPU Knowledge: Study Notes

Reposted article, 2020-05-16 10:10. Category: Linux power management and power consumption

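I. Terminology

SMP (Symmetric Multi-Processing): several identical cores, here integrated on one chip, that share memory and run under a single OS.
SMT (Simultaneous Multithreading): one core implements several hardware contexts, for example by duplicating the hardware register state, so that it can execute several threads at the same time.

Node: a group of Cores that has its own bus and memory is called a Node. A Core normally accesses only the memory inside its Node, which lowers the bandwidth demand on the bus and memory. In some situations, however, a Core cannot avoid accessing the memory of another Node, and such an access has a much higher latency.

NUMA (Non-Uniform Memory Access): accepts non-uniform memory access times in exchange for a lower bandwidth demand on the bus and memory. This structure places high demands on the process scheduling algorithm, which should minimize cross-Node memory accesses to improve system performance.

HMP (Heterogeneous Multi-Processing): an ARM architecture designed with power consumption in mind. An HMP chip packages two kinds of ARM Core: high-performance Cores (such as Cortex-A15, the "big" cores) and lower-performance Cores (such as Cortex-A7, the "little" cores). HMP is therefore also known as the big.LITTLE architecture.
There are also big-middle-little designs.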

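II. CPU Topology

1. Besides describing how the CPUs are organized, the main purpose of the CPU topology is to give the kernel scheduler the information it needs to place tasks sensibly, and so to balance performance against power consumption.

CPU topology: Cluster --> Core --> Threads

2. CPU topology framework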

-------------------------   -------------------------
|  CPU topology driver  |   |  Task Scheduler etc.  |
-------------------------   -------------------------
-----------------------------------------------------
|            Kernel general CPU topology            |
-----------------------------------------------------
-----------------------------------------------------
|            arch-dependent CPU topology            |
-----------------------------------------------------

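The kernel general CPU topology lives in "include/linux/topology.h" and defines the standard interfaces for obtaining the system's CPU topology information. The underlying arch-dependent CPU topology implements those kernel-defined interfaces according to the characteristics of the platform.

CPU topology information has two important consumers. One is the user, who is shown the current CPU information (e.g. by lscpu); this is implemented by the CPU topology driver. The other is the scheduler, which receives information about the CPU cores so that it can schedule tasks sensibly.

2.1 Kernel general CPU topology

The API in include/linux/topology.h is provided mainly as macros wrapped in "#ifndef ... #define". The purpose is to let the arch-dependent CPU topology redefine these macros: whatever the lower layer defines takes precedence, and anything it leaves undefined falls back to the default in the kernel general CPU topology. The main items are: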

/* include/linux/topology.h */
#ifndef topology_physical_package_id
#define topology_physical_package_id(cpu)       ((void)(cpu), -1)
#endif
#ifndef topology_core_id
#define topology_core_id(cpu)                   ((void)(cpu), 0)
#endif
#ifndef topology_thread_cpumask
#define topology_thread_cpumask(cpu)            cpumask_of(cpu)
#endif
#ifndef topology_core_cpumask
#define topology_core_cpumask(cpu)              cpumask_of(cpu)
#endif

#ifdef CONFIG_SCHED_SMT
static inline const struct cpumask *cpu_smt_mask(int cpu)
{
    return topology_thread_cpumask(cpu);
}
#endif

static inline const struct cpumask *cpu_cpu_mask(int cpu)
{
    return cpumask_of_node(cpu_to_node(cpu));
}

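topology_physical_package_id: returns the package ID of a CPU, i.e. the socket (x86) or cluster (ARM); the exact meaning depends on the platform implementation;
topology_core_id: the core ID of a CPU, i.e. the Core level of the topology described in section II; the exact meaning depends on the platform implementation;
topology_thread_cpumask: all CPUs that belong to the same core as this CPU, informally its sibling threads;
topology_core_cpumask: all CPUs that belong to the same cluster as this CPU;
cpu_cpu_mask: all CPUs that belong to the same Node as this CPU;
cpu_smt_mask: a wrapper used for SMT scheduling (CONFIG_SCHED_SMT); same meaning as topology_thread_cpumask.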
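
The "#ifndef ... #define" override pattern can be tried outside the kernel. The following is a minimal sketch, not kernel code: the demo_* names and the four-entry table are assumptions made for illustration. The "arch" part defines only topology_core_id, so the generic default is used for topology_physical_package_id.

```c
#include <assert.h>

/* ---- hypothetical "arch" layer: defines only what the platform knows ---- */
struct demo_cpu_topology {
    int core_id;
};

static const struct demo_cpu_topology demo_topology[4] = {
    { .core_id = 0 }, { .core_id = 1 }, { .core_id = 2 }, { .core_id = 3 },
};

#define topology_core_id(cpu) (demo_topology[cpu].core_id)

/* ---- "generic" layer: supplies a default only where none exists ---- */
#ifndef topology_physical_package_id
#define topology_physical_package_id(cpu) ((void)(cpu), -1)
#endif
#ifndef topology_core_id
#define topology_core_id(cpu) ((void)(cpu), 0)
#endif

/* Wrappers so that callers do not depend on the macros directly. */
int demo_core_id(int cpu)
{
    assert(cpu >= 0 && cpu < 4);
    return topology_core_id(cpu);             /* arch definition wins */
}

int demo_package_id(int cpu)
{
    assert(cpu >= 0 && cpu < 4);
    return topology_physical_package_id(cpu); /* generic default: -1 */
}
```

Because the arch definition appears first, the generic "#ifndef topology_core_id" block is skipped, which is the precedence rule described above.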

2.2 arch-dependent CPU topology

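It lives in "arch/arm64/include/asm/topology.h" and "arch/arm64/kernel/topology.c" and handles the topology conversion specific to the ARM64 platform, which includes:

(1) Defining a data structure, and a variable based on it, to store the system's CPU topology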

/* arch/arm64/include/asm/topology.h */
struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
};
extern struct cpu_topology cpu_topology[NR_CPUS];

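cluster_id, core_id and thread_id describe the three levels of the topology. thread_sibling and core_sibling hold all sibling CPUs at the same level as this CPU (same core and same cluster, respectively). Every CPU in the system (the count is given by NR_CPUS and is the number seen by the OS) has one struct cpu_topology variable describing its position in the overall topology. The variables are kept in an array.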
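
To make the relation between the three IDs and the two sibling masks concrete, here is a user-space sketch. It is an illustration under assumptions, not the kernel's implementation: plain 32-bit integers stand in for cpumask_t, and the demo_* names and the 4 little + 3 middle + 1 big layout are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS 8

struct demo_cpu_topology {
    int thread_id;            /* -1 when the core has no SMT */
    int core_id;
    int cluster_id;
    uint32_t thread_sibling;  /* bit n set: CPU n shares this core */
    uint32_t core_sibling;    /* bit n set: CPU n shares this cluster */
};

/* Fill in both sibling masks for every CPU from the IDs. */
void demo_update_siblings(struct demo_cpu_topology *topo, int nr_cpus)
{
    assert(topo != 0 && nr_cpus > 0 && nr_cpus <= 32);
    for (int a = 0; a < nr_cpus; a++) {
        topo[a].thread_sibling = 0;
        topo[a].core_sibling = 0;
        for (int b = 0; b < nr_cpus; b++) {
            if (topo[a].cluster_id != topo[b].cluster_id)
                continue;
            topo[a].core_sibling |= UINT32_C(1) << b;
            if (topo[a].core_id == topo[b].core_id)
                topo[a].thread_sibling |= UINT32_C(1) << b;
        }
    }
}

/* Example layout: 4 little cores, 3 middle cores, 1 big core, no SMT. */
void demo_fill_4_3_1(struct demo_cpu_topology *topo)
{
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        int cluster = cpu < 4 ? 0 : (cpu < 7 ? 1 : 2);
        int first = cluster == 0 ? 0 : (cluster == 1 ? 4 : 7);

        topo[cpu].thread_id = -1;
        topo[cpu].core_id = cpu - first;   /* numbered within the cluster */
        topo[cpu].cluster_id = cluster;
    }
    demo_update_siblings(topo, DEMO_NR_CPUS);
}
```

With this layout the little cores end up with core_sibling 0x0f, the middle cores with 0x70 and the big core with 0x80, and every thread_sibling mask contains only the CPU itself.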

(2) Redefining the CPU topology macros

/* arch/arm64/include/asm/topology.h */
#define topology_physical_package_id(cpu)       (cpu_topology[cpu].cluster_id)
#define topology_core_id(cpu)           (cpu_topology[cpu].core_id)
#define topology_core_cpumask(cpu)      (&cpu_topology[cpu].core_sibling)
#define topology_thread_cpumask(cpu)    (&cpu_topology[cpu].thread_sibling)

The implementation is simple: each macro reads the corresponding field from that CPU's struct cpu_topology variable.

(3) Providing functions that initialize and build the CPU topology, to be called during system startup

/* arch/arm64/include/asm/topology.h */
void init_cpu_topology(void);
void store_cpu_topology(unsigned int cpuid);

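init_cpu_topology is called along the path kernel_init --> smp_prepare_cpus --> init_cpu_topology. As the following paragraphs imply, its main job is to try to obtain the CPU topology from the DTS.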

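store_cpu_topology is called along the path kernel_init --> smp_prepare_cpus --> store_cpu_topology. If the CPU topology could not be obtained from the DTS, it reads the topology information from the ARM64 MPIDR register.

The cpu-map and clusterX nodes in the device tree describe the CPU topology; see "Documentation/devicetree/bindings/arm/topology.txt" for details.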
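
For the MPIDR fallback used by store_cpu_topology, the decoding can be sketched in user space as follows. This is a simplified illustration, not the kernel function: the demo_* names are hypothetical, affinity levels above the cluster are ignored, and the field positions (Aff0 at bit 0, Aff1 at bit 8, Aff2 at bit 16, Aff3 at bit 32, MT at bit 24) follow the architectural MPIDR_EL1 layout.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MPIDR_MT_BIT (UINT64_C(1) << 24)   /* MT: affinity level 0 is a thread */

struct demo_ids {
    int thread_id;   /* -1 when the core is not multithreaded */
    int core_id;
    int cluster_id;
};

/* Extract affinity level 0..3 (8 bits each). */
static unsigned int demo_affinity(uint64_t mpidr, int level)
{
    static const int shift[4] = { 0, 8, 16, 32 };

    assert(level >= 0 && level < 4);
    return (unsigned int)((mpidr >> shift[level]) & 0xff);
}

struct demo_ids demo_decode_mpidr(uint64_t mpidr)
{
    struct demo_ids ids;

    if (mpidr & DEMO_MPIDR_MT_BIT) {
        /* Multithreaded core: Aff0 = thread, Aff1 = core, Aff2 = cluster. */
        ids.thread_id = (int)demo_affinity(mpidr, 0);
        ids.core_id = (int)demo_affinity(mpidr, 1);
        ids.cluster_id = (int)demo_affinity(mpidr, 2);
    } else {
        /* Single-threaded core: Aff0 = core, Aff1 = cluster. */
        ids.thread_id = -1;
        ids.core_id = (int)demo_affinity(mpidr, 0);
        ids.cluster_id = (int)demo_affinity(mpidr, 1);
    }
    return ids;
}
```

For example, a value of 0x80000101 (MT clear, Aff1 = 1, Aff0 = 1) decodes to core 1 of cluster 1 with no thread ID.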

2.3 CPU topology driver

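The CPU topology driver lives in "drivers/base/topology.c". Built on the API of "include/linux/topology.h", it exposes the CPU topology to user space through sysfs; the lscpu utility is based on this interface. The sysfs format is described in "Documentation/cputopology.txt".

Under /sys/devices/system/cpu/cpuX/topology/:

physical_package_id: // number of the Cluster this CPU belongs to
core_id: // number of this CPU within its Cluster
thread_siblings: // bitmask of the CPUs sharing this CPU's core; without SMT that is the CPU alone, so CPU0, CPU1 and CPU2 give 0x1, 0x2 and 0x4
thread_siblings_list: // the same set as a list; without SMT it is the CPU's own number, so CPU0 gives 0 and CPU7 gives 7
core_siblings: // bitmask of the CPUs in this CPU's Cluster; four little cores in Cluster0 give 0x0f, and three middle cores after them give 0x70
core_siblings_list: // the same set written as numbers and ranges; with 4 little, 3 middle and 1 big core, the little cores show 0-3, the middle cores 4-6 and the big core 7

Under /sys/devices/system/cpu/:

kernel_max: the largest CPU index the kernel configuration allows (NR_CPUS - 1), e.g. 7 when NR_CPUS is 8. In the following example it is 31: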
kernel_max: 31
offline: 2,4-31,32-63
online: 0-1,3
possible: 0-31
present: 0-31
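
The online, offline, possible and present files all use the same list syntax (single numbers and ranges separated by commas). A minimal sketch of a parser for that syntax follows; the function names are hypothetical and the 64-CPU limit is an assumption chosen so that the result fits in one integer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-1,3" into a 64-bit mask (bit n = CPU n).
 * Returns 0 on success, -1 on malformed input or a CPU number above 63.
 */
int demo_parse_cpulist(const char *s, uint64_t *mask)
{
    uint64_t out = 0;

    assert(s != NULL && mask != NULL);
    while (*s != '\0' && *s != '\n') {
        char *end;
        unsigned long first, last;

        if (*s < '0' || *s > '9')
            return -1;
        first = strtoul(s, &end, 10);
        last = first;
        if (*end == '-') {                 /* a range "first-last" */
            s = end + 1;
            if (*s < '0' || *s > '9')
                return -1;
            last = strtoul(s, &end, 10);
        }
        if (last < first || last > 63)
            return -1;
        for (unsigned long cpu = first; cpu <= last; cpu++)
            out |= UINT64_C(1) << cpu;
        if (*end == ',')
            end++;                         /* skip the separator */
        else if (*end != '\0' && *end != '\n')
            return -1;
        s = end;
    }
    *mask = out;
    return 0;
}

/* Count the CPUs named in a mask. */
int demo_count_cpus(uint64_t mask)
{
    int n = 0;

    for (; mask != 0; mask &= mask - 1)    /* clear the lowest set bit */
        n++;
    return n;
}
```

Applied to the example above, "0-1,3" yields the mask 0xb (three CPUs online) and "2,4-31,32-63" names 61 offline CPUs.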

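3. A CPU has four states that need to be represented:

cpu_possible_bits: all CPU cores the system could possibly contain, determined at system initialization. On ARM64, every correctly formatted CPU core in the DTS is a possible core;

cpu_present_bits: all CPU cores that are usable (they meet the conditions for going online, as decided by the low-level code). Not every possible core is present. On systems that support CPU hotplug, the set of present cores can change dynamically;

cpu_online_bits: all CPU cores that are in the running state (this state is explained in more detail later);

cpu_active_bits: the CPU cores onto which the scheduler may place and migrate tasks.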

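III. Linux cpu ops

1. On an SMP system the Linux kernel completes its boot on one CPU (the primary CPU). Once the primary CPU is up, it starts the other CPUs (the secondary CPUs); this is called secondary CPU boot. CPU0 usually serves as the boot CPU.

2. A CPU (or SoC) integrates a ROM holding code that the vendor fixed at manufacture. After some necessary initialization, this code makes the CPU jump to another address (for example 0x20000000). Such addresses are usually in RAM or NOR flash, where user code can be stored.

3. Different CPU cores may sit in different power domains and can therefore be powered up individually.

4. CPU hotplug
Hotplug is a way to physically switch off CPU cores that are not needed when the processing demand is low, and to bring them back online when they are needed. Like cpuidle, CPU hotplug adjusts processor performance dynamically according to system load in order to save power.

hotplug versus idle:
A CPU in the idle state is still visible to the scheduler. In other words, the scheduler does not need to know whether a given CPU is idle, so idle CPUs require no special handling. An un-hotplugged CPU, by contrast, is invisible to the scheduler, so the scheduler must do extra work: actively remove the CPU, migrate its interrupts and other resources to other CPUs, and rebalance the load as needed. The reverse applies when the CPU is plugged back in.

5. Each time a core is powered down, check whether all of its sibling cores are powered down as well; if so, switch off the power supply of the cluster.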

6. cpu ops

On ARM64 the kernel abstracts the cpu ops with struct cpu_operations:

struct cpu_operations {
    const char *name;
    int    (*cpu_init)(struct device_node *, unsigned int);
    int    (*cpu_init_idle)(struct device_node *, unsigned int);
    int    (*cpu_prepare)(unsigned int);
    int    (*cpu_boot)(unsigned int);
    void (*cpu_postboot)(void);
#ifdef CONFIG_HOTPLUG_CPU
    int    (*cpu_disable)(unsigned int cpu);
    void (*cpu_die)(unsigned int cpu);
    int    (*cpu_kill)(unsigned int cpu);
#endif
#ifdef CONFIG_ARM64_CPU_SUSPEND
    int    (*cpu_suspend)(unsigned long);
#endif
};

For ARM64 the kernel offers two alternative methods, smp spin table and psci:

static const struct cpu_operations *supported_cpu_ops[] __initconst = {
#ifdef CONFIG_SMP
    &smp_spin_table_ops,
#endif
    &cpu_psci_ops,
    NULL,
};

Which operation is used is specified by the "enable-method" property in the DTS, in the following format:

cpus {
    ...
    cpu@000 {
        ...
        enable-method = "psci";
        cpu-release-addr = <0x1 0x0000fff8>;
    };
    ...
};

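During system initialization the kernel reads the DTS to find out which operations to use (setup_arch --> cpu_read_bootcpu_ops --> cpu_read_ops) and stores the result in the cpu_ops array, one entry per CPU, for use by the SMP code (arch/arm64/kernel/smp.c):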

/* arch/arm64/kernel/cpu_ops.c */
const struct cpu_operations *cpu_ops[NR_CPUS];
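
Selecting an operations table from the "enable-method" string is essentially a name lookup in a NULL-terminated table. The following user-space sketch shows the idea under assumptions: the demo_* types and callbacks are hypothetical, the structure is cut down to a single callback, and the names "spin-table" and "psci" are the strings assumed to identify the two methods.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_cpu_operations {
    const char *name;               /* matched against "enable-method" */
    int (*cpu_boot)(unsigned int);  /* only one callback kept for brevity */
};

/* Dummy callbacks that return a distinct value per method. */
static int demo_spin_table_boot(unsigned int cpu) { (void)cpu; return 1; }
static int demo_psci_boot(unsigned int cpu) { (void)cpu; return 2; }

static const struct demo_cpu_operations demo_spin_table_ops = {
    .name = "spin-table", .cpu_boot = demo_spin_table_boot,
};
static const struct demo_cpu_operations demo_psci_ops = {
    .name = "psci", .cpu_boot = demo_psci_boot,
};

/* NULL-terminated table, mirroring the shape of supported_cpu_ops[]. */
static const struct demo_cpu_operations *const demo_supported_ops[] = {
    &demo_spin_table_ops,
    &demo_psci_ops,
    NULL,
};

/* Return the ops whose name equals the enable-method string, or NULL. */
const struct demo_cpu_operations *demo_get_ops(const char *enable_method)
{
    assert(enable_method != NULL);
    for (size_t i = 0; demo_supported_ops[i] != NULL; i++) {
        if (strcmp(enable_method, demo_supported_ops[i]->name) == 0)
            return demo_supported_ops[i];
    }
    return NULL;
}
```

A per-CPU array like cpu_ops[] would then simply store the pointer returned for each CPU's enable-method; an unknown method yields NULL, which the caller has to treat as an error.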

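IV. cpu control & hotplug

1. The kernel cpu control code lives in "./kernel/cpu.c". It is a bridging module: it hides the arch-dependent implementation details and gives upper-layer software a unified API for controlling CPU cores (mainly the implementation of interfaces such as cpu_up/cpu_down).

2. The four CPU states

The kernel uses four bitmaps to record the CPU cores in each of four states: possible, present, active and online.

/* include/linux/cpumask.h */
cpu_possible_mask - has bit 'cpu' set iff cpu is populatable. Fixed at boot; it serves as the set of CPU IDs and can be read as "this CPU resource exists".
cpu_present_mask  - has bit 'cpu' set iff cpu is populated. Dynamic; it shows which CPUs are currently plugged in and can be read as "taken over by the kernel".
cpu_online_mask   - has bit 'cpu' set iff cpu available to scheduler. A dynamic subset of cpu_present_mask that indicates the CPUs available for scheduling.
cpu_active_mask   - has bit 'cpu' set iff cpu available to migration, i.e. whether the scheduler may move tasks onto it.

If HOTPLUG is enabled, cpu_possible_mask is forced to have all NR_CPUS bits set.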

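2.1 possible CPU
The possible CPUs are all the CPUs the system could use. Once determined during boot, the set is never modified again. Taking ARM64 as an example, the initialization goes as follows:
(1) After power-on the boot CPU starts, executes start_kernel (init/main.c), and calls boot_cpu_init and setup_arch to perform the initialization related to possible CPUs.
(2) boot_cpu_init puts the current boot CPU into the possible CPU bitmap. By the same token, the boot CPU is also a present, online and active CPU.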

/* init/main.c */
static void __init boot_cpu_init(void)
{
    int cpu = smp_processor_id(); /* ID of the CPU we are running on */
    /* Mark the boot cpu "present", "online" etc for SMP and UP case */
    set_cpu_online(cpu, true);
    set_cpu_active(cpu, true);
    set_cpu_present(cpu, true);
    set_cpu_possible(cpu, true);
}

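2.2 present CPU
The path is "start_kernel --> rest_init --> kernel_init (pid 1, the init task) --> kernel_init_freeable --> smp_prepare_cpus". smp_prepare_cpus walks all possible CPUs; if a CPU core has a valid cpu_ops pointer and the .cpu_prepare callback of its cpu ops succeeds, set_cpu_present() is called to mark it as a present CPU.

2.3 online CPU
A CPU that has booted calls set_cpu_online in secondary_start_kernel to mark itself online. Conversely, __cpu_disable clears it from the online mask.

2.4 active CPU
The scheduler has to track every CPU hotplug event. Because the scheduler and CPU control are two independent modules, the kernel connects them through the notifier mechanism. Whenever the system's CPU resources change, the kernel CPU control module notifies the scheduler, and the scheduler, depending on the event (CPU_DOWN_FAILED, CPU_DOWN_PREPARE and so on), calls set_cpu_active to add the CPU to the active mask or remove it from it. That is what an active CPU means.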
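
The decoupling through notifiers can be illustrated with a tiny callback chain. This is a sketch, not the kernel's notifier API: every demo_* name and the event numbering are hypothetical, and the mapping of events to actions (STARTING and DOWN_FAILED set the active bit, DOWN_PREPARE clears it) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event codes; the real values live in the kernel headers. */
enum demo_cpu_event { DEMO_CPU_STARTING, DEMO_CPU_DOWN_PREPARE, DEMO_CPU_DOWN_FAILED };

typedef void (*demo_notifier_fn)(enum demo_cpu_event ev, unsigned int cpu);

#define DEMO_MAX_NOTIFIERS 4
static demo_notifier_fn demo_chain[DEMO_MAX_NOTIFIERS];
static size_t demo_chain_len;

uint32_t demo_active_mask;   /* bit n set: CPU n is active */

/* CPU control side: lets other modules subscribe to hotplug events. */
int demo_register_notifier(demo_notifier_fn fn)
{
    if (fn == NULL || demo_chain_len == DEMO_MAX_NOTIFIERS)
        return -1;
    demo_chain[demo_chain_len++] = fn;
    return 0;
}

/* CPU control side: broadcast one event to every subscriber. */
void demo_cpu_notify(enum demo_cpu_event ev, unsigned int cpu)
{
    for (size_t i = 0; i < demo_chain_len; i++)
        demo_chain[i](ev, cpu);
}

/* Scheduler side: keep the active mask in step with hotplug events. */
void demo_sched_callback(enum demo_cpu_event ev, unsigned int cpu)
{
    assert(cpu < 32);
    switch (ev) {
    case DEMO_CPU_STARTING:
    case DEMO_CPU_DOWN_FAILED:      /* the CPU stays up after all */
        demo_active_mask |= UINT32_C(1) << cpu;
        break;
    case DEMO_CPU_DOWN_PREPARE:     /* stop placing tasks on it */
        demo_active_mask &= ~(UINT32_C(1) << cpu);
        break;
    }
}
```

The CPU control side never touches demo_active_mask itself; it only calls demo_cpu_notify, and the registered scheduler callback decides what each event means for the mask.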

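3. On platforms that support CPU hotplug, any secondary CPU can be switched off at any time after boot and switched on again when needed. (On ARM platforms CPU0, i.e. the boot CPU, cannot be switched off.)

4. In kernel/cpu.c the cpu_up interface is provided only when CONFIG_SMP is enabled (meaning an SMP system), and the cpu_down interface only when CONFIG_HOTPLUG_CPU is enabled.

5. The per-CPU idle thread

While the boot CPU performs its initialization, the call chain "smp_init --> idle_threads_init --> idle_init" creates an idle thread for every CPU: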

/* kernel/smpboot.c */
static inline void idle_init(unsigned int cpu)
{
    struct task_struct *tsk = per_cpu(idle_threads, cpu);
    if (!tsk) {
        tsk = fork_idle(cpu);
        if (IS_ERR(tsk))
            pr_err("SMP: fork_idle() failed for CPU %u\n", cpu);
        else
            per_cpu(idle_threads, cpu) = tsk;
    }
}

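In essence this interface forks one idle thread (represented by a struct task_struct) for each CPU and saves it in a per-CPU global variable (idle_threads). At this point the idle thread is only a task structure; it is not running yet.

6. Analysis of switching a CPU on and off

In the current kernel implementation, a CPU can be switched off or on only through sysfs:

echo 0 > /sys/devices/system/cpu/cpuX/online  # switch the CPU off
echo 1 > /sys/devices/system/cpu/cpuX/online  # switch the CPU on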
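
The same sysfs file can be written from a program. The sketch below is a minimal user-space illustration with hypothetical function names; it assumes the sysfs path shown above, needs root privileges to succeed, and reports any failure simply as -1.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build "/sys/devices/system/cpu/cpu<N>/online" in buf.
 * Returns the path length, or -1 if it did not fit.
 */
int demo_cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);
    return (n > 0 && (size_t)n < len) ? n : -1;
}

/* Write "1" (online) or "0" (offline) to the CPU's sysfs file. */
int demo_set_cpu_online(unsigned int cpu, int online)
{
    char path[64];
    FILE *f;
    int rc;

    if (demo_cpu_online_path(path, sizeof(path), cpu) < 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;              /* no such CPU, or insufficient privilege */
    rc = fputs(online ? "1" : "0", f) < 0 ? -1 : 0;
    if (fclose(f) != 0)         /* a rejected write shows up when flushing */
        rc = -1;
    return rc;
}
```

Calling demo_set_cpu_online(3, 0) corresponds to "echo 0 > /sys/devices/system/cpu/cpu3/online".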

The software flow for bringing a CPU online is:

echo 1 > /sys/devices/system/cpu/cpuX/online
    online_store(drivers/base/core.c) 
        device_online(drivers/base/core.c) 
            cpu_subsys_online(drivers/base/cpu.c) 
                cpu_up(kernel/cpu.c) 
                    _cpu_up(kernel/cpu.c)

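(1) Before and after the up operation, notifications such as PREPARE, ONLINE and STARTING are sent so that interested parties can react. The scheduler, RCU, workqueue and other modules all need to follow CPU hotplug actions, for instance to redistribute tasks.

(2) The arch-specific boot operation is executed to boot the CPU, which finally, via secondary_start_kernel, comes to rest on its per-CPU idle thread.

After some preparation, _cpu_up calls the platform-specific __cpu_up, and the platform code carries out the actual up operation: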

/* kernel/cpu.c (abridged: error handling omitted) */
static int _cpu_up(unsigned int cpu, int tasks_frozen)
{
    int ret, nr_calls = 0;
    void *hcpu = (void *)(long)cpu;
    unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
    struct task_struct *idle;

    cpu_hotplug_begin();
    idle = idle_thread_get(cpu);
    ret = smpboot_create_threads(cpu);
    ret = __cpu_notify(CPU_UP_PREPARE | mod, hcpu, -1, &nr_calls);
    /* Arch-specific enabling code. */
    ret = __cpu_up(cpu, idle);
    /* Wake the per cpu threads */
    smpboot_unpark_threads(cpu);
    /* Now call notifier in preparation. */
    cpu_notify(CPU_ONLINE | mod, hcpu);
    cpu_hotplug_done();

    return ret;
}

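The preparation consists of:
(1) Getting the task pointer of the idle thread; it is eventually passed as an argument to the arch-specific code.
(2) Creating a thread for managing CPU hotplug actions (smpboot_create_threads).
(3) Sending the CPU_UP_PREPARE notification.

Taking ARM64 as an example, __cpu_up is implemented as follows: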

/* arch/arm64/kernel/smp.c */
int __cpu_up(unsigned int cpu, struct task_struct *idle)
{
    int ret;
    /* We need to tell the secondary core where to find its stack and the page tables. */
    secondary_data.stack = task_stack_page(idle) + THREAD_START_SP;
    __flush_dcache_area(&secondary_data, sizeof(secondary_data));
    /* Now bring the CPU into our world. */
    ret = boot_secondary(cpu, idle);
    if (ret == 0) {
        /*
         * CPU was successfully started, wait for it to come online or
         * time out.
         */
        wait_for_completion_timeout(&cpu_running, msecs_to_jiffies(1000));
        if (!cpu_online(cpu)) {
            pr_crit("CPU%u: failed to come online\n", cpu);
            ret = -EIO;
        }
    } else {
        pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
    }
    secondary_data.stack = NULL;
    return ret;
}

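The interface takes the task pointer of the idle thread as a parameter and does the following:
(1) Saves the stack of the idle thread in a global variable named secondary_data. This step is important; its significance is explained below.
(2) Calls boot_secondary to boot the CPU.
(3) After boot_secondary returns, waits for the CPU to switch to the online state.

secondary_startup, located in arch/arm64/kernel/head.S, handles the later stages of a secondary CPU's startup: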

ENTRY(secondary_startup)
    /*
    * Common entry point for secondary CPUs.
    */
    mrs     x22, midr_el1                   // x22=cpuid
    mov     x0, x22
    bl      lookup_processor_type
    mov     x23, x0                         // x23=current cpu_table
    cbz     x23, __error_p                  // invalid processor (x23=0)?

    pgtbl   x25, x26, x28                   // x25=TTBR0, x26=TTBR1
    ldr     x12, [x23, #CPU_INFO_SETUP]
    add     x12, x12, x28                   // __virt_to_phys
    blr     x12                             // initialise processor

    ldr     x21, =secondary_data
    ldr     x27, =__secondary_switched      // address to jump to after enabling the MMU
    b       __enable_mmu
ENDPROC(secondary_startup)

ENTRY(__secondary_switched)
    ldr     x0, [x21]                       // get secondary_data.stack
    mov     sp, x0
    mov     x29, #0
    b       secondary_start_kernel
ENDPROC(__secondary_switched)

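The parts to focus on are the two ldr instructions near the end of secondary_startup, which load the addresses of secondary_data and __secondary_switched, and the __secondary_switched routine itself. __secondary_switched takes the stack saved in the global variable secondary_data, puts it into the CPU's SP, and jumps to secondary_start_kernel to continue.

After a CPU starts, its stack must be set up before any further function calls can be made; here the stack of that CPU's idle thread is used. Consider how the kernel implements the "current" pointer (the macro that returns the current task structure):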

#define current get_current()
#define get_current() (current_thread_info()->task)

static inline struct thread_info *current_thread_info(void)
{
    register unsigned long sp asm ("sp");
    return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

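The CPU's SP is therefore enough to find the CPU's current task. In other words, the instant the CPU's SP is set to the idle thread's stack, the current context already is the idle thread.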
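
The masking step in current_thread_info can be checked with ordinary integers. The sketch below is an illustration only: the function name is hypothetical and DEMO_THREAD_SIZE is an assumed 16 KiB stack size; the one real requirement is that the size is a power of two and that stacks are aligned to it.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_THREAD_SIZE UINT64_C(0x4000)   /* assumed 16 KiB, a power of two */

/*
 * Given any address inside a THREAD_SIZE-aligned stack, return the base of
 * that stack, which is where thread_info sits in the scheme described above.
 */
uint64_t demo_thread_info_base(uint64_t sp)
{
    /* The mask trick only works for power-of-two sizes. */
    assert((DEMO_THREAD_SIZE & (DEMO_THREAD_SIZE - 1)) == 0);
    return sp & ~(DEMO_THREAD_SIZE - 1);
}
```

Every stack pointer value between the base and the top of the same stack maps to the same base address, which is why loading SP with the idle thread's stack is all it takes to make "current" resolve to the idle thread.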

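7. CPU hotplug is also affected by the "maxcpus" command-line parameter

At boot, the "maxcpus" command-line parameter tells the kernel how many CPUs to use for this boot; the number may be less than or equal to the number of possible CPUs. During initialization only the number of CPUs given by "maxcpus" is put into the present state.
"Documentation/cpu-hotplug.txt" describes it as follows: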

maxcpus=n    Restrict boot time cpus to n. Say if you have 4 cpus, using 
             maxcpus=2 will only boot 2. You can choose to bring the 
             other cpus later online, read FAQ's for more info.

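Note:
The kernel often has function families named xxx, _xxx and __xxx, differing by one or two leading underscores. The meaning is:
xxx usually runs under the protection of some lock and is generally offered to other modules. It calls _xxx directly;
_xxx needs no protection and is generally called inside the module once safety is ensured. Sometimes an external module that is certain it is safe (no protection needed) may call it directly too;
__xxx is generally left to the arch-dependent software layer to implement, such as arch/arm64/kernel/xxx.c here.
Knowing these conventions speeds up code reading, and following them in your own code helps keep it regular and general.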

References:

Linux CPU core power management (1): overview: http://www.wowotech.net/pm_subsystem/cpu_core_pm_overview.html

Linux CPU core power management (2): cpu topology: http://www.wowotech.net/pm_subsystem/cpu_topology.html

Linux CPU core power management (5): cpu control and cpu hotplug: http://www.wowotech.net/pm_subsystem/cpu_hotplug.html
