How do cpu_scale, max_freq_scale, cpu_capacity_orig, and cpu_capacity differ?


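The most useful attribute of a CPU (central processing unit) is its compute performance. As covered in earlier notes, the Linux kernel expresses a CPU's compute power as a single number: CPU capacity.

1. The initial CPU capacity is established while the CPU topology and the sched_domain/sched_group hierarchy are being built.

2. A CPU's compute power depends closely on the frequency it runs at, so every frequency-scaling action also changes its CPU capacity.

3. A CPU's compute power determines how much task load it can handle in time, so CFS task placement ultimately looks at the CPU's remaining capacity (what is left after the share taken by IRQ, DL, and RT activity is removed).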

 

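The sections below walk through each of these CPU capacity values in code (based on caf-kernel msm-5.4):

When the system boots and builds the CPU topology, it parses the parameters that the CPU vendor configured in the DTS and uses them as the CPU's compute power:

  1. First, read the DTS configuration as the raw capacity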
cpu0: cpu@000 {
    device_type = "cpu";
...
    capacity-dmips-mhz = <1024>;
...
};

cpu7: cpu@103 {
    device_type = "cpu";
...
    capacity-dmips-mhz = <801>;
...
};
----------------------------------------------------------------
bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
{
...
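    ret = of_property_read_u32(cpu_node, "capacity-dmips-mhz",    // parse the CPU core's compute power; this is the property configured from kernel 4.19 onward
                   &cpu_capacity);
...
        capacity_scale = max(cpu_capacity, capacity_scale);    // track the largest raw capacity as the scale; no CPU can exceed it, so after normalization big cores get 1024 and little cores get < 1024
        raw_capacity[cpu] = cpu_capacity;                    // raw capacity is the dmips value from the DTS; once normalized it becomes cpu_scale, the basis of the scheduler's cpu_capacity_orig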
        pr_debug("cpu_capacity: %pOF cpu_capacity=%u (raw)\n",
            cpu_node, raw_capacity[cpu]);
...
    return !ret;
}

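  2. Next, the raw capacity is normalized: the largest raw capacity among the CPUs maps to 1024, and each smaller raw capacity is scaled proportionally to a fraction of 1024. Big cores end up at 1024 and little cores at some value below 1024.

   The normalized value is then stored in the per-CPU variable cpu_scale.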

void topology_normalize_cpu_scale(void)
{
    u64 capacity;
    int cpu;

    if (!raw_capacity)
        return;

    pr_debug("cpu_capacity: capacity_scale=%u\n", capacity_scale);
    for_each_possible_cpu(cpu) {
        pr_debug("cpu_capacity: cpu=%d raw_capacity=%u\n",
             cpu, raw_capacity[cpu]);
        capacity = (raw_capacity[cpu] << SCHED_CAPACITY_SHIFT)        // normalize so that the largest raw capacity corresponds to 100% = 1024
            / capacity_scale;
        topology_set_cpu_scale(cpu, capacity);                    // store each CPU's normalized capacity in the per-CPU variable cpu_scale
        pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu\n",
            cpu, topology_get_cpu_scale(cpu));
    }
}
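
The normalization in steps 1 and 2 can be tried outside the kernel with a minimal user-space sketch. The function name normalize_cpu_scale and its array parameters are hypothetical and only mirror the arithmetic of topology_parse_cpu_capacity and topology_normalize_cpu_scale; SCHED_CAPACITY_SHIFT is assumed to be 10, which matches the 1024 scale used throughout this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10 /* assumed: 1 << 10 == 1024 */

/*
 * Hypothetical user-space model of the two kernel steps:
 * find the largest raw capacity (capacity_scale), then scale every
 * raw value so that the largest one maps to 1024.
 */
static void normalize_cpu_scale(const uint32_t *raw, uint64_t *cpu_scale, size_t n)
{
    /* Start at 1 so an all-zero input cannot divide by zero
     * (an assumption of this sketch, not taken from the kernel). */
    uint32_t capacity_scale = 1;
    size_t i;

    assert(raw != NULL && cpu_scale != NULL);

    /* Step 1: remember the largest raw capacity seen. */
    for (i = 0; i < n; i++)
        if (raw[i] > capacity_scale)
            capacity_scale = raw[i];

    /* Step 2: scale each raw capacity against that maximum. */
    for (i = 0; i < n; i++)
        cpu_scale[i] = ((uint64_t)raw[i] << SCHED_CAPACITY_SHIFT) / capacity_scale;
}
```

With the DTS values shown earlier (1024 and 801) the output is unchanged, because the largest raw value is already 1024. With raw values of 500 and 1000 the result is 512 and 1024.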

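   3. update_cpu_capacity is the main function that updates a CPU's remaining capacity. Each step of its calculation also shows how the individual CPU capacity values relate to one another.

    • arch_scale_cpu_capacity returns cpu_scale
    • arch_scale_max_freq_capacity expands as follows:
      • It simply returns max_freq_scale. How is that value calculated?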
        /* Replace task scheduler's default max-frequency-invariant accounting */
        #define arch_scale_max_freq_capacity topology_get_max_freq_scale
        
        static inline
        unsigned long topology_get_max_freq_scale(struct sched_domain *sd, int cpu)
        {
            return per_cpu(max_freq_scale, cpu);
        }
        
        void arch_set_max_freq_scale(struct cpumask *cpus,
                         unsigned long policy_max_freq)
        {
            unsigned long scale, max_freq;
            int cpu = cpumask_first(cpus);
        
            if (cpu > nr_cpu_ids)
                return;
        
            max_freq = per_cpu(max_cpu_freq, cpu);
            if (!max_freq)
                return;
        
            scale = (policy_max_freq << SCHED_CAPACITY_SHIFT) / max_freq;
        
            trace_android_vh_arch_set_freq_scale(cpus, policy_max_freq, max_freq, &scale);
        
            for_each_cpu(cpu, cpus)
                per_cpu(max_freq_scale, cpu) = scale;
        }
      • As arch_set_max_freq_scale above shows, it first reads max_cpu_freq and then applies the following formula:
        ① max_freq_scale = (policy_max_freq * 1024) / max_cpu_freq, where policy_max_freq is the highest frequency the current cpufreq policy (governor) allows (obtained after PM QoS aggregates all userspace requests)
      •  How is max_cpu_freq determined? It is simply the highest frequency the CPU supports:

        {
        ...
        arch_set_freq_scale(policy->related_cpus, new_freq, policy->cpuinfo.max_freq);
        ...
        }
        void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
                     unsigned long max_freq)
        {
            unsigned long scale;
            int i;
        
            scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
        
            trace_android_vh_arch_set_freq_scale(cpus, cur_freq, max_freq, &scale);
        
            for_each_cpu(i, cpus){
                per_cpu(freq_scale, i) = scale;
                per_cpu(max_cpu_freq, i) = max_freq;
            }
        }

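        The formula is:

        ② max_cpu_freq = max_freq = policy->cpuinfo.max_freq, where policy->cpuinfo.max_freq is the highest frequency the CPU supports
    • The max_freq_scale obtained this way is then combined with cpu_scale, taking the thermal limit into account. The result cannot exceed the maximum capacity allowed by the thermal limit:
      min(cpu_scale * max_freq_scale / 1024, thermal_cap)
    • The result of this calculation is stored as cpu_capacity_orig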
static void update_cpu_capacity(struct sched_domain *sd, int cpu)
{
    unsigned long capacity = arch_scale_cpu_capacity(cpu);    // read the per-CPU variable cpu_scale
    struct sched_group *sdg = sd->groups;

    capacity *= arch_scale_max_freq_capacity(sd, cpu);        // multiply by the per-CPU variable max_freq_scale
    capacity >>= SCHED_CAPACITY_SHIFT;                        // together these two steps compute cpu_scale * max_freq_scale / 1024

    capacity = min(capacity, thermal_cap(cpu));                // the result cannot exceed the CPU capacity allowed by the thermal limit
    cpu_rq(cpu)->cpu_capacity_orig = capacity;                // store the result as this CPU rq's cpu_capacity_orig

    capacity = scale_rt_capacity(cpu, capacity);   // compute the CPU capacity left over for the cfs rq

    if (!capacity)            // if no capacity is left for CFS, force the value to 1
        capacity = 1;

    cpu_rq(cpu)->cpu_capacity = capacity;        // update the related capacities: the CPU rq's cpu_capacity and the sgc's capacity, min_capacity, and max_capacity
    sdg->sgc->capacity = capacity;
    sdg->sgc->min_capacity = capacity;
    sdg->sgc->max_capacity = capacity;
}
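
The arithmetic behind max_freq_scale and cpu_capacity_orig can be sketched in plain C. This is a user-space model, not kernel code: the helper names calc_max_freq_scale and calc_capacity_orig are hypothetical, thermal_cap is passed in as a plain number because its source is platform specific, and frequencies are assumed to be in kHz.

```c
#include <assert.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10 /* assumed: capacities are scaled to 1024 */

/* Formula 1: max_freq_scale = policy_max_freq * 1024 / max_cpu_freq. */
static uint64_t calc_max_freq_scale(uint64_t policy_max_freq, uint64_t max_cpu_freq)
{
    /* The kernel function returns early when max_cpu_freq is 0;
     * this sketch simply refuses such input. */
    assert(max_cpu_freq != 0);
    return (policy_max_freq << SCHED_CAPACITY_SHIFT) / max_cpu_freq;
}

/* cpu_capacity_orig = min(cpu_scale * max_freq_scale / 1024, thermal_cap). */
static uint64_t calc_capacity_orig(uint64_t cpu_scale, uint64_t max_freq_scale,
                                   uint64_t thermal_cap)
{
    uint64_t capacity = (cpu_scale * max_freq_scale) >> SCHED_CAPACITY_SHIFT;

    /* The thermal limit acts as an upper bound on the result. */
    return capacity < thermal_cap ? capacity : thermal_cap;
}
```

For instance, a CPU with cpu_scale = 801 whose policy maximum is lowered from 2000000 kHz to 1500000 kHz gets max_freq_scale = 768 and, with no thermal restriction (thermal_cap = 1024), cpu_capacity_orig = 600.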
    • scale_rt_capacity then subtracts the util_avg consumed by IRQ, the DL class, and the RT class from cpu_capacity_orig; the capacity that remains is what is left for CFS tasks. The formula is:
      cpu_capacity = (cpu_capacity_orig - avg_rt.util_avg - avg_dl.util_avg) * (cpu_capacity_orig - avg_irq.util_avg) / cpu_capacity_orig
static unsigned long scale_rt_capacity(int cpu, unsigned long max)
{
    struct rq *rq = cpu_rq(cpu);
    unsigned long used, free;
    unsigned long irq;

    irq = cpu_util_irq(rq);            // read the CPU rq's IRQ util_avg

    if (unlikely(irq >= max))        // IRQ utilization alone already fills the capacity, so return the minimum of 1
        return 1;

    used = READ_ONCE(rq->avg_rt.util_avg);        // read the RT rq's util_avg
    used += READ_ONCE(rq->avg_dl.util_avg);        // add the DL rq's util_avg

    if (unlikely(used >= max))        // RT plus DL utilization already fills the capacity, so return the minimum of 1
        return 1;

    free = max - used;        // free capacity = max capacity - RT util_avg - DL util_avg

    return scale_irq_capacity(free, irq, max);    // (max - RT util_avg - DL util_avg) * (max - irq) / max
}
    • The final (remaining) result is stored as cpu_capacity and as sgc->capacity
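
A minimal user-space model of the remaining-capacity calculation may help make the formula concrete. The function name calc_cfs_capacity and its plain integer parameters are hypothetical stand-ins for the rq fields. The body follows scale_rt_capacity above, inlines the free * (max - irq) / max step that the code comment attributes to scale_irq_capacity, and applies the final clamp to 1 from update_cpu_capacity.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: max is cpu_capacity_orig; irq, rt and dl are the
 * util_avg values of the IRQ, RT and DL signals.
 */
static uint64_t calc_cfs_capacity(uint64_t max, uint64_t irq, uint64_t rt, uint64_t dl)
{
    uint64_t used, free_cap, capacity;

    assert(max != 0);

    if (irq >= max)      /* IRQ time alone consumes everything */
        return 1;

    used = rt + dl;
    if (used >= max)     /* RT + DL consume everything */
        return 1;

    free_cap = max - used;
    /* Scale the free part by the share of time not spent in IRQ. */
    capacity = free_cap * (max - irq) / max;

    return capacity ? capacity : 1; /* never report 0 to CFS */
}
```

With cpu_capacity_orig = 600, irq = 50, rt = 60, and dl = 40, the model returns 500 * 550 / 600 = 458 after integer division.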

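The overall dependencies are as follows:

DTS capacity-dmips-mhz -> raw_capacity -> cpu_scale
policy->cpuinfo.max_freq -> max_cpu_freq
policy_max_freq, max_cpu_freq -> max_freq_scale
cpu_scale, max_freq_scale, thermal_cap -> cpu_capacity_orig
cpu_capacity_orig, IRQ/RT/DL util_avg -> cpu_capacity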

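Summary:

  1. max_cpu_freq: equal to policy->cpuinfo.max_freq, that is, the highest frequency the CPU supports. Because this maximum is a fixed value, max_cpu_freq does not depend on the CPU's current frequency.
  2. max_freq_scale: calculated and normalized from max_cpu_freq and the highest frequency the cpufreq policy allows, so it changes whenever the policy's maximum frequency changes.
  3. cpu_scale: determined by the CPU compute value configured in the DTS. It reflects the CPU's inherent compute power and is normally decided and configured by the CPU vendor, so it is generally fixed.
  4. cpu_capacity_orig: calculated from max_freq_scale and cpu_scale, then capped by the thermal limit. It represents the maximum compute power the CPU can deliver under the current policy.
  5. cpu_capacity: the capacity that remains after the IRQ, RT, and DL usage is removed from cpu_capacity_orig. IRQ handling occupies the CPU (and overly frequent IRQs degrade system performance), and RT and DL tasks have higher priority than CFS tasks. Removing the compute power consumed by IRQ, RT, and DL from the CPU's current capacity therefore leaves the compute power available to CFS tasks.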

 

