測試環境:AOSP 7.1.1+Kernel 4.4.17 HW:HiKey
 
                     Ubuntu 14.04+Kernel 4.4.0-31
 
        聯系方式:arnoldlu@qq.com
 
        1. Linux內核suspend狀態
 
        Linux內核支持多種類型的睡眠狀態,通過設置不同的模塊進入低功耗模式來達到省電功能。
 
        目前存在四種模式:suspend to idle、power-on standby(Standby)、suspend to ram(STR)和sudpend to disk(Hibernate),分別對應ACPI狀態的S0、S1、S3和S4。
 
         
          
           
           | State in Linux |  
           Label |  
           state |  
           ACPI state |  
           注釋 |  
          
 
           
           | #define PM_SUSPEND_ON        ((__force suspend_state_t) 0) |  
             |  
             |  
             |  
            一切正常 |  
          
 
           
           | #define PM_SUSPEND_FREEZE    ((__force suspend_state_t) 1) |  
           freeze |  
           Suspend-to-Idle |  
           S0 |  
           凍結進程+掛起設備+CPU空閑  |  
          
 
           
           | #define PM_SUSPEND_STANDBY    ((__force suspend_state_t) 2) |  
           standby |  
           Standby/Power-on Suspend |  
           S1 |  
           凍結進程+掛起設備+關閉nonbootCPU  |  
          
 
           
           | #define PM_SUSPEND_MEM        ((__force suspend_state_t) 3) |  
           mem |  
           Suspend-to-RAM |  
           S3 |  
           僅保留RAM自刷新  |  
          
 
           
           | #define PM_SUSPEND_MAX        ((__force suspend_state_t) 4) |  
           disk |  
           Suspend-to-disk |  
           S4 |  
             關閉所有設備包括RAM,也被稱為Hibernate   |  
          
 
          
        
 
         
         
        從freeze-->standby-->mem睡眠程度越來越深,喚醒花費的時間也越來越多。
 
        Suspend-To-Idle
 
        此狀態包括frozen processes+suspended devices+idle processors,具有輕量化的特點;
 
        並且相對於相對於Idle狀態能節省更多的功耗,因為此時的用戶空間被凍結且I/O設備進入了低功耗狀態。
 
        相對於Suspend-To-RAM它具有低延時的優勢。
 
        Standby/Power-On Suspend
 
        此狀態包括frozen processes+suspended devices+offline nonboot CPUs+suspend low-level system,對CPU的處理更近一步。
 
        所以相對於Suspend-To-Idle節省了更多的功耗,但是由於需要恢復CPU和一些底層功能也花費了更多的時間。
 
        Suspend-to-RAM
 
         
         
        此狀態使所有的設備進入低功耗狀態,僅保留RAM自刷新。
 
        所有的設備和系統狀態都保存在RAM中,所有外設被掛起。
 
        (在HiKey的實際測試中,boot CPU是沒有關閉的!實際上這里也沒有standby,mem和standby基本上沒有區別。)
 
        Suspend-to-disk
 
        此狀態是最省功耗的模式。
 
        相對Suspend-to-RAMRAM能節省更多功耗的原因是數據會被寫入磁盤中,RAM也可以被關閉。
 
        但是這也導致了,更多的恢復延時,在resume的時候讀回到RAM,然后在進行系統和設備狀態恢復工作。
 
        但是在一般的嵌入式設備上,此種狀態不支持。
 
         
 
        下面用STR表示Suspend to RAM,STI表示Suspend to Idle。
 
         
         
        詳情請參考:http://www.linaro.org/blog/suspend-to-idle/
 
        2. Suspend狀態,以及STR 和STI區別
 
        寫入/sys/power/state不同字符串,可以讓系統進入不同睡眠狀態。
 
        針對state sysfs節點的寫入,最終會進入到state_store這個函數,將字符串轉換成上表中不同狀態。
 
         
         state_store(kernel/power/main.c)
    -->pm_suspend (kernel/power/suspend.c)-------------處理除freeze、standby、mem三種類型suspend
        -->enter_state---------------------------------在進入睡眠之前,做一些准備工作
            -->suspend_devices_and_enter
                -->suspend_enter-----------------------這里才是freeze與standby/mem區別所在。
    -->hibernate---------------------------------------進入suspend to disk流程 
          
         
 
        STR和STI的最主要區別就是下面一段代碼:
 
         
         static int suspend_enter(suspend_state_t state, bool *wakeup)
{
…
    /*
     * PM_SUSPEND_FREEZE equals
     * frozen processes + suspended devices + idle processors.
     * Thus we should invoke freeze_enter() soon after
     * all the devices are suspended.
     */
//====================================FREEZE===============================================================
    if (state == PM_SUSPEND_FREEZE) {------------------------------------如果要進入freeze狀態,就會執行此段代碼。
        trace_suspend_resume(TPS("machine_suspend"), state, true);
        freeze_enter();
        trace_suspend_resume(TPS("machine_suspend"), state, false);
        goto Platform_wake;----------------------------------------------在執行結束跳轉到Platform_wake,中間一段綠色代碼將會被跳過。所以說freeze和standby、mem相比,多了freeze_enter,少了對non-boot CPUs、arch、syscore的操作。
    }
//=====================================MEM===============================================================     error = disable_nonboot_cpus();
    if (error || suspend_test(TEST_CPUS)) {
        log_suspend_abort_reason("Disabling non-boot cpus failed");
        goto Enable_cpus;
    }
    arch_suspend_disable_irqs();
    BUG_ON(!irqs_disabled());
    error = syscore_suspend();
    if (!error) {
        *wakeup = pm_wakeup_pending();
        if (!(suspend_test(TEST_CORE) || *wakeup)) {
            trace_suspend_resume(TPS("machine_suspend"),
                state, true);
            error = suspend_ops->enter(state);
            trace_suspend_resume(TPS("machine_suspend"),
                state, false);
            events_check_enabled = false;
        } else if (*wakeup) {
            pm_get_active_wakeup_sources(suspend_abort,
                MAX_SUSPEND_ABORT_LEN);
            log_suspend_abort_reason(suspend_abort);
            error = -EBUSY;
        }
        syscore_resume();
    }
    arch_suspend_enable_irqs();
    BUG_ON(irqs_disabled());
Enable_cpus:
    enable_nonboot_cpus();
Platform_wake:
    platform_resume_noirq(state);
    dpm_resume_noirq(PMSG_RESUME);
…
} 
          
         
         
         
 
        3 suspend/resume流程梳理
 
        下面分析一下suspend/resume每個細分階段。
 
        整個suspend可以分為若干階段,每個階段函數—>關鍵節點Trace—>analyze_suspend.py解析Trace—>根據Trace時間畫出Timeline圖表
 
        這樣就可以分析出總的時間差異,每個階段差異,甚至一個設備suspend/resume、一個子系統suspend/resume的時間差異。
 
        analyze_suspend.py 基於默認基於ftrace進行分析(在指定dmesg的時候,會發現缺失了很多log信息,無法生成timeline類型的html文件),將suspend/resume分為若干階段。
 
        下面簡要介紹一下各個階段,然后基於此進行代碼分析。
 
        在kernel版本大於等與3.15之后,解析需要的所有log信息都可以從ftrace中獲取。之前的內核版本還需要借助於dmesg。
 
        由於使用的kernel版本是4.4.17,sysvals.usetraceeventsonly被置位,所以只會parseTraceLog()。
 
        下表中的各個階段通過解析suspend_resume: XXXXXXX類型的ftrace來獲取。
 
        各子模塊、子系統的解析通過device_pm_callback_start和device_pm_callback_end來截取時間段,以及這時間段內的callgraph。
 
         
          
           
           | Phase名稱 |  
           ftrace關鍵詞 |  
             |  
             |  
          
 
           
           | suspend_prepare |  
           dpm_prepare |  
             |  
             |  
          
 
           
           | suspend |  
           dpm_suspend |  
             |  
             |  
          
 
           
           | suspend_late |  
           dpm_suspend_late |  
             |  
             |  
          
 
           
           | suspend_noirq |  
           dpm_suspend_noirq |  
             |  
             |  
          
 
           
           | suspend_machine |  
           machine_suspend start |  
             |  
             |  
          
 
           
           | resume_machine |  
           machine_suspend end |  
             |  
             |  
          
 
           
           | resume_noirq |  
           dpm_resume_noirq |  
             |  
             |  
          
 
           
           | resume_early |  
           dpm_resume_early |  
             |  
             |  
          
 
           
           | resume |  
           dpm_resume |  
             |  
             |  
          
 
           
           | resume_complete |  
           dpm_complete |  
             |  
             |  
          
 
          
        
 
        下面是一組suspend/resume執行ftrace log,我們將據此進行各階段代碼分析,包括suspend_enter、suspend_prepare、suspend、suspend_late、suspend_noirq、suspend_machine、resume_machine、resume_noirq、resume_early、resume、resume_complete。
 
        從這里也可以看出freeze和mem/standby除了machine部分不同之外,還少了CPU開關和syscore suspend/resume操作。
 
         
          
           
           |   suspend_resume: suspend_enter[1] begin suspend_resume: sync_filesystems[0] begin suspend_resume: sync_filesystems[0] end suspend_resume: freeze_processes[0] begin suspend_resume: freeze_processes[0] end suspend_resume: suspend_enter[1] end suspend_resume: dpm_prepare[2] begin suspend_resume: dpm_prepare[2] end suspend_resume: dpm_suspend[2] begin suspend_resume: dpm_suspend[2] end suspend_resume: dpm_suspend_late[2] begin suspend_resume: dpm_suspend_late[2] end suspend_resume: dpm_suspend_noirq[2] begin suspend_resume: dpm_suspend_noirq[2] end No CPU_OFF… No syscore_suspend… suspend_resume: machine_suspend[1] begin suspend_resume: machine_suspend[1] end No suscore_resume… No CPU_ON… suspend_resume: dpm_resume_noirq[16] begin suspend_resume: dpm_resume_noirq[16] end suspend_resume: dpm_resume_early[16] begin suspend_resume: dpm_resume_early[16] end suspend_resume: dpm_resume[16] begin suspend_resume: dpm_resume[16] end suspend_resume: dpm_complete[16] begin suspend_resume: dpm_complete[16] end suspend_resume: resume_console[1] begin suspend_resume: resume_console[1] end suspend_resume: thaw_processes[0] begin suspend_resume: thaw_processes[0] end   |  
             suspend_resume: suspend_enter[3] begin suspend_resume: sync_filesystems[0] begin suspend_resume: sync_filesystems[0] end suspend_resume: freeze_processes[0] begin suspend_resume: freeze_processes[0] end suspend_resume: suspend_enter[3] end suspend_resume: dpm_prepare[2] begin suspend_resume: dpm_prepare[2] end suspend_resume: dpm_suspend[2] begin suspend_resume: dpm_suspend[2] end suspend_resume: dpm_suspend_late[2] begin suspend_resume: dpm_suspend_late[2] end suspend_resume: dpm_suspend_noirq[2] begin suspend_resume: dpm_suspend_noirq[2] end suspend_resume: CPU_OFF[1-7] begin/end suspend_resume: syscore_suspend[0] begin/end suspend_resume: machine_suspend[3] begin suspend_resume: machine_suspend[3] end suspend_resume: syscore_resume[0] begin/end suspend_resume: CPU_ON[1-7] begin/end suspend_resume: dpm_resume_noirq[16] begin suspend_resume: dpm_resume_noirq[16] end suspend_resume: dpm_resume_early[16] begin suspend_resume: dpm_resume_early[16] end suspend_resume: dpm_resume[16] begin suspend_resume: dpm_resume[16] end suspend_resume: dpm_complete[16] begin suspend_resume: dpm_complete[16] end suspend_resume: resume_console[3] begin suspend_resume: resume_console[3] end suspend_resume: thaw_processes[0] begin suspend_resume: thaw_processes[0] end   |  
          
 
          
        
 
        在介紹相關代碼之前,先介紹一下HiKey使用的platform_suspend_ops:
 
         
          
           
           |   static const struct platform_suspend_ops psci_suspend_ops = {     .valid          = suspend_valid_only_mem,  僅支持mem類型的suspend     .enter          = psci_system_suspend_enter,  睡眠的CPU底層支持 };   |  
          
 
          
        
 
        freeze的platform_freeze_ops如下:
 
         
          
           
           |   static const struct platform_freeze_ops acpi_freeze_ops = {     .begin = acpi_freeze_begin,     .prepare = acpi_freeze_prepare,     .restore = acpi_freeze_restore,     .end = acpi_freeze_end, };   |  
          
 
          
        
 
        3.1 suspend_enter
 
        enter_state作為suspend/resume的入口點,完成了絕大部分工作。首先確保系統沒有正在進入睡眠狀態;然后為suspend做一些准備,使系統進入睡眠並在喚醒后進行必要清理恢復工作。
 
        下面分析一下suspend之前的准備工作,即suspend_enter階段:
 
         
         static int enter_state(suspend_state_t state)
{
    int error;
 trace_suspend_resume(TPS("suspend_enter"), state, true);
    if (state == PM_SUSPEND_FREEZE) {--------------------------------------是否是freeze類型suspend
#ifdef CONFIG_PM_DEBUG
        if (pm_test_level != TEST_NONE && pm_test_level <= TEST_CPUS) {
            pr_warning("PM: Unsupported test mode for suspend to idle,"
                   "please choose none/freezer/devices/platform.\n");
            return -EAGAIN;
        }
#endif
    } else if (!valid_state(state)) {-------------------------------------目前只支持mem類型suspend
        return -EINVAL;
    }
    if (!mutex_trylock(&pm_mutex))
        return -EBUSY;
    if (state == PM_SUSPEND_FREEZE)
        freeze_begin();--------------------------------------------------初始化suspend_freeze_state為FREEZE_STATE_NONE
#ifndef CONFIG_SUSPEND_SKIP_SYNC
    trace_suspend_resume(TPS("sync_filesystems"), 0, true);
    printk(KERN_INFO "PM: Syncing filesystems ... ");
    sys_sync();----------------------------------------------------------sync文件系統緩存文件,確保數據sync到硬盤。
    printk("done.\n");
    trace_suspend_resume(TPS("sync_filesystems"), 0, false);
#endif
    pr_debug("PM: Preparing system for sleep (%s)\n", pm_states[state]);
    pm_suspend_clear_flags();
    error = suspend_prepare(state);--------------------------------------注意這里面的suspend_prepare和下面的suspend_prepare階段容易搞混。
    if (error)
        goto Unlock;
    if (suspend_test(TEST_FREEZER))
        goto Finish;
 trace_suspend_resume(TPS("suspend_enter"), state, false);     pr_debug("PM: Suspending system (%s)\n", pm_states[state]);
    pm_restrict_gfp_mask();
    error = suspend_devices_and_enter(state);
    pm_restore_gfp_mask();
Finish:
    pr_debug("PM: Finishing wakeup.\n");
    suspend_finish();---------------------------------------------------解凍,重啟進程;發送PM_POST_SUSPEND通知;釋放之前分配的console。
Unlock:
    mutex_unlock(&pm_mutex);
    return error;
} 
          
         
 
        接着分析一下suspend_prepare函數:
 
         
          
           
           |   static int suspend_prepare(suspend_state_t state) {     int error;      if (!sleep_state_supported(state))  驗證suspend狀態         return -EPERM;      pm_prepare_console();  分配一個suspend console      error = pm_notifier_call_chain(PM_SUSPEND_PREPARE);  發送PM_SUSPEND_PREPARE通知消息     if (error)         goto Finish;      trace_suspend_resume(TPS("freeze_processes"), 0, true);     error = suspend_freeze_processes();  凍結進程     trace_suspend_resume(TPS("freeze_processes"), 0, false);     if (!error)         return 0;      suspend_stats.failed_freeze++;     dpm_save_failed_step(SUSPEND_FREEZE); Finish:     pm_notifier_call_chain(PM_POST_SUSPEND);     pm_restore_console();     return error; }   |  
          
 
          
        
 
        suspend_freeze_process先處理用戶空間進程,然后處理內核進程:
 
         
          
           
           |   static inline int suspend_freeze_processes(void) {     int error;      error = freeze_processes();  觸發用戶空間進程進入freeze狀態。當前進程不會被凍結。因為凍結失敗的進程會自動被解凍,所以不需要進行錯誤處理。     /*      * freeze_processes() automatically thaws every task if freezing      * fails. So we need not do anything extra upon error.      */     if (error)         return error;      error = freeze_kernel_threads();  凍結內核線程     /*      * freeze_kernel_threads() thaws only kernel threads upon freezing      * failure. So we have to thaw the userspace tasks ourselves.      */     if (error)  由於freeze_kernel_threads凍結失敗,只會解凍內核線程。所以還需要對用戶空間進程進行解凍。         thaw_processes();      return error; }   |  
          
 
          
        
 
         
 
        下面的階段都在suspend_devices_and_enter中,可以看出這是一個對稱的流程,每一階段的suspend,都有對應的resume。
 
         
          
           
           |   int suspend_devices_and_enter(suspend_state_t state) {     int error;     bool wakeup = false;      if (!sleep_state_supported(state))         return -ENOSYS;      error = platform_suspend_begin(state);     if (error)         goto Close;      suspend_console();  關閉console子系統,暫停printk打印     suspend_test_start();     error = dpm_suspend_start(PMSG_SUSPEND); suspend_prepare(dpm_prepare)、suspend(dpm_suspend)兩階段     if (error) {         pr_err("PM: Some devices failed to suspend, or early wake event detected\n");         log_suspend_abort_reason("Some devices failed to suspend, or early wake event detected");         goto Recover_platform;     }     suspend_test_finish("suspend devices");     if (suspend_test(TEST_DEVICES))         goto Recover_platform;      do {         error = suspend_enter(state, &wakeup);  suspend_late(dpm_suspend_late)、suspend_noirq(dpm_suspend_noirq)、suspend_machine、resume_machine、resume_noirq(dpm_resume_noirq)、resume_early(dpm_resume_early)     } while (!error && !wakeup && platform_suspend_again(state));  Resume_devices:     suspend_test_start();     dpm_resume_end(PMSG_RESUME);  resume(dpm_resume)、resume_complete(dpm_complete)     suspend_test_finish("resume devices");     trace_suspend_resume(TPS("resume_console"), state, true);     resume_console();  打開console子系統,恢復printk打印。     trace_suspend_resume(TPS("resume_console"), state, false);  Close:     platform_resume_end(state);     return error;  Recover_platform:     platform_recover(state);     goto Resume_devices; }   |  
          
 
          
        
 
        還有必要過一下suspend_enter:
 
         
          
           
           |   static int suspend_enter(suspend_state_t state, bool *wakeup) {     char suspend_abort[MAX_SUSPEND_ABORT_LEN];     int error, last_dev;      error = platform_suspend_prepare(state); 因為suspend_ops的prepare為空,所以返回0     if (error)         goto Platform_finish;      error = dpm_suspend_late(PMSG_SUSPEND);  suspend_late     if (error) {         last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;         last_dev %= REC_FAILED_NUM;         printk(KERN_ERR "PM: late suspend of devices failed\n");         log_suspend_abort_reason("%s device failed to power down",             suspend_stats.failed_devs[last_dev]);         goto Platform_finish;     }     error = platform_suspend_prepare_late(state);  執行freeze_ops->prepare()     if (error)         goto Devices_early_resume;      error = dpm_suspend_noirq(PMSG_SUSPEND);  suspend_noirq     if (error) {         last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;         last_dev %= REC_FAILED_NUM;         printk(KERN_ERR "PM: noirq suspend of devices failed\n");         log_suspend_abort_reason("noirq suspend of %s device failed",             suspend_stats.failed_devs[last_dev]);         goto Platform_early_resume;     }     error = platform_suspend_prepare_noirq(state);     if (error)         goto Platform_wake;      if (suspend_test(TEST_PLATFORM))         goto Platform_wake;      /*      * PM_SUSPEND_FREEZE equals      * frozen processes + suspended devices + idle processors.      * Thus we should invoke freeze_enter() soon after      * all the devices are suspended.      */     if (state == PM_SUSPEND_FREEZE) {  這里是freeze和mem/standy差別所在         trace_suspend_resume(TPS("machine_suspend"), state, true);         freeze_enter();           trace_suspend_resume(TPS("machine_suspend"), state, false);         goto Platform_wake;     }      error = disable_nonboot_cpus(); 關閉所有boot-CPU之外的CPU     if (error || suspend_test(TEST_CPUS)) {         log_suspend_abort_reason("Disabling non-boot cpus failed");         goto Enable_cpus;     }      arch_suspend_disable_irqs();     BUG_ON(!irqs_disabled());      error = syscore_suspend();  執行syscore_ops_list上所有syscore_ops的suspend回調函數     if (!error) {         *wakeup = pm_wakeup_pending();  檢查是否需要終止suspend流程?         if (!(suspend_test(TEST_CORE) || *wakeup)) {             trace_suspend_resume(TPS("machine_suspend"),                 state, true);             error = suspend_ops->enter(state);  調用psci_suspend_ops的enter回調函數,關閉machine             trace_suspend_resume(TPS("machine_suspend"),                 state, false);  !!!!!!!!!!!!!!!!這里即為喚醒之后的執行路徑了!!!!!!!!!!!!!!!!             events_check_enabled = false;         } else if (*wakeup) {             pm_get_active_wakeup_sources(suspend_abort,                 MAX_SUSPEND_ABORT_LEN);             log_suspend_abort_reason(suspend_abort);             error = -EBUSY;         }         syscore_resume();  執行所有syscore_ops_list的resume回調函數     }      arch_suspend_enable_irqs();     BUG_ON(irqs_disabled());  Enable_cpus:     enable_nonboot_cpus();  打開所有non-boot CPU  Platform_wake:     platform_resume_noirq(state);     dpm_resume_noirq(PMSG_RESUME);  resume_noirq  Platform_early_resume:     platform_resume_early(state);  Devices_early_resume:     dpm_resume_early(PMSG_RESUME);  resume_early  Platform_finish:     platform_resume_finish(state);     return error; }   |  
          
 
          
        
 
        3.2 suspend_prepare和suspend
 
        DPM是Device Power Management的意思,這些操作都是針對非系統設備(non-sysdev)進行的。那什么是系統設備呢?下面的machine應該就是所謂的sysdev了。
 
        dpm_prepare實際上就是遍歷dpm_list上的所有設備,執行->prepare回調函數。如果設備存在->prepare回電函數,會將設備的prepare階段打印到ftrace。
 
         
          
           
           |   int dpm_prepare(pm_message_t state) {     int error = 0;      trace_suspend_resume(TPS("dpm_prepare"), state.event, true);     might_sleep();      mutex_lock(&dpm_list_mtx);     while (!list_empty(&dpm_list)) {  遍歷dpm_list         struct device *dev = to_device(dpm_list.next);          get_device(dev);         mutex_unlock(&dpm_list_mtx);          trace_device_pm_callback_start(dev, "", state.event);         error = device_prepare(dev, state);  執行->prepare回調函數         trace_device_pm_callback_end(dev, error);          mutex_lock(&dpm_list_mtx);         if (error) {             if (error == -EAGAIN) {                 put_device(dev);                 error = 0;                 continue;             }             printk(KERN_INFO "PM: Device %s not prepared "                 "for power transition: code %d\n",                 dev_name(dev), error);             put_device(dev);             break;         }         dev->power.is_prepared = true;         if (!list_empty(&dev->power.entry))             list_move_tail(&dev->power.entry, &dpm_prepared_list);  移動設備到dpm_prepared_list         put_device(dev);     }     mutex_unlock(&dpm_list_mtx);     trace_suspend_resume(TPS("dpm_prepare"), state.event, false);     return error; }   |  
          
 
          
        
 
        dpm_suspend遍歷dpm_prepared_list,這點和dpm_prepare有區別。然后執行設備的->suspend回調函數。
 
         
          
           
           |   int dpm_suspend(pm_message_t state) {     ktime_t starttime = ktime_get();     int error = 0;      trace_suspend_resume(TPS("dpm_suspend"), state.event, true);     might_sleep();      cpufreq_suspend();      mutex_lock(&dpm_list_mtx);     pm_transition = state;     async_error = 0;     while (!list_empty(&dpm_prepared_list)) { 基於dpm_prepared_list遍歷設備         struct device *dev = to_device(dpm_prepared_list.prev);          get_device(dev);         mutex_unlock(&dpm_list_mtx);          error = device_suspend(dev);  執行設備->suspend回調函數          mutex_lock(&dpm_list_mtx);         if (error) {             pm_dev_err(dev, state, "", error);             dpm_save_failed_dev(dev_name(dev));             put_device(dev);             break;         }         if (!list_empty(&dev->power.entry))             list_move(&dev->power.entry, &dpm_suspended_list);  移動設備到dpm_suspended_list         put_device(dev);         if (async_error)             break;     }     mutex_unlock(&dpm_list_mtx);     async_synchronize_full();     if (!error)         error = async_error;     if (error) {         suspend_stats.failed_suspend++;         dpm_save_failed_step(SUSPEND_SUSPEND);     } else         dpm_show_time(starttime, state, NULL);     trace_suspend_resume(TPS("dpm_suspend"), state.event, false);     return error; }   |  
          
 
          
        
 
        3.3 suspend_late和suspend_noirq
 
        dpm_suspend_late基於dpm_suspended_list操作設備,所以這也需要函數之間順序執行。
 
         
          
           
           |   int dpm_suspend_late(pm_message_t state) {     ktime_t starttime = ktime_get();     int error = 0;      trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);     mutex_lock(&dpm_list_mtx);     pm_transition = state;     async_error = 0;      while (!list_empty(&dpm_suspended_list)) {  遍歷dpm_suspended_list列表         struct device *dev = to_device(dpm_suspended_list.prev);          get_device(dev);         mutex_unlock(&dpm_list_mtx);          error = device_suspend_late(dev);  執行->suspend_late回調函數          mutex_lock(&dpm_list_mtx);         if (!list_empty(&dev->power.entry))             list_move(&dev->power.entry, &dpm_late_early_list);  移動設備到dpm_late_early_list          if (error) {             pm_dev_err(dev, state, " late", error);             dpm_save_failed_dev(dev_name(dev));             put_device(dev);             break;         }         put_device(dev);          if (async_error)             break;     }     mutex_unlock(&dpm_list_mtx);     async_synchronize_full();     if (!error)         error = async_error;     if (error) {         suspend_stats.failed_suspend_late++;         dpm_save_failed_step(SUSPEND_SUSPEND_LATE);         dpm_resume_early(resume_event(state));     } else {         dpm_show_time(starttime, state, "late");     }     trace_suspend_resume(TPS("dpm_suspend_late"), state.event, false);     return error; }   |  
          
 
          
        
 
        dpm_suspend_noirq基於dpm_late_early_list遍歷所有設備。首先阻止設備驅動接收中斷信息,然后執行->suspend_noirq回調函數。
 
         
          
           
           |   int dpm_suspend_noirq(pm_message_t state) {     ktime_t starttime = ktime_get();     int error = 0;      trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, true);     cpuidle_pause();  暫停cpuidle功能,退出idle的CPU     device_wakeup_arm_wake_irqs();  將具有wakeirq的設備設置成wakeup resource     suspend_device_irqs();  關閉當前所有能夠關閉的irq,置成IRQS_SUSPENDED。IRQF_NO_SUSPEND類型的wakeup中斷不能被關閉,並且作為wakeup喚醒源的中斷不能被關閉。     mutex_lock(&dpm_list_mtx);     pm_transition = state;     async_error = 0;      while (!list_empty(&dpm_late_early_list)) {         struct device *dev = to_device(dpm_late_early_list.prev);          get_device(dev);         mutex_unlock(&dpm_list_mtx);          error = device_suspend_noirq(dev);  調用->suspend_noirq回調函數          mutex_lock(&dpm_list_mtx);         if (error) {             pm_dev_err(dev, state, " noirq", error);             dpm_save_failed_dev(dev_name(dev));             put_device(dev);             break;         }         if (!list_empty(&dev->power.entry))             list_move(&dev->power.entry, &dpm_noirq_list);  移動設備到dpm_noirq_list         put_device(dev);          if (async_error)             break;     }     mutex_unlock(&dpm_list_mtx);     async_synchronize_full();     if (!error)         error = async_error;      if (error) {         suspend_stats.failed_suspend_noirq++;         dpm_save_failed_step(SUSPEND_SUSPEND_NOIRQ);         dpm_resume_noirq(resume_event(state));     } else {         dpm_show_time(starttime, state, "noirq");     }     trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, false);     return error; }   |  
          
 
          
        
 
        3.4 suspend_machine和resume_machine
 
        freeze和mem/standby在這部分是不同的。
 
        mem/standby直接調用suspend_ops->enter進入對應的睡眠模式。
 
        而freeze就要稍微復雜了:
 
         
          
           
           |   static void freeze_enter(void) {     spin_lock_irq(&suspend_freeze_lock);     if (pm_wakeup_pending())  檢查是否有wakeup信號在處理,如果有則退出當前流程。         goto out;      suspend_freeze_state = FREEZE_STATE_ENTER;     spin_unlock_irq(&suspend_freeze_lock);      get_online_cpus();     cpuidle_resume();  允許使用cpuidle      /* Push all the CPUs into the idle loop. */     wake_up_all_idle_cpus();  強制所有CPU退出idle狀態     pr_debug("PM: suspend-to-idle\n");     /* Make the current CPU wait so it can enter the idle loop too. */     wait_event(suspend_freeze_wait_head,            suspend_freeze_state == FREEZE_STATE_WAKE);  等待FREEZE_STATE_WAKE事件,進入idle loop     pr_debug("PM: resume from suspend-to-idle\n");  !!!!!!!!!!!!!!!!這里即為喚醒之后的執行路徑了!!!!!!!!!!!!!!!!     cpuidle_pause();  暫停使用cpuidle     put_online_cpus();      spin_lock_irq(&suspend_freeze_lock);  out:     suspend_freeze_state = FREEZE_STATE_NONE;     spin_unlock_irq(&suspend_freeze_lock); }   |  
          
 
          
        
 
         
 
        3.5 resume_noirq
 
        執行dpm_noirq_list上設備的resume_noirq回調函數。
 
         
          
           
           |   void dpm_resume_noirq(pm_message_t state) {     struct device *dev;     ktime_t starttime = ktime_get();      trace_suspend_resume(TPS("dpm_resume_noirq"), state.event, true);     mutex_lock(&dpm_list_mtx);     pm_transition = state;      /*      * Advanced the async threads upfront,      * in case the starting of async threads is      * delayed by non-async resuming devices.      */     list_for_each_entry(dev, &dpm_noirq_list, power.entry) {         reinit_completion(&dev->power.completion);         if (is_async(dev)) {             get_device(dev);             async_schedule(async_resume_noirq, dev);         }     }      while (!list_empty(&dpm_noirq_list)) {  遍歷dpm_noirq_list         dev = to_device(dpm_noirq_list.next);         get_device(dev);         list_move_tail(&dev->power.entry, &dpm_late_early_list);  移動設備到下一級dpm_late_early_list         mutex_unlock(&dpm_list_mtx);          if (!is_async(dev)) {             int error;              error = device_resume_noirq(dev, state, false);             if (error) {                 suspend_stats.failed_resume_noirq++;                 dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);                 dpm_save_failed_dev(dev_name(dev));                 pm_dev_err(dev, state, " noirq", error);             }         }          mutex_lock(&dpm_list_mtx);         put_device(dev);     }     mutex_unlock(&dpm_list_mtx);     async_synchronize_full();     dpm_show_time(starttime, state, "noirq");     resume_device_irqs();     device_wakeup_disarm_wake_irqs();     cpuidle_resume();     trace_suspend_resume(TPS("dpm_resume_noirq"), state.event, false); }   |  
          
 
          
        
 
        3.6 resume_early
 
        執行前述dpm_late_early_list設備的resume_early回調函數,移動設備到dpm_suspended_list列表。
 
         
          
           
           |   void dpm_resume_early(pm_message_t state) {     struct device *dev;     ktime_t starttime = ktime_get();      trace_suspend_resume(TPS("dpm_resume_early"), state.event, true);     mutex_lock(&dpm_list_mtx);     pm_transition = state;      /*      * Advanced the async threads upfront,      * in case the starting of async threads is      * delayed by non-async resuming devices.      */     list_for_each_entry(dev, &dpm_late_early_list, power.entry) {         reinit_completion(&dev->power.completion);         if (is_async(dev)) {             get_device(dev);             async_schedule(async_resume_early, dev);         }     }      while (!list_empty(&dpm_late_early_list)) {         dev = to_device(dpm_late_early_list.next);         get_device(dev);         list_move_tail(&dev->power.entry, &dpm_suspended_list);         mutex_unlock(&dpm_list_mtx);          if (!is_async(dev)) {             int error;              error = device_resume_early(dev, state, false);             if (error) {                 suspend_stats.failed_resume_early++;                 dpm_save_failed_step(SUSPEND_RESUME_EARLY);                 dpm_save_failed_dev(dev_name(dev));                 pm_dev_err(dev, state, " early", error);             }         }         mutex_lock(&dpm_list_mtx);         put_device(dev);     }     mutex_unlock(&dpm_list_mtx);     async_synchronize_full();     dpm_show_time(starttime, state, "early");     trace_suspend_resume(TPS("dpm_resume_early"), state.event, false); }   |  
          
 
          
        
 
        3.7 resume
 
        執行所有dpm_suspended_list上設備的resume回調函數。
 
         
          
           
           |   void dpm_resume(pm_message_t state) {     struct device *dev;     ktime_t starttime = ktime_get();      trace_suspend_resume(TPS("dpm_resume"), state.event, true);     might_sleep();      mutex_lock(&dpm_list_mtx);     pm_transition = state;     async_error = 0;      list_for_each_entry(dev, &dpm_suspended_list, power.entry) {         reinit_completion(&dev->power.completion);         if (is_async(dev)) {             get_device(dev);             async_schedule(async_resume, dev);         }     }      while (!list_empty(&dpm_suspended_list)) {         dev = to_device(dpm_suspended_list.next);         get_device(dev);         if (!is_async(dev)) {             int error;              mutex_unlock(&dpm_list_mtx);              error = device_resume(dev, state, false);             if (error) {                 suspend_stats.failed_resume++;                 dpm_save_failed_step(SUSPEND_RESUME);                 dpm_save_failed_dev(dev_name(dev));                 pm_dev_err(dev, state, "", error);             }              mutex_lock(&dpm_list_mtx);         }         if (!list_empty(&dev->power.entry))             list_move_tail(&dev->power.entry, &dpm_prepared_list);         put_device(dev);     }     mutex_unlock(&dpm_list_mtx);     async_synchronize_full();     dpm_show_time(starttime, state, NULL);      cpufreq_resume();     trace_suspend_resume(TPS("dpm_resume"), state.event, false); }   |  
          
 
          
        
 
        3.8 resume_complete
 
        執行所有dpm_prepared_list上設備的complete回調函數。至此dpm_complete結束所有非系統設備的睡眠。
 
         
          
           
           |   void dpm_complete(pm_message_t state) {     struct list_head list;      trace_suspend_resume(TPS("dpm_complete"), state.event, true);     might_sleep();      INIT_LIST_HEAD(&list);     mutex_lock(&dpm_list_mtx);     while (!list_empty(&dpm_prepared_list)) {         struct device *dev = to_device(dpm_prepared_list.prev);          get_device(dev);         dev->power.is_prepared = false;         list_move(&dev->power.entry, &list);         mutex_unlock(&dpm_list_mtx);          trace_device_pm_callback_start(dev, "", state.event);         device_complete(dev, state);         trace_device_pm_callback_end(dev, 0);          mutex_lock(&dpm_list_mtx);         put_device(dev);     }     list_splice(&list, &dpm_list);     mutex_unlock(&dpm_list_mtx);     trace_suspend_resume(TPS("dpm_complete"), state.event, false); }   |  
          
 
          
        
 
         
 
        4 如何讓HiKey進入STR/STI並喚醒?
 
        可以通過配置GPIO作為喚醒源,或者通過RTC作為喚醒源,延時一定時間來喚醒。
 
        檢查是否存在/sys/class/rtc/rtc0/wakealarm,入不存在則需要打開CONFIG_RTC_DRV_PL031。
 
        寫入wakealarm的參數,表示在多少秒之后resume喚醒,退出suspend。
 
        寫mem進入state,是系統進入suspend流程。
 
         
          
           
           adb root && adb remount adb shell "echo +10 > /sys/class/rtc/rtc0/wakealarm && echo mem > /sys/power/state" |  
          
 
          
        
 
         
 
        5. suspend/resume的latency分析手段
 
        5.1 analyze_suspend.py v3.0
 
        在kernel的scripts中,這個工具可以幫助內核和OS開發者優化suspend/resume時間。
 
        在打開一系列內核選項之后,此工具就可以執行suspend操作,然后抓取dmesg和ftrace數據知道resume結束。
 
        這些數據會按照時間線顯示每個設備,並且顯示占用最多suspend/resume時間的設備或者子系統的調用關系詳圖。
 
        執行工具后,會根據時間生成一個子目錄,里面包含:html、dmesg和原始ftrace文件。
 
        下面簡單看一下工具選項:
 
         
          
           
           |   Options:   [general]     -h          Print this help text     -v          Print the current tool version     -verbose    Print extra information during execution and analysis     -status     Test to see if the system is enabled to run this tool     -modes      List available suspend modes  顯示當前支持的suspend模式     -m mode     Mode to initiate for suspend ['freeze', 'mem', 'disk'] (default: mem)  設置進入何種模式的suspend     -rtcwake t  Use rtcwake to autoresume after <t> seconds (default: disabled)  使用rtc來喚醒,參數是間隔時間   [advanced]     -f          Use ftrace to create device callgraphs (default: disabled)  基於ftrace生成調用關系圖     -filter "d1 d2 ..." Filter out all but this list of dev names     -x2         Run two suspend/resumes back to back (default: disabled)     -x2delay t  Minimum millisecond delay <t> between the two test runs (default: 0 ms)     -postres t  Time after resume completion to wait for post-resume events (default: 0 S)     -multi n d  Execute <n> consecutive tests at <d> seconds intervals. The outputs will                 be created in a new subdirectory with a summary page.   [utilities]     -fpdt       Print out the contents of the ACPI Firmware Performance Data Table     -usbtopo    Print out the current USB topology with power info     -usbauto    Enable autosuspend for all connected USB devices   [android testing]     -adb binary Use the given adb binary to run the test on an android device.  參數需要給出adb路徑,工具就會對Android設備進行測試,並將結果pull出來。有一點需要注意,在此之前確保adb具有root權限。                 The device should already be connected and with root access.                 Commands will be executed on the device using "adb shell"   [re-analyze data from previous runs] 針對之前測試數據重新分析     -ftrace ftracefile  Create HTML output using ftrace input     -dmesg dmesgfile    Create HTML output using dmesg (not needed for kernel >= 3.15)     -summary directory  Create a summary of all test in this dir   |  
          
 
          
        
 
        在了解了工具使用方法之后,就可以進行相關測試了。
 
        5.1.1 Android
 
        ./analysze_suspend.py –modes –adb /usr/bin/adb獲取當前系統支持的suspend狀態。
 
         
        1.Android上測試STR,suspend/resume共5次,每次間隔20秒。
 
         
          
           
           | ./analyze_suspend.py -adb  /usr/bin/adb -rtcwake 10 -multi 5 20 -f -m mem |  
          
 
          
        
 
        2.Android上測試STI,suspend/resume共10次,每次間隔5秒。
 
         
          
           
           | ./analyze_suspend.py -adb  /usr/bin/adb -rtcwake 10 -multi 5 20 -f -m freeze |  
          
 
          
        
 
        測試結果可以在如下獲得:
 
        https://github.com/arnoldlu/common-use/tree/master/tools/analyze_suspend/hikey_test
 
        存在的問題:analyze_suspend.py不支持Android的rtcwakeup和callgraph。已經在如下fix:
 
        https://github.com/arnoldlu/common-use/blob/master/tools/analyze_suspend/analyze_suspend.py
 
        5.1.1.1 總體對比
 
        下面是HiKey上測試結果,可以看出兩個數據都不夠穩定。mem的suspend和resume平均值都比較高。
 
        freeze相比mem的suspend/resume平均值提高了304.3ms/613.5ms。
 
        
 
        
 
        5.1.1.2 是否suspend CPU
 
        對比如下兩幅圖,明顯看出mem類型的suspend關閉了除CPU0之外的所有CPU;而freeze則沒有關閉任何CPU。
 
        non-boot CPUs的suspend/resume時間就達到300ms/200ms。
 
        
 
        
 
        同時從log中也可以看出mem和freeze的主要區別就在於是否disabling/enabling non-boot CPU。其他設備和子系統的suspend/resume時間基本一致。
 
        同時還可以看出mem的suspend后,系統的timestamp是停止的;而freeze的timestamp還是一直在運行的。可以得出freeze狀態持續的時間。
 
        因為先寫rtcwake為10s,然后進入睡眠,再喚醒,所以freeze時間是小於10s的。
 
         
          
           
           |   [ 3385.642962] PM: suspend entry 1970-01-01 00:57:30.580909763 UTC [ 3385.649165] PM: Syncing filesystems ... done. [ 3385.661349] Freezing user space processes ...  [ 3385.671207] dwc2 f72c0000.usb: dwc2_hsotg_ep_stop_xfr: timeout DOEPCTL.EPDisable [ 3385.678933] dwc2 f72c0000.usb: GINNakEff triggered [ 3385.685718] (elapsed 0.019 seconds) done. [ 3385.689860] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. [ 3385.700092] Suspending console(s) (use no_console_suspend to debug) [ 3385.736020] PM: suspend of devices complete after 27.195 msecs [ 3385.740811] PM: late suspend of devices complete after 4.765 msecs [ 3385.743919] PM: noirq suspend of devices complete after 3.090 msecs Disabling and Enabling non-boot CPUs [ 3386.209126] PM: noirq resume of devices complete after 1.865 msecs [ 3386.212066] PM: early resume of devices complete after 2.460 msecs [ 3386.234729] mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) [ 3386.311480] mmc_host mmc0: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0) [ 3386.410411] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) [ 3386.458232] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 25000000Hz, actual 24800000HZ div = 0) [ 3386.458729] PM: resume of devices complete after 246.646 msecs [ 3386.818770] Restarting tasks ...  [ 3386.827026] done. [ 3386.844139] PM: suspend exit 1970-01-01 00:57:40.624589167 UTC   |  
              [ 3471.760265] PM: Syncing filesystems ... done. [ 3471.771897] Freezing user space processes ...  [ 3471.780407] dwc2 f72c0000.usb: dwc2_hsotg_ep_stop_xfr: timeout DOEPCTL.EPDisable [ 3471.788105] dwc2 f72c0000.usb: GINNakEff triggered [ 3471.794916] (elapsed 0.018 seconds) done. [ 3471.799078] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. [ 3471.809320] Suspending console(s) (use no_console_suspend to debug) [ 3471.847947] PM: suspend of devices complete after 29.905 msecs [ 3471.852473] PM: late suspend of devices complete after 4.497 msecs [ 3471.855611] PM: noirq suspend of devices complete after 3.120 msecs
  [ 3481.034722] PM: noirq resume of devices complete after 1.945 msecs [ 3481.037992] PM: early resume of devices complete after 2.694 msecs [ 3481.062803] mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) [ 3481.137795] mmc_host mmc0: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0) [ 3481.234796] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) [ 3481.278601] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 25000000Hz, actual 24800000HZ div = 0) [ 3481.279396] PM: resume of devices complete after 241.388 msecs [ 3481.358513] Restarting tasks ... done. [ 3481.377766] PM: suspend exit 1970-01-01 00:59:15.332218333 UTC   |  
          
 
          
        
 
         
 
        5.1.1.3 resume_console節省時間
 
        對比resume_console可以發現,mem要比freeze多210ms。
 
        5.1.2 Ubuntu
 
        此工具在Ubuntu上顯示了更強大的功能。
 
        支持了callgraph功能之后,更能清晰地分析每個設備或者子系統的suspend/resume占用的時間。
 
         
          
           
           sudo ./analyze_suspend.py -rtcwake 10 -multi 5 20 -f -m mem sudo ./analyze_suspend.py -rtcwake 10 -multi 5 20 -f -m freeze |  
          
 
          
        
 
        在對比兩種不同suspend模式后,發現freeze花費的時間要比mem少。這也符合預期,但是沒有功耗數據?_?。
 
        
 
        
 
        下面着重分析一下如何基於此工具分析。
 
        5.1.3 工具界面總體分析
 
        最上面顯示Kernel Suspend Time和Kernel Resume Time,可以從總體上查看是否有回退或者進步。
 
        再下面是一些縮放按鈕。
 
        然后就是基於timeline的圖表,比對顏色示意圖,可以清晰看出suspend prepare、suspend、suspend late、suspend irq、suspend machine、resume machine、resume irq、resume early、resume和resume complete的分布。
 
        最下面是每個模塊、子系統的詳細函數調用圖以及開始時間、消耗時間。
 
        
 
        5.1.4 子系統、模塊詳細分析
 
        選中一個模塊,會在最下面顯示詳細的模塊在suspend/resume各個階段消費的時間,以及函數調用關系圖。
 
        
 
        5.1.5 縮放查看細節
 
        ZOOM IN放大,ZOOMOUT縮小,ZOOM 1:1恢復原始尺寸。
 
        通過在timeline圖表,放大可以查看到更小的模塊消耗的時間。從宏觀到模塊,再到函數消耗時間,逐步細化,很有利於分析。
 
        
 
        如果發現某個函數占用時間較大,可以逐級展開。知道發現最終占用較大的函數,發現問題所在。
 
        
 
        5.1.6 工具代碼分析
 
        首先從入口main開始,和大多數工具一樣開始都是解析命令選項,打印help信息;將所有的待測對象參數和測試參數保存在sysvals;
 
         
          
           
           |   # ----------------- MAIN -------------------- # exec start (skipped if script is loaded as library) if __name__ == '__main__':     cmd = ''     cmdarg = ''     multitest = {'run': False, 'count': 0, 'delay': 0}     # loop through the command line arguments     args = iter(sys.argv[1:])     for arg in args:         …
      # just run a utility command and exit     if(cmd != ''):         if(cmd == 'status'):             statusCheck()         elif(cmd == 'fpdt'):             if(sysvals.android):                 doError('cannot read FPDT on android device', False)             getFPDT(True)         elif(cmd == 'usbtopo'):             if(sysvals.android):                 doError('cannot read USB topology '+\                     'on an android device', False)             detectUSB(True)         elif(cmd == 'modes'):             modes = getModes()             print modes         elif(cmd == 'usbauto'):             setUSBDevicesAuto()         elif(cmd == 'summary'):             print("Generating a summary of folder \"%s\"" % cmdarg)             runSummary(cmdarg, True)         sys.exit()      # run test on android device     if(sysvals.android):  注釋此段代碼可以在Android上支持callgraph         #if(sysvals.usecallgraph):          #    doError('ftrace (-f) is not yet supported '+\         #        'in the android kernel', False)         if(sysvals.notestrun):             doError('cannot analyze test files on the '+\                 'android device', False)      # if instructed, re-analyze existing data files     if(sysvals.notestrun):  分析已有數據文件,不需要重新測試         rerunTest()         sys.exit()      # verify that we can run a test     if(not statusCheck()):  檢查測試條件是否滿足         print('Check FAILED, aborting the test run!')         sys.exit()      if multitest['run']:  連續多次測試         # run multiple tests in a separte subdirectory         s = 'x%d' % multitest['count']         subdir = datetime.now().strftime('suspend-'+s+'-%m%d%y-%H%M%S')         os.mkdir(subdir)         for i in range(multitest['count']):             if(i != 0):                 print('Waiting %d seconds...' % (multitest['delay']))                 time.sleep(multitest['delay'])             print('TEST (%d/%d) START' % (i+1, multitest['count']))             runTest(subdir)  進行單次測試             print('TEST (%d/%d) COMPLETE' % (i+1, multitest['count']))         runSummary(subdir, False)  生成summary.html     else:         # run the test in the current directory         runTest(".")   |  
          
 
          
        
 
        sysvals.android表示是否在Android設備進行測試。
 
        sysvals.usecallgraph表示是否生成函數調用關系圖。
 
        sysvals.rtcwake表示是否使用rtc進行喚醒。
 
        針對Ubuntu之類的host設備,測試進行的很順利。但是針對Android設備,在callgraph還存在一點問題。
 
        run_Test無疑作為核心,收集log信息(ftrace、dmesg),執行suspend/resume,生成輸出文件(txt、html)。
 
         
          
           
           |   def runTest(subdir):     global sysvals      # prepare for the test     if(not sysvals.android):  針對不同的待測設備,初始化ftrace         initFtrace()     else:         initFtraceAndroid()     sysvals.initTestOutput(subdir)  生成輸出目錄,輸出文件名等。      vprint('Output files:\n    %s' % sysvals.dmesgfile)     if(sysvals.usecallgraph or         sysvals.usetraceevents or         sysvals.usetraceeventsonly):         vprint('    %s' % sysvals.ftracefile)     vprint('    %s' % sysvals.htmlfile)      # execute the test  執行測試,實際上命令內容基本一致。只是針對Android設備,增加了adb shell '…'。     if(not sysvals.android):         executeSuspend()     else:         executeAndroidSuspend()      # analyze the data and create the html output     print('PROCESSING DATA')     if(sysvals.usetraceeventsonly):  3.15之后的版本,只需要通過ftrace即可獲取足夠信息。之前的版本的數據都存在dmesg中。         # data for kernels 3.15 or newer is entirely in ftrace         testruns = parseTraceLog()     else:         # data for kernels older than 3.15 is primarily in dmesg         testruns = loadKernelLog()         for data in testruns:             parseKernelLog(data)         if(sysvals.usecallgraph or sysvals.usetraceevents):             appendIncompleteTraceLog(testruns)     createHTML(testruns)  根據解析的數據生成html矢量圖表   |  
          
 
          
        
 
        executeAndroidSuspend在Android設備上操作sysfs節點來配置ftrace,抓取log,suspend/resume,然后將log拉到主機。
 
         
          
           
           |   def executeAndroidSuspend():     global sysvals      # check to see if the display is currently off     tp = sysvals.tpath     out = os.popen(sysvals.adb+\         ' shell dumpsys power | grep mScreenOn').read().strip()     # if so we need to turn it on so we can issue a new suspend     if(out.endswith('false')):         print('Waking the device up for the test...')         # send the KEYPAD_POWER keyevent to wake it up         os.system(sysvals.adb+' shell input keyevent 26')         # wait a few seconds so the user can see the device wake up         time.sleep(3)     # execute however many s/r runs requested     for count in range(1,sysvals.execcount+1):         # clear the kernel ring buffer just as we start         os.system(sysvals.adb+' shell dmesg -c > /dev/null 2>&1')  清空dmesg         # start ftrace         if(sysvals.usetraceevents):             print('START TRACING')             os.system(sysvals.adb+" shell 'echo 1 > "+tp+"tracing_on'")  開始ftrace抓取         # initiate suspend         for count in range(1,sysvals.execcount+1):             if(sysvals.usetraceevents):                 os.system(sysvals.adb+\                     " shell 'echo SUSPEND START > "+tp+"trace_marker'")  寫SUSPEND START到ftrace,作為開始標記。后面解析log,會以此為標記。             if(sysvals.rtcwake):                 print('SUSPEND START')                 print('will autoresume in %d seconds' % sysvals.rtcwaketime)                 os.system(sysvals.adb+" shell 'echo +%d > /sys/class/rtc/rtc0/wakealarm'"%(sysvals.rtcwaketime))  設置wakeup resource             else:                 print('SUSPEND START (press a key to resume)')              os.system(sysvals.adb+" shell 'echo "+sysvals.suspendmode+\                 " > "+sysvals.powerfile+"'")  進入suspend,之后就是resume             # execution will pause here, then adb will exit             while(True):  輪詢adb shell pwd判斷設備是否被喚醒                 check = os.popen(sysvals.adb+\                     ' shell pwd 2>/dev/null').read().strip()                 if(len(check) > 0):                     break                 time.sleep(1)             if(sysvals.usetraceevents):                 os.system(sysvals.adb+" shell 'echo RESUME COMPLETE > "+tp+\                     "trace_marker'")  寫RESUME COMPLETE到ftrace,作為結束標記         # return from suspend         print('RESUME COMPLETE')         # stop ftrace         if(sysvals.usetraceevents):             os.system(sysvals.adb+" shell 'echo 0 > "+tp+"tracing_on'") 關閉ftrace功能             print('CAPTURING TRACE')             os.system('echo "'+sysvals.teststamp+'" > '+sysvals.ftracefile)             os.system(sysvals.adb+' shell cat '+tp+\                 'trace >> '+sysvals.ftracefile)  將/sys/kernel/debug/tracing/trace內容保存到本地log         # grab a copy of the dmesg output         print('CAPTURING DMESG')         os.system('echo "'+sysvals.teststamp+'" > '+sysvals.dmesgfile)         os.system(sysvals.adb+' shell dmesg >> '+sysvals.dmesgfile)  將dmesg保存到本地   |  
          
 
          
        
 
        parseTraceLog用於解析ftrace log,phase的判斷是依據suspend_resume關鍵詞;每個模塊的開始結束是以device_pm_callback_start/device_pm_callback_end作為判斷;還調用FTraceCallGraph進行函數調用關系的解析。
 
        createHTML是這個工具真正NB的地方,對parseTraceLog結果進行了可視化,生成可縮放、查看細節的html文件。
 
         
 
        6 對工具的改進
 
        雖然工具非常強大,但是在使用中還是有一些視角沒有覆蓋到。所以做了一些改進。
 
        在Android上使能rtcwake;在Android上使能callgraph;針對多次測試生成csv比較不同phase消耗時間,比summary.html更細化;這對每次測試給出Phase時間和每個Phase內Device消耗時間。
 
        6.1 Android上使能rtcwake
 
        https://github.com/arnoldlu/common-use/commit/a862d8c2a4f9bd005c516c6b61b394386b882217
 
        可以在Android上使用rtc作為喚醒源,可以在沒有實體按鍵的設備上進行測試。
 
        6.2 Android上使能callgraph
 
        https://github.com/arnoldlu/common-use/commit/f8e288753a472cf48ccc0e9d7ffc67978c7d165e
 
        如果沒有callgraph只能顯示Phase級別的信息,不能顯示每個device的信息以及內部函數耗費的時間。
 
        6.3 單次測試summary結果
 
        https://github.com/arnoldlu/common-use/commit/53c270669bb0dfaada53e29852999d5367ec65da
 
        在每次測試目錄下,生成一個summary_phase_dev.csv文件。可以直觀的看到不同Phase、不同device消耗的時間。
 
        
 
        如果想要發現那個模塊消耗最大時間,可以使用Excel的Filter功能。比如想看suspend_prepare下Device消耗時間有大到小排列。
 
        這樣就可以找出每個Phase中消耗資源大戶。
 
        
 
        6.4 多次測試summary結果
 
        https://github.com/arnoldlu/common-use/commit/d162c4827a0cdc50fe94d3f1303af682b387dc3d
 
        生成summary_phase.csv文件,按每次測試的不同phase顯示耗費時間。
 
        可以比較不同測試phase的時間耗費,看出哪一個phase存在回退現象。
 
        
 
        6.5 suspend/resume起止時間點判斷
 
        analyze_suspend.py在解析log的時候,以SUSPEND START作為起點,以RESUME COMPLETE為終點。
 
        在發送SUSPEND START之后,觸發suspend動作。在這期間,如果host存在一定搶占,會增加suspend時間。
 
        然后通poll設備的adb狀態,來判斷是否resume。一方面,adb可用狀態要在resume結束之后,另一方面,在最壞的情況下,可能存在1s的誤差,這對於毫秒級的resume來說是非常嚴重的一個結果。
 
        最后發送RESUME COMPLETE作為結束。
 
         
          
           
           |   if(sysvals.usetraceevents):     os.system(sysvals.adb+\         " shell 'echo SUSPEND START > "+tp+"trace_marker'") print('SUSPEND START (press a key on the device to resume)') os.system(sysvals.adb+" shell 'echo "+sysvals.suspendmode+\     " > "+sysvals.powerfile+"'") # execution will pause here, then adb will exit while(True):     check = os.popen(sysvals.adb+\         ' shell pwd 2>/dev/null').read().strip()     if(len(check) > 0):         break     time.sleep(1) if(sysvals.usetraceevents):     os.system(sysvals.adb+" shell 'echo RESUME COMPLETE > "+tp+\         "trace_marker'")   |  
          
 
          
        
 
        更好的方式是在enter_state的開頭結尾加ftrace,然后解析的時候以此為標記。
 
         
          
           
           |   @@ -486,6 +496,7 @@ static int enter_state(suspend_state_t state) {         int error;   +       trace_suspend_resume(TPS("enter_state"), state, true);         trace_suspend_resume(TPS("suspend_enter"), state, true);         if (state == PM_SUSPEND_FREEZE) { #ifdef CONFIG_PM_DEBUG @@ -532,6 +543,7 @@ static int enter_state(suspend_state_t state)         suspend_finish();   Unlock:         mutex_unlock(&pm_mutex); +       trace_suspend_resume(TPS("enter_state"), state, false);         return error; }   |  
          
 
          
        
 
        7 分析步驟
 
        本着從宏觀到微觀的進階,一步步分找出可以優化的點。
 
        下面是從開始一次測試到每次測試到suspend/resume不同phase,再到每個phase里面device callback的關系。
 
        
 
        下面是每一次正常suspend/resume的流程,之前每個階段函數分析也可以看出他們的對稱關系。
 
        
 
        在修改了工具對於suspend和resume時間判斷的bug過后,得到了一組的數據。
 
        
 
        分析一下穩定性,均方差比較小,還算比較穩定。數據穩定之后,就可以進行詳細分析了。
 
        
 
        下面查看每次測試的每個phase數據,可以看出每個phase數據的穩定性,以及每個phase費時占比。找出費時大戶,suspend_prepare、suspend、suspend_machine、resume_machine、resume、resume_complete。
 
        
 
        針對上述六個phase,列出Top 10設備或者子系統。
 
        從下圖可以看出,freeze_processes、sync_filesystems、mmc0、mmc2、CUP0~7、resume_console、tsensor是需要重點分析的設備。
 
        

 
        

 
        

 
        不區分phase列出Top 30如下,下面逐一分析可優化的空間。
 
        
 
        7.1 resume_console
 
         
          
           
           |   adb shell 'echo N > /sys/module/printk/parameters/console_suspend' adb shell 'cat /sys/module/printk/parameters/console_suspend'   |  
          
 
          
        
 
        先看一下resume_console流程函數:
 
         
          
           
           |   void resume_console(void) {     if (!console_suspend_enabled)         return;     down_console_sem();  獲取console_sem和console_lock_dep_map     console_suspended = 0;     console_unlock();   }   |  
          
 
          
        
 
        通過分析ftrace發現,主要時間消耗在console_unlock中。因為在console_lock被占用期間,有相當一部分由printk緩存的log。所以在釋放鎖之前需要將其處理掉。
 
         
          
           
           |   void console_unlock(void) {     static char ext_text[CONSOLE_EXT_LOG_MAX];     static char text[LOG_LINE_MAX + PREFIX_MAX];     static u64 seen_seq;     unsigned long flags;     bool wake_klogd = false;     bool do_cond_resched, retry;      trace_console_lock("console_unlock start", strlen("console_unlock start"));\     if (console_suspended) {         up_console_sem();         return;     }      /*      * Console drivers are called under logbuf_lock, so      * @console_may_schedule should be cleared before; however, we may      * end up dumping a lot of lines, for example, if called from      * console registration path, and should invoke cond_resched()      * between lines if allowable.  Not doing so can cause a very long      * scheduling stall on a slow console leading to RCU stall and      * softlockup warnings which exacerbate the issue with more      * messages practically incapacitating the system.      */     do_cond_resched = console_may_schedule;     console_may_schedule = 0;      /* flush buffered message fragment immediately to console */     console_cont_flush(text, sizeof(text)); again:     for (;;) {  如果默認的LOGLEVEL定的比較高,即優先級低,則會有相當多的log需要打印。占用很多時間。         …     }     console_locked = 0;      /* Release the exclusive_console once it is used */     if (unlikely(exclusive_console))         exclusive_console = NULL;      raw_spin_unlock(&logbuf_lock);      up_console_sem();  釋放console_sem和console_lock_dep_map
       /*      * Someone could have filled up the buffer again, so re-check if there's      * something to flush. In case we cannot trylock the console_sem again,      * there's a new owner and the console_unlock() from them will do the      * flush, no worries.      */     raw_spin_lock(&logbuf_lock);     retry = console_seq != log_next_seq;     raw_spin_unlock_irqrestore(&logbuf_lock, flags);      if (retry && console_trylock())         goto again;      if (wake_klogd)         wake_up_klogd();     trace_console_lock("console_unlock end", strlen("console_unlock end"));\ }   |  
          
 
          
        
 
        那么問題就變得簡單了,減少printk量就可以了。
 
        通過cat /proc/sys/kernel/printk可以得到。在kernel/sysctl.c中有其實現。
 
         
        這四個值分別對應:
 
         
          
           
           |   #define console_loglevel (console_printk[0]) #define default_message_loglevel (console_printk[1]) #define minimum_console_loglevel (console_printk[2]) #define default_console_loglevel (console_printk[3])   |  
          
 
          
        
 
        又對應到:
 
         
          
           
           |   int console_printk[4] = {     CONSOLE_LOGLEVEL_DEFAULT,    /* console_loglevel */     MESSAGE_LOGLEVEL_DEFAULT,    /* default_message_loglevel */     CONSOLE_LOGLEVEL_MIN,        /* minimum_console_loglevel */     CONSOLE_LOGLEVEL_DEFAULT,    /* default_console_loglevel */ };  /* We show everything that is MORE important than this.. */ #define CONSOLE_LOGLEVEL_SILENT  0 /* Mum's the word */ #define CONSOLE_LOGLEVEL_MIN     1 /* Minimum loglevel we let people use */ #define CONSOLE_LOGLEVEL_QUIET     4 /* Shhh ..., when booted with "quiet" */ #define CONSOLE_LOGLEVEL_DEFAULT 7 /* anything MORE serious than KERN_DEBUG */ #define CONSOLE_LOGLEVEL_DEBUG    10 /* issue debug messages */ #define CONSOLE_LOGLEVEL_MOTORMOUTH 15    /* You can't shut this one up */   |  
          
 
          
        
 
        可知只要內核log優先級高於KERN_DEBUG都會被打印。由下表可知幾乎所有的log都會被打印。這就會造成printk相當繁忙,console_unlock會處理相當多信息。
 
         
          
           
           |   #define KERN_EMERG    KERN_SOH "0"    /* system is unusable */ #define KERN_ALERT    KERN_SOH "1"    /* action must be taken immediately */ #define KERN_CRIT    KERN_SOH "2"    /* critical conditions */ #define KERN_ERR    KERN_SOH "3"    /* error conditions */ #define KERN_WARNING    KERN_SOH "4"    /* warning conditions */ #define KERN_NOTICE    KERN_SOH "5"    /* normal but significant condition */ #define KERN_INFO    KERN_SOH "6"    /* informational */ #define KERN_DEBUG    KERN_SOH "7"    /* debug-level messages */  #define KERN_DEFAULT    KERN_SOH "d"    /* the default kernel loglevel */   |  
          
 
          
        
 
        想解決也很簡單,提高console_loglevel的優先級。
 
         
          
           
           |   diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c old mode 100644 new mode 100755 index e7e586b..b927d67 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -60,7 +60,7 @@ extern void printascii(char *); #endif   int console_printk[4] = { -       CONSOLE_LOGLEVEL_DEFAULT,       /* console_loglevel */ +       CONSOLE_LOGLEVEL_QUIET, /* console_loglevel */         MESSAGE_LOGLEVEL_DEFAULT,       /* default_message_loglevel */         CONSOLE_LOGLEVEL_MIN,           /* minimum_console_loglevel */         CONSOLE_LOGLEVEL_DEFAULT,       /* default_console_loglevel */   |  
          
 
          
        
 
        在進行修改后,再來進行對比測試。可以看出消耗時間得到顯著提升,優化后的resume_complete時間基本上可以忽略不計。
 
         
          
           
           |   7, mem     Line 748: resume_complete,resume_console[3],248.54900000002544     Line 748: resume_complete,resume_console[3],248.6340000000382     Line 748: resume_complete,resume_console[3],248.26499999994667     Line 748: resume_complete,resume_console[3],248.3510000000706     Line 748: resume_complete,resume_console[3],248.42499999999745  7, freeze     Line 996: resume_complete,resume_console[1],76.18400000001202     Line 996: resume_complete,resume_console[1],76.19500000009793     Line 996: resume_complete,resume_console[1],76.3280000001032     Line 996: resume_complete,resume_console[1],76.1689999999362     Line 996: resume_complete,resume_console[1],76.19999999997162     4, freeze     Line 996: resume_complete,resume_console[1],0.1010000000007949     Line 996: resume_complete,resume_console[1],0.10499999999069587     Line 996: resume_complete,resume_console[1],0.09799999997994746     Line 996: resume_complete,resume_console[1],0.1010000000007949     Line 996: resume_complete,resume_console[1],0.10000000003174137  4, mem     Line 749: resume_complete,resume_console[3],0.3370000000586515     Line 749: resume_complete,resume_console[3],0.33800000005612674     Line 749: resume_complete,resume_console[3],0.37700000007134804     Line 749: resume_complete,resume_console[3],0.3359999999474894     Line 749: resume_complete,resume_console[3],0.3429999999298161   |  
          
 
          
        
 
        7.2 mmc suspend/resuem分析
 
        從下圖可知,mmc相關suspend/resume主要在mmc0:0001和mmc2:0001兩個設備的suspend/resume。下面重點分析這兩個設備的suspend/resume回調函數。
 
        
 
        在執行suspend過程中,先將bus上面的設備driver先suspend,然后在suspend bus。
 
        在resume時,過程相反,先bus resume,然后再逐個設備driver resume。
 
         
 
        mmc0:0001
 
        那就來看看bus和各個設備耗費的時間:
 
         
          
           
           |   4013.868837 |   4)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, [suspend] */ 4013.868893 |   4)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */ 4013.869000 |   4)    sh-4511     |               |  /* device_pm_callback_start: block mmcblk0, parent: mmc0:0001, [suspend] */ 4013.869053 |   4)    sh-4511     |               |  /* device_pm_callback_end: block mmcblk0, err=0 */ 4013.889229 |   5)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, bus [suspend] */ 4013.914631 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */ 4022.787571 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, bus [resume] */ 4022.886749 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */ 4023.059198 |   0)    sh-4511     |               |  /* device_pm_callback_start: block mmcblk0, parent: mmc0:0001, [resume] */ 4023.059270 |   0)    sh-4511     |               |  /* device_pm_callback_end: block mmcblk0, err=0 */ 4023.059398 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, [resume] */ 4023.059830 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */   |  
          
 
          
        
 
        可以看出driver的suspend/resume並沒有耗費太多時間,主要在mmc bus的suspend/resume耗費了太多時間。
 
        在drivers/mmc/core/bus.c中
 
         
          
           
           |   static struct bus_type mmc_bus_type = {     .name        = "mmc",     .dev_groups    = mmc_dev_groups,     .match        = mmc_bus_match,     .uevent        = mmc_bus_uevent,     .probe        = mmc_bus_probe,     .remove        = mmc_bus_remove,     .shutdown    = mmc_bus_shutdown,     .pm        = &mmc_bus_pm_ops, };   |  
          
 
          
        
 
        mmc_bus_pm_bus對應的suspend/resume函數如下:
 
         
          
           
           |   static const struct dev_pm_ops mmc_bus_pm_ops = {     SET_RUNTIME_PM_OPS(mmc_runtime_suspend, mmc_runtime_resume, NULL)     SET_SYSTEM_SLEEP_PM_OPS(mmc_bus_suspend, mmc_bus_resume) };   |  
          
 
          
        
 
        mmc bus的suspend/resume如下:
 
         
          
           
           |   static int mmc_bus_suspend(struct device *dev) {     struct mmc_card *card = mmc_dev_to_card(dev);     struct mmc_host *host = card->host;     int ret;      ret = pm_generic_suspend(dev);  對應設備驅動的suspend回調函數。     if (ret)         return ret;      ret = host->bus_ops->suspend(host);   這里的host指的是mmc_host,bus_ops指的是mmc_ops。     return ret; }  static int mmc_bus_resume(struct device *dev) {     struct mmc_card *card = mmc_dev_to_card(dev);     struct mmc_host *host = card->host;     int ret;      ret = host->bus_ops->resume(host);  這里的host指的是mmc_host,bus_ops指的是mmc_ops。       if (ret)         pr_warn("%s: error %d during resume (card was removed?)\n",             mmc_hostname(host), ret);      ret = pm_generic_resume(dev);  對應設備的驅動的resume回調函數。     return ret; }   |  
          
 
          
        
 
        pm_generic_suspend和pm_generic_resume是對子系統設備的通用回調函數。
 
         
          
           
           |   int pm_generic_suspend(struct device *dev) {     const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;      return pm && pm->suspend ? pm->suspend(dev) : 0; }  int pm_generic_resume(struct device *dev) {     const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;      return pm && pm->resume ? pm->resume(dev) : 0; }   |  
          
 
          
        
 
        從中可以看出,如果對應設備存在dev->driver->pm->suspend和dev->driver->pm->resume則,調用回調函數。
 
        mmc_bus_suspend
 
        mmc_bus_suspend花費了大概25.4ms。下面是ftrace中截取的一段,從中可以看出時間主要消耗在struct mmc_bus_ops mmc_ops的.suspend回調函數。
 
         
          
           
           /*            */   mmc_bus_suspend() { /*            */     pm_generic_suspend() { /*! 307.552 us*/     } /*            */     mmc_suspend() { /** 25060.78 us*/      } /* mmc_suspend */ /** 25378.28 us*/    } /* mmc_bus_suspend */ |  
          
 
          
        
 
        mmc_bus_resume
 
        mmc_resume消耗了大部分時間,整個流程才99.158ms。
 
         
          
           
           |   /*              */   mmc_bus_resume() { /*              */     mmc_resume() { /* * 97167.39 us*/     } /*              */     pm_generic_resume() { /* # 1980.104 us*/     } /* * 99158.12 us*/   }   |  
          
 
          
        
 
        mmc_suspend
 
        對應的host->bus_ops,即mmc_ops。在host下的設備都suspend之后,suspend mmc_host。
 
        在mmc_host resume之后,才能進行設備的resume。
 
         
          
           
           |   static const struct mmc_bus_ops mmc_ops = {     .remove = mmc_remove,     .detect = mmc_detect,     .suspend = mmc_suspend,     .resume = mmc_resume,     .runtime_suspend = mmc_runtime_suspend,     .runtime_resume = mmc_runtime_resume,     .alive = mmc_alive,     .shutdown = mmc_shutdown,     .reset = mmc_reset, };   |  
          
 
          
        
 
        通過分析ftrace.txt文件,發現其中msleep花費了17.1ms,這里是存在問題的。
 
         
          
           
           |   int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,         unsigned int timeout_ms, bool use_busy_signal, bool send_status,         bool ignore_crc) {     struct mmc_host *host = card->host;     int err;     struct mmc_command cmd = {0};     unsigned long timeout;     u32 status = 0;     bool use_r1b_resp = use_busy_signal;      mmc_retune_hold(host);  …         /*          * We are not allowed to issue a status command and the host          * does'nt support MMC_CAP_WAIT_WHILE_BUSY, then we can only          * rely on waiting for the stated timeout to be sufficient.          */         if (!send_status) {             mmc_delay(timeout_ms);             goto out;         }  … }   |  
          
 
          
        
 
        mmc_resume
 
        通過分析ftrece.txt,可以發現mmc_resume存在4個msleep,共消耗了12646.35 +14260.78 +13881.66 +15093.22 =55.882 ms。
 
        關於mmc_ops的suspend/resume/runtime_suspend/runtime_resume的探討
 
        先來看看這四個函數的,其流程受到MMC_CAP_AGGRESSIVE_PM和MMC_CAP_RUNTIME_RESUME兩個flag的控制。執行的實體都是_mmc_suspend、_mmc_resume。
 
         
          
           
           |   static int mmc_suspend(struct mmc_host *host) {     int err;      err = _mmc_suspend(host, true);     if (!err) {         pm_runtime_disable(&host->card->dev);         pm_runtime_set_suspended(&host->card->dev);     }      return err; }  static int mmc_resume(struct mmc_host *host) {     int err = 0;      if (!(host->caps & MMC_CAP_RUNTIME_RESUME)) {         err = _mmc_resume(host);         pm_runtime_set_active(&host->card->dev);         pm_runtime_mark_last_busy(&host->card->dev);     }     pm_runtime_enable(&host->card->dev);      return err; }    static int mmc_runtime_suspend(struct mmc_host *host) {     int err;      if (!(host->caps & MMC_CAP_AGGRESSIVE_PM))         return 0;      err = _mmc_suspend(host, true);     if (err)         pr_err("%s: error %d doing aggressive suspend\n",             mmc_hostname(host), err);      return err; }    static int mmc_runtime_resume(struct mmc_host *host) {     int err;      if (!(host->caps & (MMC_CAP_AGGRESSIVE_PM | MMC_CAP_RUNTIME_RESUME)))         return 0;      err = _mmc_resume(host);     if (err)         pr_err("%s: error %d doing aggressive resume\n",             mmc_hostname(host), err);      return 0; }   |  
          
 
          
        
 
        1.如果兩flag都沒有定義,則runtime_suspend和runtim_resume都是空函數。起作用的就是跟隨系統的suspend/resume流程。
 
        2.如果只定義了MMC_CAP_RUNTIME_RESUME,則不會runtime_suspend。並且在系統resume的時候,不會執行resume回調函數。只會在根據需要執行runtime_resume。使用runtime_resume代替了resume。
 
        3.如果只定義了MMC_CAP_AGGRESSIVE_PM ,則suspend/resume跟隨系統suspend/resume流程。並且runtime_suspend/resume_resume也根據實際情況執行。一切正常。
 
        4.如果兩者都定義了,既可以suspend也可以runtime_suspend,但是只能runtime_resume,不能跟隨系統resume流程執行resume回調函數。
 
        也就是說MMC_CAP_AGGRESSIVE_PM 則runtime_suspend/runtime_resume都可用,MMC_CAP_RUNTIME_RESUME則只能使用runtime_resume執行resume功能。
 
         
 
        那么就來看一下,在應用了MMC_CAP_RUNTIME_RESUME之后效果如何。
 
        mmc0:0001增加runtime-suspend屬性:
 
         
          
           
           |   diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi old mode 100644 new mode 100755 index 09e2c71..2cec392 --- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi @@ -801,6 +801,7 @@                         clock-names = "ciu", "biu";                         resets = <&sys_ctrl PERIPH_RSTDIS0_MMC0>;                         bus-width = <0x8>; +                      runtime-suspend;                         vmmc-supply = <&ldo19>;                         pinctrl-names = "default";                         pinctrl-0 = <&emmc_pmx_func &emmc_clk_cfg_func   |  
          
 
          
        
 
        修改DeviceTree解析文件,增加MMC_CAP_RUNTIME_RESUME。
 
         
          
           
           |   index 094202c..35fd7b5 --- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -2922,6 +2922,10 @@ static struct dw_mci_board *dw_mci_parse_dt(struct dw_mci *host)                 dev_info(dev, "supports-highspeed property is deprecated.\n");                 pdata->caps |= MMC_CAP_SD_HIGHSPEED | MMC_CAP_MMC_HIGHSPEED;         } +       if (of_find_property(np, "runtime-suspend", NULL)) { +               dev_info(dev, "supports-highspeed property is deprecated.\n"); +               pdata->caps |=  MMC_CAP_RUNTIME_RESUME; +       }           return pdata; }   |  
          
 
          
        
 
        修改后mmc0:0001的resume達到了預期,mmc_resume沒有被執行。
 
        針對統計結果,效果明顯。
 
        
 
        雖然沒有在系統resume過程中執行,但是mmc0:0001總要resume。只不過稍微延遲了,不再這個工具統計之中。
 
        延后執行的mmc0:0001的resume耗費了72.317ms,也和之前的差不多。實際上沒有對整個流程作出實質貢獻,只是不在統計數據之內。
 
         
          
           
           |   [32m[   32.486851] [0m[33mmmc_host mmc0[0m: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31 caps=40138143 caps2=0) [32m[   32.500871] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq 1000 1000 664 [32m[   32.501305] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq 1000 1000 664 [32m[   32.540313] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq 1000 1000 664 [32m[   32.540747] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu5/cpufreq/scaling_min_freq 1000 1000 664 [32m[   32.559168] [0m[33mmmc_host mmc0[0m: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0 caps=40138143 caps2=0)   |  
          
 
          
        
 
        mmc2:0001
 
        mmc2:0001和mmc0:0001的區別在於不同的mmc_bus_ops,mmc2:0001是SDIO接口,對應的應該是mmc_sdio_ops。
 
         
          
           
           |   4013.876306 |   4)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, [suspend] */ 4013.876360 |   4)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */ 4013.876397 |   4)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, [suspend] */ 4013.876437 |   4)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */ 4013.876470 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, [suspend] */ 4013.876525 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */ 4013.876556 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, [suspend] */ 4013.876596 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */ 4013.881676 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, bus [suspend] */ 4013.881698 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */ 4013.881740 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, bus [suspend] */ 4013.881765 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */ 4013.882582 |   4)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, bus [suspend] */ 4013.882603 |   4)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */ 4013.882645 |   4)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, bus [suspend] */ 4013.885524 |   4)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */ 4022.888667 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, bus [resume] */ 4023.042980 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */ 4023.043021 |   0)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, bus [resume] */ 4023.043037 |   0)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */ 4023.043067 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, bus [resume] */ 4023.043089 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */ 4023.043128 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, bus [resume] */ 4023.043151 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */ 4023.048824 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, [resume] */ 4023.048877 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */ 4023.048916 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, [resume] */ 4023.048979 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */ 4023.049011 |   0)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, [resume] */ 4023.049074 |   0)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */ 4023.049113 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, [resume] */ 4023.049165 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */   |  
          
 
          
        
 
        由下可知不同部分在於mmc_host的suspend/resume,pm_generic_suspend/pm_generic_resume基本上耗費的時間都很少。
 
        所以重點看看mmc_sdio_suspend和mmc_sdio_resume兩個函數。
 
         
          
           
           |   /*              */    mmc_bus_suspend() { /*   0.833 us   */      pm_generic_suspend(); /*              */      mmc_sdio_suspend() { /* # 2854.687 us*/      } /* # 2864.115 us*/    }   |  
          
 
          
        
 
         
 
         
          
           
           |   /*              */    mmc_bus_resume() { /*              */      mmc_sdio_resume() { /* @ 154277.3 us*/      } /*   1.563 us   */      pm_generic_resume(); /* @ 154290.3 us*/    }   |  
          
 
          
        
 
        mmc_sdio_suspend
 
        時間很短,不關注。
 
        mmc_sdio_resume
 
        mmc2:001的mmc_bus_resume時間達到154.313,mmc_sdio_resume包含三個msleep共75331.82+15953.43+14369.58=105654.83us=105.654ms。
 
        考慮:是否可以將SDIO的resume也像MMC那樣延后執行呢?
 
        7.3 CPU_OFF/CPU_ON
 
        在分析了resome_console和mmc之后,再來看一下CPU_OFF/CPU_ON的執行過程。
 
        在disable_nonboot_cpus中選取first_cpu,除此之外的所有for_each_online_cpu都會被_cpu_down,並且將其放到frozen_cpus上。
 
        在enable_nonboot_cpus中,遍歷frozen_cpus,將其_cpu_up。
 
        針對HiKey,真個流程就是對CPU 1-7進行關閉、打開的操作,所以重點分析一下_cpu_down和_cpu_up。
 
        耗時最大的三個地方都用粗體下划線標出,除了發送狀態通知之外,還有rcu sync處理。
 
        對cpu_chain上所有注冊notifier,逐個執行回調函數notifier_call,根據action進行處理,這是一個很耗時的過程。
 
         
          
           
           |   /* Requires cpu_add_remove_lock to be held */ static int _cpu_down(unsigned int cpu, int tasks_frozen) {     int err, nr_calls = 0;     void *hcpu = (void *)(long)cpu;     unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;     struct take_cpu_down_param tcd_param = {         .mod = mod,         .hcpu = hcpu,     };      if (num_online_cpus() == 1)  如果online只有一個CPU,則無法再進行down操作。         return -EBUSY;      if (!cpu_online(cpu))  如果當前CPU沒有online,則無需進行down。         return -EINVAL;      cpu_hotplug_begin();  取得cpu_hotplug.lock鎖      err = __cpu_notify(CPU_DOWN_PREPARE | mod, hcpu, -1, &nr_calls);  在cpu_chain上發從CPU_DOWN_PREPARE狀態。     if (err) {         nr_calls--;         __cpu_notify(CPU_DOWN_FAILED | mod, hcpu, nr_calls, NULL);         pr_warn("%s: attempt to take down CPU %u failed\n",             __func__, cpu);         goto out_release;     }      /*      * By now we've cleared cpu_active_mask, wait for all preempt-disabled      * and RCU users of this state to go away such that all new such users      * will observe it.      *      * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might      * not imply sync_sched(), so wait for both.      *      * Do sync before park smpboot threads to take care the rcu boost case.      */     if (IS_ENABLED(CONFIG_PREEMPT))         synchronize_rcu_mult(call_rcu, call_rcu_sched);     else         synchronize_rcu();      smpboot_park_threads(cpu);  將此CPU的由kthread_create創建的線程設置為PARKED。      /*      * Prevent irq alloc/free while the dying cpu reorganizes the      * interrupt affinities.      */     irq_lock_sparse();      /*      * So now all preempt/rcu users must observe !cpu_active().      */     err = stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));     if (err) {         /* CPU didn't die: tell everyone.  Can't complain. */         cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);         irq_unlock_sparse();         goto out_release;     }     BUG_ON(cpu_online(cpu));  如果指定的CPU還處於online狀態,則觸發kernel panic。      /*      * The migration_call() CPU_DYING callback will have removed all      * runnable tasks from the cpu, there's only the idle task left now      * that the migration thread is done doing the stop_machine thing.      *      * Wait for the stop thread to go away.      */     while (!per_cpu(cpu_dead_idle, cpu))         cpu_relax();     smp_mb(); /* Read from cpu_dead_idle before __cpu_die(). */     per_cpu(cpu_dead_idle, cpu) = false;      /* Interrupts are moved away from the dying cpu, reenable alloc/free */     irq_unlock_sparse();      hotplug_cpu__broadcast_tick_pull(cpu);     /* This actually kills the CPU. */     __cpu_die(cpu);  調用底層架構相關的cpu_kill回調函數。      /* CPU is completely dead: tell everyone.  Too late to complain. */     tick_cleanup_dead_cpu(cpu);     cpu_notify_nofail(CPU_DEAD | mod, hcpu);  通知完成offline動作的處理器狀態為CPU_DEAD。      check_for_tasks(cpu);  out_release:     cpu_hotplug_done();  釋放cpu_hotplug.lock鎖。     trace_sched_cpu_hotplug(cpu, err, 0);     if (!err)         cpu_notify_nofail(CPU_POST_DEAD | mod, hcpu);     return err; }   |  
          
 
          
        
 
         
 
         
 
         
          
           
           |   /* Requires cpu_add_remove_lock to be held */ static int _cpu_up(unsigned int cpu, int tasks_frozen) {     int ret, nr_calls = 0;     void *hcpu = (void *)(long)cpu;     unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;     struct task_struct *idle;      cpu_hotplug_begin();  如果沒有其他進程占有,則退出,執行后面的工作;如果被占用,則設置這個進程為TASK_INTERRUPTIBLE,等待結束。      if (cpu_online(cpu) || !cpu_present(cpu)) {  如果該CPU已經online,則沒有必要執行up;或者非present,則無法up。         ret = -EINVAL;         goto out;     }      idle = idle_thread_get(cpu);  給指定CPU生成一個idle線程     if (IS_ERR(idle)) {         ret = PTR_ERR(idle);         goto out;     }      ret = smpboot_create_threads(cpu);  創建一個用於管理CPU hotplug動作的線程     if (ret)         goto out;      ret = __cpu_notify(CPU_UP_PREPARE | mod, hcpu, -1, &nr_calls);  通知cpu_chain中的處理器,當前正在online的CPU狀態為CPU_UP_PREPARE。     if (ret) {         nr_calls--;         pr_warn("%s: attempt to bring up CPU %u failed\n",             __func__, cpu);         goto out_notify;     }      /* Arch-specific enabling code. */     ret = __cpu_up(cpu, idle);  調用更底層的使能CPU操作。      if (ret != 0)         goto out_notify;     BUG_ON(!cpu_online(cpu));      /* Now call notifier in preparation. */     cpu_notify(CPU_ONLINE | mod, hcpu);  通知cpu_chanin中的處理器,目前online動作的處理器的狀態為CPU_ONLINE。  out_notify:     if (ret != 0)         __cpu_notify(CPU_UP_CANCELED | mod, hcpu, nr_calls, NULL); out:     cpu_hotplug_done();  釋放cpu_hotplug.lock鎖。     trace_sched_cpu_hotplug(cpu, ret, 1);      return ret; }   |  
          
 
          
        
 
         
 
        RCU synchronize
 
        RCU即Read-Copy Update,是Linux內核比較成熟的新型讀寫鎖,具有較高的讀寫並發性能,常常用在需要互斥的關鍵性能路徑。
 
        在Kernel中,有兩種類型實現tiny和tree,tiny rcu更簡潔,常用在小型嵌入式系統中;tree rcu被廣泛用在了server、desktop、android中。
 
        RCU的和心理鏈式讀者訪問的同時,寫者可以更新訪問對象的副本,但寫者需要等待所有讀者完成訪問之后,才能刪除老對象。這個過程實現的關鍵和難點在於如何判斷所有的讀者已經完成訪問。通常把寫者開始更新,到所有讀者完成訪問這段時間叫做寬限期(Grace Period)。內核中實現寬限期等待的函數是synchronize_rcu。
 
        synchronize_rcu_mult同時在call_rcu()函數列表的寬限期上等待,知道所有的都結束。
 
        總結:cpu_chain和rcu sync耗時大部是由外界因素決定的,如果cpu_chain或者call_rcu()列表很多,或者里面回調函數特別耗時,都會拉長CPU_OFF/CPU_ON時間。這部分的優化特別離散。
 
        參考文檔:
 
        RCU synchronize原理分析 http://www.wowotech.net/kernel_synchronization/223.html
 
        synchronize_rcu()函數詳解 http://blog.chinaunix.net/uid-20648784-id-1592811.html
 
        如何確定一個函數耗費時間?
 
        在函數中添加以下ftrace,可以得到執行時的timestamp,進程名稱,函數名和對應的行數。
 
         
          
           
           | trace_suspend_resume(TPS(__func__), __LINE__, true); |  
          
 
          
        
 
         
 
        結果如下:
 
         
          
           
           |   223.502950 |   1)    sh-2832     |               |                  /* suspend_resume: CPU_ON[4] begin */ 223.502953 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[513] begin */ 223.502957 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[516] begin */ 223.502959 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[522] begin */ 223.502969 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[529] begin */ 223.502973 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[534] begin */ 223.529988 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[544] begin */ 223.530382 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[552] begin */ 223.531451 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[559] begin */ 223.531454 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[563] begin */ 223.531456 |   1)    sh-2832     |               |                  /* suspend_resume: CPU_ON[4] end */   |  
          
 
          
        
 
        在Excel中打開,可以輕松算出時間差。可知Line 534到Line 544之前耗費了最多時間。
 
        
 
        詳情請參考:
 
        cpu hotplug的流程 http://blog.csdn.net/u013686805/article/details/46942469
 
        Linux CPU core的電源管理(5)_cpu control及cpu hotplug http://www.wowotech.net/pm_subsystem/cpu_hotplug.html
 
        8 參考文檔
 
        Power Management Support in Hikey (suspend-resume):http://www.96boards.org/forums/topic/power-management-support-in-hikey-suspend-resume/#gsc.tab=0
 
        Suspend to Idle:http://www.linaro.org/blog/suspend-to-idle/
 
        Suspend and Resume:https://01.org/zh/suspendresume
 
        SuspendAndResume github:https://github.com/arnoldlu/suspendresume
 
        Linux電源管理(6)_Generic PM之Suspend功能:http://www.wowotech.net/pm_subsystem/suspend_and_resume.html