Ftrace的部分使用方法


ftrace主要是用於調試linux kernel調度相關的一個工具,也可用於分析部分kernel性能問題。

相關ftrace的介紹可以參考:kernel/msm-4.9/Documentation/trace/ftrace.txt 以及同級目錄下的其他介紹文件。

kernel.org的ftrace文檔也可以參考(信息較多):https://www.kernel.org/doc/html/latest/trace/ftrace.html

谷歌官網的ftrace使用教程(請自帶梯子):https://source.android.google.cn/devices/tech/debug/ftrace

 IBM developer的blog(介紹每個跟蹤器,並且寫了如何看ftrace的log,很有用,很有用,很有用):https://www.ibm.com/developerworks/cn/linux/l-cn-ftrace2/index.html

一、trace_event

trace event作為ftrace的一部分,它主要通過已有函數插樁的tracepoint,來跟蹤相關事件。

例如,在進程調度關鍵函數:__schedule()中:

static void __sched notrace __schedule(bool preempt)
{
...
    if (likely(prev != next)) {
        if (!prev->on_rq)
            prev->last_sleep_ts = wallclock;

        update_task_ravg(prev, rq, PUT_PREV_TASK, wallclock, 0);
        update_task_ravg(next, rq, PICK_NEXT_TASK, wallclock, 0);
        rq->nr_switches++;
        rq->curr = next;
        ++*switch_count;

        trace_sched_switch(preempt, prev, next);
        rq = context_switch(rq, prev, next, cookie); /* unlocks the rq */
    } else {
        update_task_ravg(prev, rq, TASK_UPDATE, wallclock, 0);
        lockdep_unpin_lock(&rq->lock, cookie);
        raw_spin_unlock_irq(&rq->lock);
    }
...
}

利用trace event,我們能夠監控某些事件,是否發生;或者在某段時間內,發生了什么。

案例:在dim idle下,耗流偏大約10mA。

dim idle是屏幕亮着,但亮度最低,系統進入idle狀態,而未進入深度休眠的狀態。

1、在抓取了power waveform對比正常的機器,發生耗流波形的bottom與正常的相似。但是耗流bottom很不穩定。

(異常)

 

 

 (正常)

 

 

 

2、在檢查所有外設后,並未發現有明顯漏電。

3、此時,就利用ftrace的trace event,抓取在此情況下,系統到底在運行什么。

抓取方法:

Step 1

Connect USB and execute below commands one by one , some times all commands in batch may not work due to adb file system issues
Verify Buffer size is reflected as the one that is set

adb shell "echo 0 > /sys/kernel/debug/tracing/tracing_on"
adb shell "cat /sys/kernel/debug/tracing/tracing_on"
adb shell "echo 150000 > /sys/kernel/debug/tracing/buffer_size_kb"
adb shell "cat /sys/kernel/debug/tracing/buffer_size_kb"

adb shell "echo  > /sys/kernel/debug/tracing/set_event"
adb shell cat /sys/kernel/debug/tracing/set_event
adb shell "echo  > /sys/kernel/debug/tracing/trace"
adb shell cat /sys/kernel/debug/tracing/trace
adb shell sync


Step 2

Enable below trace events

adb shell "echo 1 > /sys/kernel/debug/tracing/events/power/cpu_idle/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/power/cpu_frequency/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/power/cpu_frequency_switch_start/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_low_power/*/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/sched/sched_enq_deq_task/enable"

adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_bus/bus_update_request/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/clk/clk_set_rate/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/clk/clk_enable/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/clk/clk_disable/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/power/bw_hwmon_update/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/power/bw_hwmon_meas/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/power/memlat_dev_meas/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/power/memlat_dev_update/enable"

adb shell "echo 1 > /sys/kernel/debug/tracing/events/power/cpu_idle/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/power/cpu_frequency/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/power/cpu_frequency_switch_start/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_low_power/*/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/sched/sched_enq_deq_task/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_low_power/cluster_enter/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_low_power/cluster_exit/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_low_power/cluster_pred_hist/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_low_power/cluster_pred_select/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_low_power/cpu_idle_enter/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_low_power/cpu_idle_exit/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_low_power/cpu_power_select/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_low_power/cpu_pred_hist/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/msm_low_power/cpu_pred_select/enable"

adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/adreno_cmdbatch_queued/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/adreno_cmdbatch_submitted/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/adreno_cmdbatch_retired/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_gpubusy/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_pwr_request_state/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_pwr_set_state/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_pwrstats/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_buslevel/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_pwrlevel/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_clk/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_bus/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_rail/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/dispatch_queue_context/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_user_pwrlevel_constraint/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_clock_throttling/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_constraint/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/adreno_gpu_fault/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/adreno_cmdbatch_fault/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/adreno_cmdbatch_sync/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_issueibcmds/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_context_create/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_context_destroy/enable"
adb shell "echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_context_detach/enable"
adb shell "echo 1 > /d/tracing/events/msm_bus/bus_update_request/enable"
adb shell "echo 1 > /d/tracing/events/msm_bus/bus_agg_bw/enable"

Step 3 Verify the trace events and also remove any previous trace file
adb shell cat /sys/kernel/debug/tracing/set_event
adb shell rm /data/local/trace.txt


Step 4:  Start Usecase, Execute below commands with usb and disconnect usb within 5 seconds

adb shell
cd /d/tracing
sleep 10 && echo 0 > tracing_on && echo "" > trace && echo 1 > tracing_on && sleep 10 && echo 0 > tracing_on && cat trace > /data/local/trace.txt &

Step 5 : After 20 seconds, re-connect usb and pull the trace file

adb pull /data/local/trace.txt 

4、我們找到了qsee_logger在每60ms,就會運行一次。這個是一個debug qsee的log工具,應該在版本中剔除。

         <idle>-0     [000] dns3 17167.350289: sched_wakeup: comm=qsee_logger pid=6004 prio=120 target_cpu=000
          <idle>-0     [000] dns3 17167.410357: sched_wakeup: comm=qsee_logger pid=6004 prio=120 target_cpu=001
          <idle>-0     [001] dns3 17167.470370: sched_wakeup: comm=qsee_logger pid=6004 prio=120 target_cpu=001
          <idle>-0     [001] dns3 17167.538874: sched_wakeup: comm=qsee_logger pid=6004 prio=120 target_cpu=001
          <idle>-0     [001] dns3 17167.608888: sched_wakeup: comm=qsee_logger pid=6004 prio=120 target_cpu=001
          <idle>-0     [001] dns3 17167.671269: sched_wakeup: comm=qsee_logger pid=6004 prio=120 target_cpu=001

5、最后使用performance ROM,驗證pass。再安裝qsee_logger,又出現bottom耗流不穩定的waveform。但是平均耗流仍然會比之前差的時候好。

所以,得出結論:1. qsee_logger導致bottom耗流不穩定,影響耗流

        2. 非performance ROM也影響了耗流

在此案例中,我們利用了trace event查看了系統非休眠情況下,cpu調度的相關事件,並且找出了qsee_logger影響了底電流的穩定。

 

二、trace function 和function_graph

需要使用這2個功能,內核配置中需要添加以下內容:CONFIG_DYNAMIC_FTRACE=y、CONFIG_FUNCTION_TRACER=y、CONFIG_IRQSOFF_TRACER=y、CONFIG_FUNCTION_PROFILER=y 和 CONFIG_PREEMPT_TRACER=y

1)trace function可以設定自己需要監測的函數,並且可以打印出其調用棧。

查看tracer,驗證是否生效。

cat /d/tracing/available_tracers
blk function_graph preemptirqsoff preemptoff irqsoff function nop

1. 由於編譯器有時會對函數進行重命名,因此通過使用以下命令確認(這里用例為:htc_batt_schedule_batt_info_update,看到確實后綴加了.part.8):

cat /d/tracing/available_filter_functions | grep htc_batt_schedule_batt_info_update
htc_batt_schedule_batt_info_update.part.8

2. 確認存在之后,將其用作 ftrace 過濾器:

echo htc_batt_schedule_batt_info_update.part.8 > /d/tracing/set_ftrace_filter

3. 開啟函數分析器:

echo function > /d/tracing/current_tracer

4. 開啟堆棧跟蹤:

echo func_stack_trace > /d/tracing/trace_options

5. 增加緩沖區大小:

echo 64000 > /d/tracing/buffer_size_kb

6. 最后開啟抓取,抓完后關閉,並獲取記錄文件:

echo 1 > /d/tracing/trace_on
執行復現問題的動作
echo 0 > /d/tracing/trace_on
cat /d/tracing/trace > /data/local/tmp/trace

我們可以從抓取的文件中可以看到什么時刻會調用到監測的函數,並會打印出調用棧:

           <...>-7363  [001] ...1  6384.885600: htc_batt_schedule_batt_info_update.part.8 <-htc_battery_info_update
           <...>-7363  [001] ...1  6384.885612: <stack trace>
 => smb2_batt_get_prop
 => power_supply_get_property
 => status_change_work
 => process_one_work
 => worker_thread
 => kthread
 => ret_from_fork
...
...
     kworker/1:2-7034  [001] ...1  6395.535869: htc_batt_schedule_batt_info_update.part.8 <-htc_battery_info_update
     kworker/1:2-7034  [001] ...1  6395.535880: <stack trace>
 => smb2_usb_set_prop
 => power_supply_set_property
 => dwc3_msm_gadget_vbus_draw
 => dwc3_msm_vbus_draw_work
 => process_one_work
 => worker_thread
 => kthread
 => ret_from_fork

2)function_graph 是將調用棧打印出來,並且會顯示每個階段的運行的用時。這對於分析一些內核造成的卡頓問題有所幫助。

在上面那些config基礎上,還需要打開配置:CONFIG_FUNCTION_GRAPH_TRACER=y

1. 設置tracer:

echo function_graph > /d/tracing/current_tracer

2. 最后開啟抓取,抓完后關閉,並獲取記錄文件(設置buffer等和上面類似,不贅述):

echo 1 > /d/tracing/trace_on
執行復現問題的動作
echo 0 > /d/tracing/trace_on
cat /d/tracing/trace > /data/local/tmp/trace

如下,我們可以看到每天函數的執行用時,這個時間是包含了sleep time的,所以可以用來定位系統延遲發生在哪個函數中。:

# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0) + 76.615 us   |          } /* vfs_open */
 0)   0.156 us    |          open_check_o_direct();
 0)               |          terminate_walk() {
 0)   0.104 us    |            drop_links.isra.37();
 0)               |            dput() {
 0)               |              __might_sleep() {
 0)   0.156 us    |                ___might_sleep();
 0)   3.750 us    |              }
 0)   0.105 us    |              __rcu_read_lock();
 0)   0.156 us    |              __rcu_read_unlock();
 0) + 17.396 us   |            }
 0)               |            mntput() {
 0)               |              mntput_no_expire() {
 0)   0.156 us    |                __rcu_read_lock();
 0)   0.833 us    |                preempt_count_add();
 0)   0.208 us    |                preempt_count_sub();
 0)   0.157 us    |                __rcu_read_unlock();
 0) + 18.229 us   |              }
 0) + 21.875 us   |            }
 0)               |            dput() {
 0)               |              __might_sleep() {
 0)   0.157 us    |                ___might_sleep();
 0)   3.854 us    |              }
 0)   0.104 us    |              __rcu_read_lock();
 0)   0.156 us    |              __rcu_read_unlock();
 0) + 15.990 us   |            }

其中計時前面的符號是一種提醒,讓我們能注意到哪些地方的用時比較長,可能存在問題:

  + means that the function exceeded 10 usecs.
  ! means that the function exceeded 100 usecs.
  # means that the function exceeded 1000 usecs.
  * means that the function exceeded 10 msecs.
  @ means that the function exceeded 100 msecs.
  $ means that the function exceeded 1 sec.

 

3)動態ftrace

由於上面兩種方法都會監測很多其他函數,會對系統造成較大負擔。所以,需要使用動態過濾監測函數的方式。

如果我們想只跟蹤特定函數,可以把需要配置跟蹤的函數寫入到節點:

echo htc_batt_schedule_batt_info_update.part.8 > /d/tracing/set_ftrace_filter ------- function tracer
echo htc_batt_schedule_batt_info_update.part.8 > /d/tracing/set_graph_function ------ function_graph tracer

或者想把一些函數排除在外,可以寫入如下節點:

/sys/kernel/debug/tracing/set_ftrace_notrace ------- function tracer
/sys/kernel/debug/tracing/set_graph_notrace ------ function_graph tracer

 

BTW,trace抓取之后,想再次抓取,默認是會把上次抓取的也包含在內的。所以,如果不需要上一次的信息,需要清空trace之后,重新再開始抓取:

echo '' > /d/tracing/trace 

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM