Keywords: task_newtask, task_rename, process tree, top.
In one scenario, CPU usage sits at 100% while the process has created a large number of threads, many of which share the same name.
On top of that, the per-process usage figures reported by top add up to far less than 100%.
This raises two problems. First, the duplicate thread names: the program was designed without thread naming in mind, so the names cannot be mapped back to the business logic, and top/pstree only show the parent-child relation between a process and its threads, not between the threads themselves.
Second, where did the missing CPU usage actually go?
1. Obtaining the tree structure of all threads under a process
This step is crucial, because in many systems threads are created very frequently and exit quickly once their work is done.
Common tools such as top and pstree do not show parent-child relationships between threads; they only make the relationship between a process and its threads explicit.
Here we rely on the task_newtask and task_rename trace events under /sys/kernel/debug/tracing/events/task/, which fire when a thread is created and when a thread is renamed, respectively.
With these two trace events we can record every newly created thread: a new thread initially inherits its parent's name and is then renamed (producing a task_rename event), most commonly via prctl().
1.1 Capturing task_newtask and task_rename data
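Before any data can be captured, the two events (and tracing itself) have to be switched on, typically by writing "1" into the corresponding enable files under /sys/kernel/debug/tracing from a root shell. The small C++ helper below does the same writes; it is only a minimal sketch, not part of the original workflow, and assumes debugfs is mounted at the usual /sys/kernel/debug path.

```cpp
// Minimal sketch: enable the two task trace events and tracing itself by
// writing "1" to the tracefs control files -- the same effect as
// `echo 1 > .../enable` from a root shell.
// Assumes debugfs is mounted at /sys/kernel/debug.
#include <fstream>
#include <initializer_list>
#include <string>

int main()
{
    const std::string base = "/sys/kernel/debug/tracing/";

    for (const char *node : {"events/task/task_newtask/enable",
                             "events/task/task_rename/enable",
                             "tracing_on"}) {
        std::ofstream f(base + node);   // needs root privileges
        f << "1";
    }
    // The accumulated records can later be read back from
    // /sys/kernel/debug/tracing/trace and saved as trace.txt.
    return 0;
}
```

In practice this is usually done directly with echo; the helper only documents which control files are involved.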
The data captured through these trace events is shown below. task_newtask records which thread did the creating (the parent), which thread was created (the child pid) and what it is called (the child's comm), while task_rename records which thread was renamed, from which old name to which new name.
```text
sh-153              [000] .... 53.734695:  task_newtask: pid=165 comm=sh clone_flags=1200011 oom_score_adj=0
cat-165             [000] ...1 53.738804:  task_rename: pid=165 oldcomm=sh newcomm=cat oom_score_adj=0
sh-153              [000] .... 84.638270:  task_newtask: pid=166 comm=sh clone_flags=1200011 oom_score_adj=0
xchip_runtime-166   [000] ...1 84.645910:  task_rename: pid=166 oldcomm=sh newcomm=xchip_runtime oom_score_adj=0
xchip_runtime-166   [000] .... 85.098260:  task_newtask: pid=167 comm=xchip_runtime clone_flags=3d0f00 oom_score_adj=0
Log2Hostflush-167   [000] ...1 85.099145:  task_rename: pid=167 oldcomm=xchip_runtime newcomm=Log2Hostflush oom_score_adj=0
xchip_runtime-166   [000] .... 85.133881:  task_newtask: pid=168 comm=xchip_runtime clone_flags=3d0f00 oom_score_adj=0
usb_server_list-168 [000] ...1 85.134077:  task_rename: pid=168 oldcomm=xchip_runtime newcomm=usb_server_list oom_score_adj=0
usb_server_list-168 [000] .... 116.497292: task_newtask: pid=169 comm=usb_server_list clone_flags=3d0f00 oom_score_adj=0
coreComm-169        [000] ...1 116.501998: task_rename: pid=169 oldcomm=usb_server_list newcomm=coreComm oom_score_adj=0
usb_server_list-168 [000] .... 116.514232: task_newtask: pid=170 comm=usb_server_list clone_flags=3d0f00 oom_score_adj=0
```
1.2 Generating the thread tree with a script
Read trace.txt in a jupyter-notebook and extract the key information from it: pid, comm, parent pid and the list of child pids.
With the pid as the dictionary key, the analysis proceeds as follows (the script below is Python 2).
```python
import re

process_tree = {}
ap_trace_file = open('/home/al/trace.txt', 'rb')

for line in ap_trace_file:
    ftrace_line_fmt = \
        ' *(?P<proc>.*)-(?P<pid>[0-9]*) *\[(?P<cpu>[0-9]*)\] *'+\
        '(?P<flags>.{4}) *(?P<ktime>[0-9\.]*): *'+\
        '(?P<msg>.*)'
    m = re.match(ftrace_line_fmt, line)
    if(not m):
        continue

    proc = m.group('proc')
    pid = m.group('pid')
    msg = m.group('msg')
    if (not process_tree.has_key(pid)):
        process_tree[pid] = {'comm':proc, 'child':[], 'father':0}

    #Analyze task_newtask.
    #task_newtask: pid=165 comm=sh clone_flags=1200011 oom_score_adj=0
    task_newtask_fmt = 'task_newtask: pid=(?P<cpid>[0-9]*) comm=(?P<comm>.*) clone_flags.*'
    m = re.match(task_newtask_fmt, msg)
    if(m):
        cpid = m.group('cpid')
        comm = m.group('comm')
        #print pid, cpid, comm
        #process_tree.update({cpid:{'child':[], 'comm':comm}})
        process_tree[pid]['child'].append(cpid)
        if(not process_tree.has_key(cpid)):
            process_tree[cpid] = {'comm':comm, 'child':[], 'father':pid}

    #Analyze task_rename.
    #task_rename: pid=170 oldcomm=usb_server_list newcomm=adapter oom_score_adj=0
    task_rename_fmt = 'task_rename: pid=(?P<cpid>[0-9]*) oldcomm=(?P<oldcomm>.*) newcomm=(?P<newcomm>.*) oom_score_adj.*'
    m = re.match(task_rename_fmt, msg)
    if(m):
        cpid = m.group('cpid')
        oldcomm = m.group('oldcomm')
        newcomm = m.group('newcomm')
        process_tree[cpid]['comm'] = newcomm
        #print cpid, oldcomm, newcomm

ap_trace_file.close()
```
Then, based on the parsed process_tree, print the result as a tree:
```python
for key, item in process_tree.items():
    if ( item['father'] == 0 ):
        print item['comm']+':'+key
        for child1 in item['child']:
            print " |-2-"+process_tree[child1]['comm']+':'+child1
            for child2 in process_tree[child1]['child']:
                print "    |-3-"+process_tree[child2]['comm']+':'+child2
                for child3 in process_tree[child2]['child']:
                    print "       |-4-"+process_tree[child3]['comm']+':'+child3
                    for child4 in process_tree[child3]['child']:
                        print "          |-5-"+process_tree[child4]['comm']+':'+child4
                        for child5 in process_tree[child4]['child']:
                            print "             |-6-"+process_tree[child5]['comm']+':'+child5
                            for child6 in process_tree[child5]['child']:
                                print "                |-7-"+process_tree[child6]['comm']+':'+child6
                                for child7 in process_tree[child6]['child']:
                                    print "                   |-8-"+process_tree[child7]['comm']+':'+child7
```
The final output is shown below.
It shows the creation trail of every thread under xchip_runtime, together with their parent-child relationships.
```text
sh:153
 |-2-cat:165
 |-2-xchip_runtime:166
    |-3-Log2Hostflush:167
    |-3-usb_server_list:168
       |-4-coreComm:169
       |-4-adapter:170
       |-4-usb_server_list:171
       |-4-sh:172
          |-5-fp_download_tes:173
             |-6-sh:174
                |-7-find:175
                |-7-awk:176
             |-6-sh:177
                |-7-rm:178
       |-4-usb_server_list:179
          |-5-usb_server_list:202
          |-5-usb_server_list:203
          ...
          |-5-usb_server_list:2851
          |-5-usb_server_list:2852
       |-4-usb_server_list:180
       |-4-p2p_rx_task:181
       |-4-IFMS_Init:182
       |-4-omx_main:183
          |-5-omx_g1_output:218
       |-4-src:src:184
          |-5-omxdec:src:219
       |-4-iccsrc_rx:185
       |-4-usb_server_list:186
       |-4-p2p_rx_task:187
       |-4-omx_main:188
          |-5-omx_g1_output:192
       |-4-src:src:189
          |-5-omxdec:src:193
       |-4-iccsrc_rx:190
       |-4-usb_server_list:191
       |-4-p2p_rx_task:194
       |-4-omx_main:195
          |-5-omx_g1_output:199
       |-4-src:src:196
          |-5-omxdec:src:200
       |-4-iccsrc_rx:197
       |-4-usb_server_list:198
       |-4-p2p_rx_task:201
       |-4-omx_main:228
          |-5-omx_g1_output:240
       |-4-src:src:229
          |-5-omxdec:src:241
       |-4-iccsrc_rx:230
       |-4-usb_server_list:235
 |-2-top:646
 |-2-pidof:1873
 |-2-top:1907
 |-2-cat:2830
```
2. Single-threaded vs. multi-threaded execution
From the number of threads created by usb_server_list:179 and how often they are created, combined with the functionality involved, the correspondence between threads and business logic can be established.
The likely reason the per-thread usage adds up to less than 100% is that the child threads created by usb_server_list are spawned frequently and exit quickly.
Because they do not live through a full top sampling period, the CPU time they consume is counted in the overall CPU summary, but cannot be attributed to any specific process/thread in top's per-task listing.
To verify this, two different applications were written; each starts a periodic timer whose period is adjustable.
On each timer tick, one application runs four tasks sequentially, while the other starts four threads, one per task.
Their CPU usage is then observed with top.
2.1 Test programs
The multi-threaded version creates four threads, each executing one task:
```cpp
#include <pthread.h>
#include <unistd.h>
#include <thread>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>

unsigned int count = 0;

#define LOOP_COUNT 40000

void thread_fn(void)
{
    unsigned int i, a = 0;

    for(i = 0; i < LOOP_COUNT; i++) {
        a++;
    }
}

/* Timer callback: spawn four threads, one task each, and wait for them. */
void timer_thread(union sigval v)
{
    std::thread thread1, thread2, thread3, thread4;

    //printf("pthread count=%d.\n", count++);
    thread1 = std::thread(thread_fn);
    thread2 = std::thread(thread_fn);
    thread3 = std::thread(thread_fn);
    thread4 = std::thread(thread_fn);

    thread1.join();
    thread2.join();
    thread3.join();
    thread4.join();
}

int main(int argc, char** argv)
{
    timer_t timerid = 0;
    struct itimerspec it;
    struct sigevent evp;

    memset(&evp, 0, sizeof(struct sigevent));
    evp.sigev_value.sival_int = 111;
    evp.sigev_notify = SIGEV_THREAD;            /* deliver the timer in a thread */
    evp.sigev_notify_function = timer_thread;

    if (timer_create(CLOCK_REALTIME, &evp, &timerid) == -1) {
        perror("fail to timer_create");
        return -1;
    }
    printf("timer_create timerid = %p\n", (void *)timerid);

    /* argv[1] is the timer period in nanoseconds. */
    it.it_interval.tv_sec = 0;
    it.it_interval.tv_nsec = atoi(argv[1]);
    it.it_value.tv_sec = 1;
    it.it_value.tv_nsec = 0;

    if (timer_settime(timerid, 0, &it, NULL) == -1) {
        perror("fail to timer_settime");
        return -1;
    }

    while(1) {
        sleep(1);
    }

    return 0;
}
```
Compile as follows:

```sh
csky-abiv2-linux-g++ pthread_4simu.cc -o pthread_4simu -lpthread --std=c++11 -lrt
```
The sequential version runs the four tasks one after another:
```cpp
#include <pthread.h>
#include <unistd.h>
#include <thread>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>

unsigned int count = 0;

#define LOOP_COUNT 40000

void thread_fn(void)
{
    unsigned int i, a = 0;

    for(i = 0; i < LOOP_COUNT; i++) {
        a++;
    }
}

/* Timer callback: run the same four tasks sequentially in this one thread. */
void timer_thread(union sigval v)
{
    //printf("loop count=%d.\n", count++);
    thread_fn();
    thread_fn();
    thread_fn();
    thread_fn();
}

int main(int argc, char** argv)
{
    timer_t timerid = 0;
    struct itimerspec it;
    struct sigevent evp;

    memset(&evp, 0, sizeof(struct sigevent));
    evp.sigev_value.sival_int = 111;
    evp.sigev_notify = SIGEV_THREAD;
    evp.sigev_notify_function = timer_thread;

    if (timer_create(CLOCK_REALTIME, &evp, &timerid) == -1) {
        perror("fail to timer_create");
        return -1;
    }
    printf("timer_create timerid = %p\n", (void *)timerid);

    /* argv[1] is the timer period in nanoseconds. */
    it.it_interval.tv_sec = 0;
    it.it_interval.tv_nsec = atoi(argv[1]);
    it.it_value.tv_sec = 1;
    it.it_value.tv_nsec = 0;

    if (timer_settime(timerid, 0, &it, NULL) == -1) {
        perror("fail to timer_settime");
        return -1;
    }

    while(1) {
        sleep(1);
    }

    return 0;
}
```
Compile as follows:

```sh
csky-abiv2-linux-g++ loop.cc -o loop -lrt
```
2.2 Test results
The two programs were tested with different timer periods (5 ms, 10 ms, 25 ms, 40 ms and 100 ms, passed as argv[1] in nanoseconds, e.g. 5000000 for 5 ms); the results are as follows.
| Timer period | loop | pthread_4simu | Analysis |
|---|---|---|---|
| 5 ms | (top screenshot) | (top screenshot) | Total CPU usage of the two differs by 20% |
| 10 ms | (top screenshot) | (top screenshot) | Total CPU usage of the two differs by 18% |
| 25 ms | (top screenshot) | (top screenshot) | Total CPU usage of the two differs by 8% |
| 40 ms | (top screenshot) | (top screenshot) | Total CPU usage of the two differs by 1% |
| 100 ms | (top screenshot) | (top screenshot) | Total CPU usage of the two is essentially the same |
This shows that spawning threads to run tasks does carry a cost. When the number of newly created threads is small, the impact on CPU usage is low; but as the number grows, the impact becomes increasingly visible, and in severe cases the system's scheduling capacity degrades.
3. Solutions
For the messy thread names: once the thread tree is available, it is fairly easy to locate the creation points in the code and set proper names with prctl()/pthread_setname_np().
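As a reference, below is a minimal sketch of both naming APIs (the name `rx_worker` is made up for illustration): prctl(PR_SET_NAME, ...) renames the calling thread from inside, while pthread_setname_np() lets the creator name a child thread; both limit names to 15 characters plus the terminating NUL.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* needed for pthread_setname_np() on glibc */
#endif
#include <pthread.h>
#include <sys/prctl.h>
#include <unistd.h>

static void *worker(void *arg)
{
    (void)arg;
    /* Rename the calling thread itself (at most 15 chars + NUL). */
    prctl(PR_SET_NAME, "rx_worker", 0, 0, 0);
    sleep(5);              /* placeholder for real work */
    return NULL;
}

int main(void)
{
    pthread_t tid;

    pthread_create(&tid, NULL, worker, NULL);
    /* Or let the creating thread name the child (glibc extension). */
    pthread_setname_np(tid, "rx_worker");
    pthread_join(tid, NULL);
    return 0;
}
```

Either call is enough; the point is that every long-lived worker ends up with a name that top, pstree and the trace-based tree can tie back to its function.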
Change how usb_server_list:179 processes its work so that tasks are handled by a single thread (one possible shape is sketched below). This both reduces the overhead of constantly creating and destroying threads and makes top's per-thread CPU accounting more accurate.
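The sketch below is only an illustration of that idea and assumes nothing about the real usb_server_list code: a single long-lived worker thread pulls tasks from a queue, so requests no longer spawn short-lived threads that top cannot account for.

```cpp
// Illustrative sketch only: one long-lived worker thread consumes tasks
// from a queue instead of a new thread being created per request.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

class SingleWorker {
public:
    SingleWorker() : stop_(false), th_(&SingleWorker::run, this) {}
    ~SingleWorker() {
        { std::lock_guard<std::mutex> lk(mtx_); stop_ = true; }
        cv_.notify_one();
        th_.join();
    }
    void post(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(mtx_); tasks_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void run() {
        // pthread_setname_np(pthread_self(), "usb_worker");  // give it a stable name
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(mtx_);
                cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
                if (stop_ && tasks_.empty())
                    return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();   // runs in this one thread, visible to top as a single entry
        }
    }
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stop_;
    std::thread th_;
};
```

A caller then submits work with `worker.post([]{ handle_request(); });` (handle_request being a placeholder) instead of spawning and detaching a fresh thread for every request.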
With these two changes in place, the mapping between per-thread CPU usage and functionality becomes clear, and individual threads can then be analysed in detail.