Analyzing Program Performance with Google perftools

Reposted article, dated 2013-05-28 19:39. Category: performance testing

Google perftools

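1. Overview

Google perftools' main function is to build a "portrait" of a program's CPU usage by sampling. Its output shows at a glance how much time each function consumes, together with the call relationships between functions. This matters a great deal when optimizing performance: once the few most time-consuming operations are optimized, overall performance should improve noticeably. This is also the most basic principle of performance work: optimize the most expensive parts first.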

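2. Installation

1. Download gperftools (Google Code has since been shut down and the project is now hosted on GitHub, so this URL may no longer work):

wget "https://code.google.com/p/gperftools/downloads/detail?name=gperftools-2.0.tar.gz"

2. tar -xzf gperftools-2.0.tar.gz

3. cd gperftools-2.0

4. ./configure --prefix=/usr/local --enable-frame-pointers

5. make && make install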

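Note: gperftools was configured with --enable-frame-pointers, so the program under test must also be compiled with the gcc option below; otherwise some multithreaded programs may crash with a core dump:
CCFLAGS=-fno-omit-frame-pointer

Note: perftools supports multithreaded programs poorly on 2.4 kernels, where it can only profile the main thread. The 2.6 kernel fixes this.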

Installing the graphical analysis tool kcachegrind:

kcachegrind analyzes the generated profiling files and runs on Linux.

To install kcachegrind: sudo apt-get install kcachegrind

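3. Usage

There are three ways to use it:

1. Call the provided API directly. This suits profiling one specific part of a program: call the API around the region you want to analyze.

How: call ProfilerStart() and ProfilerStop().

2. Link the static library. This is the most common approach and is covered in more detail later.

How: add -lprofiler when linking.

For example: gcc […] -o helloworld -lprofiler

Run the program: env CPUPROFILE=./helloworld.prof ./helloworld

This profiles the helloworld program and writes the profile data to ./helloworld.prof.

3. Link the shared library. This works much like the static library but is generally not recommended. Consider it only if you do not want to link an extra static library (which increases the size of the binary).

How: use LD_PRELOAD at run time, e.g. % env LD_PRELOAD="/usr/lib/libprofiler.so" <binary> (this approach is not recommended).

Note: env is a Linux command that runs a program with the given environment variables set.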

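4. Viewing the collected data

To view profile results, use pprof. It is a Perl script that presents google-perftools output in a more readable form and can export it as images, PDF, and other formats.

Note: pprof requires perl5 to be installed. Graphical output additionally requires dot, and the --gv mode requires gv.

Invoking pprof on a data file: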

% pprof /bin/ls ls.prof

Enters "interactive" mode

% pprof --text /bin/ls ls.prof

Outputs one line per procedure

% pprof --gv /bin/ls ls.prof

Displays annotated call-graph via 'gv'

% pprof --gv --focus=Mutex /bin/ls ls.prof

Restricts to code paths including a .*Mutex.* entry

% pprof --gv --focus=Mutex --ignore=string /bin/ls ls.prof

Code paths including Mutex but not string

% pprof --list=getdir /bin/ls ls.prof

(Per-line) annotated source listing for getdir()

% pprof --disasm=getdir /bin/ls ls.prof

(Per-PC) annotated disassembly for getdir()

% pprof --text localhost:1234

Outputs one line per procedure for localhost:1234

% pprof --callgrind /bin/ls ls.prof

Outputs the call information in callgrind format

Analyzing callgrind output:

Use kcachegrind to analyze the .callgrind output:

e.g. % pprof --callgrind /bin/ls ls.prof > ls.callgrind

% kcachegrind ls.callgrind

5. Examples

Example 1: cpu_profiler_example.cpp. Markers inserted in the code let you profile one specific function.

The code is as follows:

Note the two functions ProfilerStart() and ProfilerStop().

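Makefile:

-L gives the directory of the shared library. The program may still fail to find the shared library at run time, so also set:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"/home/work/alex/tools/gperftools/lib"

1) Run ./cpu_profiler_example

This generates a profile data file: cpu_profiler_example_29502.prof

Note: you can also specify the path and name of the profile data file:

CPUPROFILE=/tmp/profile ./myprogram

This writes a profile data file named profile in the /tmp directory.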

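2) Analyze the profile data

pprof --text cpu_profiler_example cpu_profiler_example_3875.prof

Reading the text output:

14 2.1% 17.2% 58 8.7% std::_Rb_tree::find

The fields mean:

14: the number of profiling samples spent in find itself

2.1%: find's own samples as a percentage of all profiling samples

17.2%: the running total, that is, the percentage of all profiling samples accounted for by the functions listed so far, up to and including find

58: the number of profiling samples spent in find plus the functions it calls

8.7%: the samples spent in find plus the functions it calls, as a percentage of all profiling samples

std::_Rb_tree::find: the function being profiled

Note: by default the profiler takes 100 samples per second, so divide a sample count by 100 to get seconds.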

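ldd shows which shared libraries a program is linked against:

Example 2: cpu_profiler_example.cpp. No markers are added to the code, and every function is profiled.

The code is as follows:

Makefile:

1) Run the program to generate the profile data file

CPUPROFILE=/tmp/profile ./cpu_profiler_example

2) Analyze the data file

1) pprof --text cpu_profiler_example /tmp/profile

2) Interactive command-line mode

Example 3: Our program may be a server, which does not run to completion and exit on its own. Stopping it with Ctrl+C is not a normal exit(0) either, so the collected profile data can be incomplete or even empty. The workaround is as follows:

Wrap ProfilerStart and ProfilerStop in signal handling: sending SIGUSR1 to the server starts profiling, and sending SIGUSR2 stops it. This lets us profile the program at any time and still get the data.

The code is as follows: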

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>
#include <google/profiler.h>

// SIGUSR1: start profiling
// SIGUSR2: stop profiling

// One handler serves both signals.
// Note: printf is not async-signal-safe; it is used here for demonstration only.
static void gprof_callback(int signum)
{
    if (signum == SIGUSR1)
    {
        printf("Catch the signal ProfilerStart\n");
        ProfilerStart("bs.prof");
    }
    else if (signum == SIGUSR2)
    {
        printf("Catch the signal ProfilerStop\n");
        ProfilerStop();
    }
}

// Installs gprof_callback for SIGUSR1 and SIGUSR2.
static void setup_signal()
{
    struct sigaction profstat;
    profstat.sa_handler = gprof_callback;
    profstat.sa_flags = 0;
    // Block both signals while the handler is running.
    sigemptyset(&profstat.sa_mask);
    sigaddset(&profstat.sa_mask, SIGUSR1);
    sigaddset(&profstat.sa_mask, SIGUSR2);

    if (sigaction(SIGUSR1, &profstat, NULL) < 0)
    {
        fprintf(stderr, "Fail to connect signal SIGUSR1 with start profiling\n");
    }
    if (sigaction(SIGUSR2, &profstat, NULL) < 0)
    {
        fprintf(stderr, "Fail to connect signal SIGUSR2 with stop profiling\n");
    }
}

// CPU-bound helper so the profile has a callee to show.
int loopop_callee()
{
    int n = 0;
    for (int i = 0; i < 10000; i++)
    {
        for (int j = 0; j < 10000; j++)
        {
            n |= i % 100 + j / 100;
        }
    }
    return n;
}

// Simulates a server: loops forever and never exits on its own.
int loopop()
{
    int n = 0;
    while (1)
    {
        for (int i = 0; i < 10000; i++)
        {
            for (int j = 0; j < 10000; j++)
            {
                n |= i % 100 + j / 100;
            }
        }
        printf("result:  %d\n", loopop_callee());
    }
    return n;
}

int main(int argc, char** argv)
{
    // Optional: build a per-process file name instead of the fixed "bs.prof".
    char program[1024] = {0};
    //snprintf(program,1023,"%s_%d.prof",argv[0],getpid());
    setup_signal();
    printf("result:  %d\n", loopop());
    return 0;
}

Note the two functions gprof_callback and setup_signal.

After starting the program, use kill -s SIGUSR1 5722 to start collecting and kill -s SIGUSR2 5722 to stop, where 5722 is the process ID.

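6. Takeaways

Finally, one more point: to profile a program with google-perftools, the program must be able to exit normally (or call ProfilerStop(), as in Example 3). Otherwise the profile data may be incomplete or empty.

Use kcachegrind to view the dependencies between functions and analyze program performance.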
