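I installed SystemTap by following the Installation and Setup section of the SystemTap Beginners Guide. It turned out to be bumpier than expected, so I am recording the steps here.
Environment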
- Linux distribution: CentOS Linux release 7.4.1708 (Core)
- Kernel version: 3.10.0-693.2.2.el7.x86_64
- uname -a: Linux hostname 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Installation
Installing SystemTap
First install the following two RPM packages:
- systemtap
- systemtap-runtime
Run the following command as root:
# yum install systemtap systemtap-runtime
Before running SystemTap, the required kernel information packages must also be installed. On modern systems, stap-prep can install them:
# stap-prep
Need to install the following packages:
kernel-devel-3.10.0-693.2.2.el7.x86_64
kernel-debuginfo-3.10.0-693.2.2.el7.x86_64
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
No package kernel-devel-3.10.0-693.2.2.el7.x86_64 available.
No package kernel-debuginfo-3.10.0-693.2.2.el7.x86_64 available.
Error: Nothing to do
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Could not find debuginfo for main pkg: kernel-3.10.0-693.2.2.el7.x86_64
No debuginfo packages available to install
package kernel-devel-3.10.0-693.2.2.el7.x86_64 is not installed
package kernel-debuginfo-3.10.0-693.2.2.el7.x86_64 is not installed
problem installing rpm(s) kernel-devel-3.10.0-693.2.2.el7.x86_64 kernel-debuginfo-3.10.0-693.2.2.el7.x86_64
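When stap-prep ran, it detected that kernel-devel-3.10.0-693.2.2.el7.x86_64 and kernel-debuginfo-3.10.0-693.2.2.el7.x86_64 still had to be installed (in fact kernel-debuginfo-common is needed as well), but the automatic installation failed. The packages can be installed manually as follows.
Manually installing the required kernel information packages
SystemTap needs the kernel symbol files in order to probe the kernel. The required kernel information is contained in the following three packages: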
- kernel-debuginfo
- kernel-debuginfo-common
- kernel-devel
The packages must match the running kernel version exactly. The kernel version in this environment is 3.10.0-693.2.2.el7.x86_64, so the packages to install are:
- kernel-debuginfo-3.10.0-693.2.2.el7.x86_64
- kernel-debuginfo-common-x86_64-3.10.0-693.2.2.el7.x86_64 (on x86_64 the package name carries the architecture, as the downloaded file name shows)
- kernel-devel-3.10.0-693.2.2.el7.x86_64
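Note: stap-prep in the previous step already printed the exact package names, including the kernel version, on the screen.
Next, install these three packages. Do not simply run yum install kernel-debuginfo kernel-debuginfo-common kernel-devel: even if yum finds the packages, it installs the latest version rather than the one matching the running kernel. Instead, download the RPM files and install them one by one with rpm.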
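The version-matched package names can be derived from the kernel release string rather than typed by hand. The following is a minimal sketch, assuming an RHEL-style release string that ends in the architecture (as in this environment); the function name kernel_info_packages is made up for illustration.

```shell
#!/bin/sh
# Print the three package names that match a given kernel release.
# Hypothetical helper: kernel_info_packages [release]
# With no argument it falls back to the running kernel (uname -r).
kernel_info_packages() {
    release="${1:-$(uname -r)}"   # e.g. 3.10.0-693.2.2.el7.x86_64
    arch="${release##*.}"         # text after the last dot, e.g. x86_64
    printf '%s\n' \
        "kernel-devel-${release}" \
        "kernel-debuginfo-${release}" \
        "kernel-debuginfo-common-${arch}-${release}"
}

# Example: kernel_info_packages 3.10.0-693.2.2.el7.x86_64
```

Each printed name can be passed to rpm -q after installation to confirm that the exact version is present.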
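For CentOS, kernel symbol packages are generally available at http://debuginfo.centos.org, which carries a very complete set for every version. Downloads from mainland China are usually slow, though, and kernel-debuginfo in particular is large and can take a very long time. So I downloaded only kernel-debuginfo-common from debuginfo.centos.org and searched Google for the other two packages, ending up with two different mirrors. Only afterwards did I discover that this was a pitfall, which is explained later.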
wget https://ftp.sjtu.edu.cn/scientific/7/archive/debuginfo/kernel-debuginfo-3.10.0-693.2.2.el7.x86_64.rpm
wget http://debuginfo.centos.org/7/x86_64/kernel-debuginfo-common-x86_64-3.10.0-693.2.2.el7.x86_64.rpm
wget ftp://mirror.switch.ch/pool/4/mirror/scientificlinux/7.0/x86_64/updates/security/kernel-devel-3.10.0-693.2.2.el7.x86_64.rpm
After downloading, install them directly with rpm:
# rpm -ivh kernel-debuginfo-common-x86_64-3.10.0-693.2.2.el7.x86_64.rpm
# rpm -ivh kernel-debuginfo-3.10.0-693.2.2.el7.x86_64.rpm
# rpm -ivh kernel-devel-3.10.0-693.2.2.el7.x86_64.rpm
That completes the installation. Next, test whether SystemTap runs correctly.
Running SystemTap
To test whether stap works, print a message with this simple command:
# stap -e 'probe begin{printf("Hello, World"); exit();}'
It fails. The result:
# stap -e 'probe begin{printf("Hello, World"); exit();}'
ERROR: module version mismatch (#1 SMP Tue Sep 12 10:10:26 CDT 2017 vs #1 SMP Tue Sep 12 22:26:13 UTC 2017), release 3.10.0-693.2.2.el7.x86_64
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed. [man error::pass5]
The error message is: "ERROR: module version mismatch (#1 SMP Tue Sep 12 10:10:26 CDT 2017 vs #1 SMP Tue Sep 12 22:26:13 UTC 2017)".
Fixing the "ERROR: module version mismatch" problem
Run stap with the -v option to print more information and look for further clues:
# stap -e 'probe begin{printf("Hello, World"); exit();}' -v
Pass 1: parsed user script and 470 library scripts using 228224virt/41280res/3348shr/38020data kb, in 330usr/20sys/346real ms.
Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 229148virt/42332res/3536shr/38944data kb, in 0usr/0sys/6real ms.
Pass 3: using cached /root/.systemtap/cache/0b/stap_0bc9e27aef7a1de50ea41889a27fc524_1010.c
Pass 4: using cached /root/.systemtap/cache/0b/stap_0bc9e27aef7a1de50ea41889a27fc524_1010.ko
Pass 5: starting run.
ERROR: module version mismatch (#1 SMP Tue Sep 12 10:10:26 CDT 2017 vs #1 SMP Tue Sep 12 22:26:13 UTC 2017), release 3.10.0-693.2.2.el7.x86_64
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run completed in 0usr/10sys/38real ms.
Pass 5: run failed. [man error::pass5]
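Open the generated C file with vi /root/.systemtap/cache/0b/stap_0bc9e27aef7a1de50ea41889a27fc524_1010.c and search for the error text "module version mismatch". The error is raised at line 13 of the excerpt below. As for where UTS_RELEASE and UTS_VERSION are set, a quick Google search answers that.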
 1 #ifndef STP_NO_VERREL_CHECK
 2   const char* release = UTS_RELEASE;
 3 #ifdef STAPCONF_GENERATED_COMPILE
 4   const char* version = UTS_VERSION;
 5 #endif
 6   might_sleep();
 7   if (strcmp (release, "3.10.0-693.2.2.el7.x86_64")) {
 8     _stp_error ("module release mismatch (%s vs %s)", release, "3.10.0-693.2.2.el7.x86_64");
 9     rc = -EINVAL;
10   }
11 #ifdef STAPCONF_GENERATED_COMPILE
12   if (strcmp (utsname()->version, version)) {
13     _stp_error ("module version mismatch (%s vs %s), release %s", version, utsname()->version, release);
14     rc = -EINVAL;
15   }
16 #endif
17 #endif
Two articles mention the same pitfall; their links are in the References at the bottom. Search all files of the kernel-devel package for the macro UTS_VERSION:
# rpm -ql kernel-devel | xargs grep UTS_VERSION
/usr/src/kernels/3.10.0-693.2.2.el7.x86_64/include/generated/compile.h:#define UTS_VERSION "#1 SMP Tue Sep 12 10:10:26 CDT 2017"
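compile.h contains #define UTS_VERSION "#1 SMP Tue Sep 12 10:10:26 CDT 2017". This should look familiar: it is exactly the first timestamp in the "module version mismatch" error reported by stap above. compile.h is generated automatically and presumably records when the kernel was built. stap, however, requires this string to be identical to the build timestamp that uname -a reports for the running kernel. The kernel-devel RPM installed here came from a Scientific Linux mirror (see its wget URL), so its build timestamp differs; downloading CentOS's own kernel-devel package should avoid the problem.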
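The comparison that stap performs can be reproduced by hand before any module is built. The following is a rough sketch, assuming compile.h keeps the one-line #define format shown above; the function names header_uts_version and check_uts_version are hypothetical.

```shell
#!/bin/sh
# Extract the UTS_VERSION string from a compile.h file.
header_uts_version() {
    sed -n 's/^#define UTS_VERSION "\(.*\)"$/\1/p' "$1"
}

# Compare the header against a kernel version string.
# $1: path to compile.h
# $2: version to compare with (defaults to the running kernel, uname -v)
# Returns 0 on match, 1 on mismatch.
check_uts_version() {
    header="$1"
    running="${2:-$(uname -v)}"
    built="$(header_uts_version "$header")"
    if [ "$built" = "$running" ]; then
        echo "match: $built"
    else
        echo "mismatch: header='$built' running='$running'"
        return 1
    fi
}

# Example:
# check_uts_version "/usr/src/kernels/$(uname -r)/include/generated/compile.h"
```

A mismatch reported here means the installed kernel-devel package was built separately from the running kernel, which is the situation described above.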
Another simple fix is to edit compile.h directly. The original file looks like this:
# cat /usr/src/kernels/3.10.0-693.2.2.el7.x86_64/include/generated/compile.h
/* This file is auto generated, version 1 */
/* SMP */
#define UTS_MACHINE "x86_64"
#define UTS_VERSION "#1 SMP Tue Sep 12 10:10:26 CDT 2017"
#define LINUX_COMPILE_BY "mockbuild"
#define LINUX_COMPILE_HOST "sl7-uefisign.fnal.gov"
#define LINUX_COMPILER "gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) "
Change the #define UTS_VERSION line as follows:
#define UTS_VERSION "#1 SMP Tue Sep 12 10:10:26 CDT 2017"
->
#define UTS_VERSION "#1 SMP Tue Sep 12 22:26:13 UTC 2017"
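The same one-line edit could be scripted instead of done in an editor. This is only a sketch of the workaround, not a recommended practice: it assumes the header has a single #define UTS_VERSION line, and the helper name set_uts_version and the .orig backup suffix are choices made for this example.

```shell
#!/bin/sh
# Rewrite the UTS_VERSION line of a compile.h file in place.
# $1: path to compile.h
# $2: new version string (defaults to the running kernel, uname -v)
# The untouched file is kept next to it as <path>.orig.
set_uts_version() {
    header="$1"
    new_version="${2:-$(uname -v)}"
    cp -p "$header" "$header.orig" || return 1
    # awk receives the string as a variable, so characters such as '#'
    # need no escaping; every other line is copied through unchanged.
    awk -v v="$new_version" '
        /^#define UTS_VERSION / { printf "#define UTS_VERSION \"%s\"\n", v; next }
        { print }
    ' "$header.orig" > "$header"
}

# Example (as root):
# set_uts_version "/usr/src/kernels/$(uname -r)/include/generated/compile.h"
```

Keeping the backup makes it easy to restore the packaged header later, for instance once a matching CentOS kernel-devel package has been obtained.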
Run stap again. The same error still appears:
# stap -e 'probe begin{printf("Hello, World"); exit();}' -v
Pass 1: parsed user script and 470 library scripts using 228220virt/41276res/3348shr/38016data kb, in 350usr/10sys/355real ms.
Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 229144virt/42328res/3536shr/38940data kb, in 0usr/0sys/6real ms.
Pass 3: using cached /root/.systemtap/cache/0b/stap_0bc9e27aef7a1de50ea41889a27fc524_1010.c
Pass 4: using cached /root/.systemtap/cache/0b/stap_0bc9e27aef7a1de50ea41889a27fc524_1010.ko
Pass 5: starting run.
ERROR: module version mismatch (#1 SMP Tue Sep 12 10:10:26 CDT 2017 vs #1 SMP Tue Sep 12 22:26:13 UTC 2017), release 3.10.0-693.2.2.el7.x86_64
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run completed in 0usr/10sys/38real ms.
Pass 5: run failed. [man error::pass5]
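This happens because the intermediate C file and the .ko module were both taken from the cache (the "using cached" lines in Pass 3 and Pass 4). Delete those cached files and run stap again; this time it succeeds.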
# stap -e 'probe begin{printf("Hello, World"); exit();}'
Hello, World
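The cache cleanup mentioned above can be done with a small helper. This is a sketch under assumptions: the cache is taken to live under $SYSTEMTAP_DIR/cache or, by default, ~/.systemtap/cache (matching the /root/.systemtap/cache paths in the output above), and only files named stap_* are removed. The function name clear_stap_cache is invented here.

```shell
#!/bin/sh
# Remove cached SystemTap build artifacts (stap_*.c, stap_*.ko, ...)
# so that the next stap run regenerates and recompiles the module.
# $1: cache directory (optional; defaults to the per-user cache)
clear_stap_cache() {
    cache_dir="${1:-${SYSTEMTAP_DIR:-$HOME/.systemtap}/cache}"
    [ -d "$cache_dir" ] || return 0   # nothing cached yet
    find "$cache_dir" -type f -name 'stap_*' -exec rm -f {} +
}

# Example (as the user who ran stap):
# clear_stap_cache
```

After the cache is cleared, Pass 3 and Pass 4 should no longer report "using cached" on the next run, which indicates that the module was rebuilt against the edited compile.h.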
References
https://sourceware.org/systemtap/SystemTap_Beginners_Guide/using-systemtap.html#using-setup
ERROR: module version mismatch
https://groups.google.com/forum/#!topic/openresty/nlEc3qlDyOc