bcc Python Developer Tutorial (translation)

Reposted 2021-10-14 20:56. Tags: bcc, kernel, Linux kernel, eBPF, tools

Translated from: https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md

bcc Python Developer Tutorial

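This tutorial is about developing bcc tools and programs using the Python interface. It has two parts: observability, then networking.

The code snippets are taken from various programs in bcc: see the individual files in bcc for their licenses.

Also see the bcc developer's reference_guide.md, and the tutorial for end-users of the tools: tutorial.md. bcc also has a lua interface.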

Observability

This observability tutorial contains 17 lessons and 46 enumerated things to learn.

Lesson 1. Hello World

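We start by running examples/hello_world.py. Run the script in one terminal while running some commands (e.g. "ls") in another. It should print "Hello, World!" for each new process. If it does not, something is not set up correctly yet: see INSTALL.md.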

# ./examples/hello_world.py
            bash-13364 [002] d... 24573433.052937: : Hello, World!
            bash-13364 [003] d... 24573436.642808: : Hello, World!
[...]

Here is the code for hello_world.py:

from bcc import BPF
BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()

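There are six things to learn from this:

text='...': This defines a BPF program inline. The program is written in C.
kprobe__sys_clone(): This is a shortcut for kernel dynamic tracing via kprobes. If the C function name begins with kprobe__, the rest is treated as the name of the kernel function to instrument, in this case sys_clone().
void *ctx: ctx carries arguments, but since we are not using them here, we simply declare it as void *.
bpf_trace_printk(): A simple kernel facility for printf()-style output to the common trace_pipe (/sys/kernel/debug/tracing/trace_pipe). This is fine for quick examples, but it has limitations: at most 3 arguments, only one %s, and trace_pipe is globally shared, so concurrent programs will have clashing output. A better interface is BPF_PERF_OUTPUT(), covered later.
return 0;: A necessary formality (if you want to know why, see #139 https://github.com/iovisor/bcc/issues/139).
.trace_print(): A bcc routine that reads trace_pipe and prints the output.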
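
To make the trace_pipe layout concrete, here is a minimal sketch that splits one line of the sample output above into fields. The helper name parse_trace_line and the regular expression are illustrative assumptions modeled only on the sample lines shown in this lesson; they are not part of bcc, and the exact trace_pipe layout can differ between kernel versions.

```python
import re

# Pattern modeled on the sample lines above (an assumption, not a bcc API).
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+"
    r"\[(?P<cpu>\d+)\]\s+"
    r"(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+"
    r"(?P<rest>.*)$"
)

def parse_trace_line(line):
    """Split one trace_pipe-style line into a dict of fields (hypothetical helper)."""
    m = TRACE_LINE.match(line)
    if m is None:
        return None
    # The sample shows an empty symbol column (": ") before the message.
    msg = m.group("rest").lstrip(": ")
    return {
        "task": m.group("task"),
        "pid": int(m.group("pid")),
        "cpu": int(m.group("cpu")),
        "flags": m.group("flags"),
        "ts": float(m.group("ts")),
        "msg": msg,
    }

sample = "            bash-13364 [002] d... 24573433.052937: : Hello, World!"
fields = parse_trace_line(sample)
print(fields["task"], fields["pid"], fields["msg"])  # bash 13364 Hello, World!
```

Because the task name is matched greedily, names that themselves contain a hyphen (such as systemd-udevd) are still separated correctly from the trailing PID.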

Lesson 2. sys_sync()

Write a program that traces the sys_sync() kernel function and prints "sys_sync() called" whenever it runs. Test it by running sync in another terminal while tracing. The hello_world.py program from Lesson 1 has almost everything you need.

Then improve it by printing "Tracing sys_sync()... Ctrl-C to end." when the program first starts. Hint: it's just Python.

Lesson 3. hello_fields.py

This program is in examples/tracing/hello_fields.py. Sample output (run commands in another terminal):

# ./examples/tracing/hello_fields.py
TIME(s)            COMM             PID    MESSAGE
24585001.174885999 sshd             1432   Hello, World!
24585001.195710000 sshd             15780  Hello, World!
24585001.991976000 systemd-udevd    484    Hello, World!
24585002.276147000 bash             15787  Hello, World!

Code:

from bcc import BPF

# define BPF program
prog = """
int hello(void *ctx) {
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}
"""

# load BPF program
b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")

# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))

# format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        continue
    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))

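This is similar to hello_world.py and again traces new processes via sys_clone(), but it has a few more things to learn:

prog =: This time we declare the C program as a variable and refer to it later. This is useful if you want to add string substitutions based on command line arguments.
hello(): Now we declare a plain C function instead of using the kprobe__ shortcut, and refer to it later. All C functions declared in the BPF program are expected to be executed on a probe, so they all need to take a context pointer (struct pt_regs *ctx) as their first argument. If you need a helper function that will not be executed on a probe, define it as static inline so that the compiler inlines it. Sometimes you also need to add the __always_inline function attribute.
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello"): Creates a kprobe for the kernel clone system call function, which will execute our hello() function. You can call attach_kprobe() more than once to attach your C function to multiple kernel functions.
b.trace_fields(): Returns a fixed set of fields from trace_pipe. Like trace_print(), this is handy for hacking, but real tools should switch to BPF_PERF_OUTPUT().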
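
The remark about string substitution can be sketched as follows. This is only an illustration: the FILTER placeholder, the build_prog() helper, and the PID filter itself are assumptions introduced here, not code from hello_fields.py. The sketch only builds the C source text; loading it would still go through BPF(text=prog) as in the lesson.

```python
# C source with a placeholder that Python replaces before loading.
# FILTER is a hypothetical marker chosen for this sketch.
PROG_TEMPLATE = """
int hello(void *ctx) {
    FILTER
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}
"""

def build_prog(pid=None):
    """Return the C source, optionally restricted to one process ID."""
    if pid is None:
        filter_code = ""
    else:
        # int() ensures only a number can be pasted into the C source.
        filter_code = (
            "u32 pid = bpf_get_current_pid_tgid() >> 32; "
            "if (pid != %d) { return 0; }" % int(pid)
        )
    return PROG_TEMPLATE.replace("FILTER", filter_code)

prog = build_prog(pid=1234)
print(prog)
# The result would then be loaded as in the lesson: b = BPF(text=prog)
```

Doing the substitution in Python keeps the kernel-side program free of checks that are not needed when no filter was requested.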

Lesson 4. sync_timing.py

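Remember the days of sysadmins typing sync three times on a slow console before a reboot, to give the first asynchronous sync time to complete? Then someone thought sync;sync;sync was clever, running them all on one line, and that became industry practice despite defeating the original purpose!

The following example times how quickly do_sync is called, and prints output if it was called less than one second after the previous call. A sync;sync;sync will therefore print output for the 2nd and 3rd syncs: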

# ./examples/tracing/sync_timing.py
Tracing for quick sync's... Ctrl-C to end
At time 0.00 s: multiple syncs detected, last 95 ms ago
At time 0.10 s: multiple syncs detected, last 96 ms ago

The code is in examples/tracing/sync_timing.py:

from __future__ import print_function
from bcc import BPF

# load BPF program
b = BPF(text="""
#include <uapi/linux/ptrace.h>

BPF_HASH(last);

int do_trace(struct pt_regs *ctx) {
    u64 ts, *tsp, delta, key = 0;

    // attempt to read stored timestamp
    tsp = last.lookup(&key);
    if (tsp != NULL) {
        delta = bpf_ktime_get_ns() - *tsp;
        if (delta < 1000000000) {
            // output if time is less than 1 second
            bpf_trace_printk("%d\\n", delta / 1000000);
        }
        last.delete(&key);
    }

    // update stored timestamp
    ts = bpf_ktime_get_ns();
    last.update(&key, &ts);
    return 0;
}
""")

b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_trace")
print("Tracing for quick sync's... Ctrl-C to end")

# format output
start = 0
while 1:
    (task, pid, cpu, flags, ts, ms) = b.trace_fields()
    if start == 0:
        start = ts
    ts = ts - start
    print("At time %.2f s: multiple syncs detected, last %s ms ago" % (ts, ms))

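Things to learn:

bpf_ktime_get_ns(): Returns the current time in nanoseconds.
BPF_HASH(last): Creates a BPF map object that is a hash (associative array), called "last". We did not specify any further arguments, so both key and value default to type u64.
key = 0: We only store one key/value pair in this hash, with the key hardwired to zero.
last.lookup(&key): Looks up the key in the hash and returns a pointer to its value if it exists, otherwise NULL. The key is passed in as an address.
if (tsp != NULL) {: The verifier requires that pointer values returned from a map lookup be checked for NULL before they are dereferenced.
last.delete(&key): Deletes the key from the hash. This is required because of a kernel bug in .update() on older kernels (fixed in 4.8.10).
last.update(&key, &ts): Associates the value in the second argument with the key, overwriting any previous value. Here it records the timestamp.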
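
The lookup / delete / update sequence can be modeled in plain Python to check the logic without loading anything into the kernel. In this sketch a dict stands in for the BPF hash and timestamps are passed in explicitly; the Python function and the sample timestamps are illustrative assumptions, not part of sync_timing.py.

```python
NS_PER_SEC = 1000000000
NS_PER_MS = 1000000

def do_trace(last, now_ns):
    """Pure-Python model of the lesson's do_trace(); `last` is a dict
    standing in for the BPF hash.

    Returns the gap in milliseconds when the previous call was less than
    one second ago, otherwise None.
    """
    key = 0
    result = None
    prev = last.get(key)          # like last.lookup(&key); None plays NULL
    if prev is not None:
        delta = now_ns - prev
        if delta < NS_PER_SEC:
            result = delta // NS_PER_MS
        del last[key]             # like last.delete(&key)
    last[key] = now_ns            # like last.update(&key, &ts)
    return result

last = {}
# Three calls roughly 95 ms apart, then one more than a second later.
stamps = [0, 95 * NS_PER_MS, 191 * NS_PER_MS, 2000 * NS_PER_MS]
gaps = [do_trace(last, t) for t in stamps]
print(gaps)  # [None, 95, 96, None]
```

The first call never reports anything because there is no stored timestamp yet, which is why sync;sync;sync produces two lines of output rather than three.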

Lesson 5. sync_count.py

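Modify the sync_timing.py program from the previous lesson to store the count of all kernel sync system calls (both fast and slow), and print it with the output. This count can be recorded in the BPF program by adding a new key to the existing hash.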

Lesson 6. disksnoop.py

Browse the examples/tracing/disksnoop.py program to see what is new. Here is some sample output:

# ./disksnoop.py
TIME(s)            T  BYTES    LAT(ms)
16458043.436012    W  4096        3.13
16458043.437326    W  4096        4.44
16458044.126545    R  4096       42.82
16458044.129872    R  4096        3.24
[...]

And a code snippet:

[...]
REQ_WRITE = 1        # from include/linux/blk_types.h

# load BPF program
b = BPF(text="""
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

BPF_HASH(start, struct request *);

void trace_start(struct pt_regs *ctx, struct request *req) {
    // stash start timestamp by request ptr
    u64 ts = bpf_ktime_get_ns();

    start.update(&req, &ts);
}

void trace_completion(struct pt_regs *ctx, struct request *req) {
    u64 *tsp, delta;

    tsp = start.lookup(&req);
    if (tsp != 0) {
        delta = bpf_ktime_get_ns() - *tsp;
        bpf_trace_printk("%d %x %d\\n", req->__data_len,
            req->cmd_flags, delta / 1000);
        start.delete(&req);
    }
}
""")

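# blk_start_request() may be absent on newer kernels; attach only if it exists
if BPF.get_kprobe_functions(b'blk_start_request'):
    b.attach_kprobe(event="blk_start_request", fn_name="trace_start")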
b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_start")
b.attach_kprobe(event="blk_account_io_done", fn_name="trace_completion")
[...]

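Things to learn:

REQ_WRITE: We define a kernel constant in the Python program because we will use it there later. If we were using REQ_WRITE in the BPF program, it should just work (without needing to be defined) given the appropriate #includes.
trace_start(struct pt_regs *ctx, struct request *req): This function will later be attached to kprobes. The arguments to kprobe functions are struct pt_regs *ctx, for registers and BPF context, followed by the actual arguments of the traced kernel function. We attach this to blk_start_request(), whose first argument is struct request *.
start.update(&req, &ts): We use the pointer to the request struct as the key in our hash. What? A pointer as a key? This is commonplace in tracing. Pointers to structs make great keys because they are unique: two structs cannot have the same pointer address. (Just be careful about when the memory gets freed and the pointer reused.) So what we are really doing is tagging the request struct, which describes the disk I/O, with our own timestamp, so that we can time it. There are two common keys used for storing timestamps: pointers to structs, and thread IDs (for timing function entry to return).
req->__data_len: We are dereferencing members of struct request. See its definition in the kernel source for the members it has. bcc actually rewrites these expressions into a series of bpf_probe_read_kernel() calls. Sometimes bcc cannot handle a complex dereference, and you need to call bpf_probe_read_kernel() directly.

This is a pretty interesting program, and if you can understand all the code, you will understand many important basics. We are still using the bpf_trace_printk() hack, so let's fix that next.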
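
The snippet above elides the Python side that turns each "%d %x %d" message into a row of the sample output. One plausible way to do that is sketched below; format_io() is a hypothetical helper written for this illustration, it assumes the message has already been decoded to a str, and it is not claimed to match the elided portion of disksnoop.py.

```python
REQ_WRITE = 1  # same constant the lesson defines in Python

def format_io(ts, msg):
    """Turn one '<bytes> <flags-hex> <latency-us>' message into an output row."""
    bytes_s, flags_s, us_s = msg.split()
    # The flags field was printed with %x, so parse it as hexadecimal.
    io_type = "W" if int(flags_s, 16) & REQ_WRITE else "R"
    # The BPF program divided nanoseconds by 1000, so us_s is in microseconds.
    lat_ms = int(us_s) / 1000.0
    return "%-18.6f %-2s %-8s %7.2f" % (ts, io_type, bytes_s, lat_ms)

print("%-18s %-2s %-8s %7s" % ("TIME(s)", "T", "BYTES", "LAT(ms)"))
print(format_io(16458043.436012, "4096 1 3130"))
print(format_io(16458044.126545, "4096 0 42820"))
```

This is where the REQ_WRITE constant defined in Python earns its keep: the kernel side only ships raw flags, and user space decides how to label them.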

Lesson 7. hello_perf_output.py

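Let's finally stop using bpf_trace_printk() and use the proper BPF_PERF_OUTPUT() interface. This also means we no longer get the free trace_fields() members such as PID and timestamp, and need to fetch them directly. Sample output while commands are run in another terminal: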

# ./hello_perf_output.py
TIME(s)            COMM             PID    MESSAGE
0.000000000        bash             22986  Hello, perf_output!
0.021080275        systemd-udevd    484    Hello, perf_output!
0.021359520        systemd-udevd    484    Hello, perf_output!
0.021590610        systemd-udevd    484    Hello, perf_output!
[...]

The code is in examples/tracing/hello_perf_output.py:

from bcc import BPF

# define BPF program
prog = """
#include <linux/sched.h>

// define output data structure in C
struct data_t {
    u32 pid;
    u64 ts;
    char comm[TASK_COMM_LEN];
};
BPF_PERF_OUTPUT(events);

int hello(struct pt_regs *ctx) {
    struct data_t data = {};

    data.pid = bpf_get_current_pid_tgid();
    data.ts = bpf_ktime_get_ns();
    bpf_get_current_comm(&data.comm, sizeof(data.comm));

    events.perf_submit(ctx, &data, sizeof(data));

    return 0;
}
"""

# load BPF program
b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")

# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))

# process event
start = 0
def print_event(cpu, data, size):
    global start
    event = b["events"].event(data)
    if start == 0:
        start = event.ts
    time_s = (float(event.ts - start)) / 1000000000
    print("%-18.9f %-16s %-6d %s" % (time_s, event.comm, event.pid,
        "Hello, perf_output!"))

# loop with callback to print_event
b["events"].open_perf_buffer(print_event)
while 1:
    b.perf_buffer_poll()

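Things to learn:

struct data_t: A C struct that we define ourselves, used to pass data from the kernel to user space.
BPF_PERF_OUTPUT(events): Names our output channel "events".
struct data_t data = {};: Creates an empty data_t struct that we then populate.
bpf_get_current_pid_tgid(): Returns the process ID in the lower 32 bits (the kernel's view of the PID, which user space usually presents as the thread ID), and the thread group ID in the upper 32 bits (what user space usually thinks of as the PID). Assigning the result directly to a u32 discards the upper 32 bits. Should you present the PID or the TGID? For a multi-threaded application the TGID is the same for all threads, so you need the PID if you want to tell the threads apart. It is also a question of what the end user expects.
bpf_get_current_comm(): Populates the buffer pointed to by the first argument with the current process name.
events.perf_submit(): Submits the event for user space to read via a perf ring buffer.
def print_event(): A Python function that handles events read from the events stream.
b["events"].event(data): Returns the event as a Python object, auto-generated from the C declaration.
b["events"].open_perf_buffer(print_event): Associates the Python print_event function with the events stream.
while 1: b.perf_buffer_poll(): Blocks waiting for events.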
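
The layout of the value returned by bpf_get_current_pid_tgid() can be illustrated with ordinary integer arithmetic. The helper name split_pid_tgid and the sample IDs below are made up for this sketch.

```python
def split_pid_tgid(pid_tgid):
    """Split the 64-bit value into (tgid, pid) as described above."""
    tgid = pid_tgid >> 32            # upper 32 bits: thread group ID
    pid = pid_tgid & 0xFFFFFFFF      # lower 32 bits: what a u32 assignment keeps
    return tgid, pid

# A hypothetical thread 22990 belonging to process 22986.
value = (22986 << 32) | 22990
tgid, pid = split_pid_tgid(value)
print("TGID=%d PID=%d" % (tgid, pid))  # TGID=22986 PID=22990
```

For a single-threaded process both halves hold the same number, which is why the distinction only becomes visible when tracing multi-threaded applications.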

Lesson 8. sync_perf_output.py

Rewrite sync_timing.py from Lesson 4 to use BPF_PERF_OUTPUT.

Lesson 9. bitehist.py

The following tool records a histogram of disk I/O sizes. Sample output:

# ./bitehist.py
Tracing... Hit Ctrl-C to end.
^C
     kbytes          : count     distribution
       0 -> 1        : 3        |                                      |
       2 -> 3        : 0        |                                      |
       4 -> 7        : 211      |**********                            |
       8 -> 15       : 0        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 1        |                                      |
     128 -> 255      : 800      |**************************************|

The code is in examples/tracing/bitehist.py:

from __future__ import print_function
from bcc import BPF
from time import sleep

# load BPF program
b = BPF(text="""
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

BPF_HISTOGRAM(dist);

int kprobe__blk_account_io_done(struct pt_regs *ctx, struct request *req)
{
    dist.increment(bpf_log2l(req->__data_len / 1024));
    return 0;
}
""")

# header
print("Tracing... Hit Ctrl-C to end.")

# trace until Ctrl-C
try:
    sleep(99999999)
except KeyboardInterrupt:
    print()

# output
b["dist"].print_log2_hist("kbytes")

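A recap from earlier lessons:

kprobe__: This prefix means the rest of the name is treated as the kernel function to instrument using a kprobe.
struct pt_regs *ctx, struct request *req: Arguments to the kprobe function. ctx holds the registers and BPF context; req is the first argument of the instrumented kernel function, here blk_account_io_done().
req->__data_len: Dereferences that member.

New things to learn:

BPF_HISTOGRAM(dist): Defines a BPF map object that is a histogram, named "dist".
dist.increment(): Increments the histogram bucket whose index is given as the first argument, by one by default. Optionally, a custom increment can be passed as the second argument.
bpf_log2l(): Returns the log-2 of the provided value. This becomes the index into our histogram, so that we construct a power-of-2 histogram.
b["dist"].print_log2_hist("kbytes"): Prints the "dist" histogram as power-of-2 buckets, with "kbytes" as the column header. The only data transferred from the kernel to user space is the bucket counts, which makes this efficient.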
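
To see how values map onto the power-of-2 rows printed above, the bucketing can be modeled in plain Python. This sketch reproduces only the printed ranges (0 -> 1, 2 -> 3, 4 -> 7, ...); bucket_range() and log2_hist() are hypothetical helpers, the sample sizes are made up, and the code does not claim to mirror how bpf_log2l() numbers its slots internally.

```python
from collections import Counter

def bucket_range(value):
    """Return the (low, high) power-of-2 range that `value` falls into."""
    if value < 2:
        return (0, 1)
    exp = value.bit_length() - 1     # floor(log2(value)) for positive ints
    return (1 << exp, (1 << (exp + 1)) - 1)

def log2_hist(values):
    """Count values per power-of-2 bucket, keyed by (low, high)."""
    return Counter(bucket_range(v) for v in values)

sizes_kb = [4, 4, 5, 128, 200, 0, 64]   # made-up I/O sizes in KB
hist = log2_hist(sizes_kb)
for (low, high), count in sorted(hist.items()):
    print("%8d -> %-8d : %d" % (low, high, count))
```

Only the per-bucket counters need to be kept, which is the reason the in-kernel histogram is cheap to read out compared with sending one event per I/O.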

Lesson 10. disklatency.py

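Write a program that times disk I/O and prints a histogram of the latency. Disk I/O instrumentation and timing can be found in the disksnoop.py program from Lesson 6, and histogram code can be found in bitehist.py from Lesson 9.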

Lesson 11. vfsreadlat.py

This example is split into separate Python and C files. Sample output:

# ./vfsreadlat.py 1
Tracing... Hit Ctrl-C to end.
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 2        |***********                             |
         4 -> 7          : 7        |****************************************|
         8 -> 15         : 4        |**********************                  |

     usecs               : count     distribution
         0 -> 1          : 29       |****************************************|
         2 -> 3          : 28       |**************************************  |
         4 -> 7          : 4        |*****                                   |
         8 -> 15         : 8        |***********                             |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 2        |**                                      |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 0        |                                        |
      4096 -> 8191       : 4        |*****                                   |
      8192 -> 16383      : 6        |********                                |
     16384 -> 32767      : 9        |************                            |
     32768 -> 65535      : 6        |********                                |
     65536 -> 131071     : 2        |**                                      |

     usecs               : count     distribution
         0 -> 1          : 11       |****************************************|
         2 -> 3          : 2        |*******                                 |
         4 -> 7          : 10       |************************************    |
         8 -> 15         : 8        |*****************************           |
        16 -> 31         : 1        |***                                     |
        32 -> 63         : 2        |*******                                 |
[...]

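Browse the code in examples/tracing/vfsreadlat.py and examples/tracing/vfsreadlat.c.

Things to learn:

b = BPF(src_file = "vfsreadlat.c"): Reads the BPF C program from a separate source file.
b.attach_kretprobe(event="vfs_read", fn_name="do_return"): Attaches the BPF C function do_return() to the return of the kernel function vfs_read(). This is a kretprobe: it instruments the return from a function rather than its entry.
b["dist"].clear(): Clears the histogram.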

Lesson 12. urandomread.py

Tracing while dd if=/dev/urandom of=/dev/null bs=8k count=5 is run:

# ./urandomread.py
TIME(s)            COMM             PID    GOTBITS
24652832.956994001 smtp             24690  384
24652837.726500999 dd               24692  65536
24652837.727111001 dd               24692  65536
24652837.727703001 dd               24692  65536
24652837.728294998 dd               24692  65536
24652837.728888001 dd               24692  65536

Hah! I caught smtp by accident. The code is in examples/tracing/urandomread.py:

from __future__ import print_function
from bcc import BPF

# load BPF program
b = BPF(text="""
TRACEPOINT_PROBE(random, urandom_read) {
    // args is from /sys/kernel/debug/tracing/events/random/urandom_read/format
    bpf_trace_printk("%d\\n", args->got_bits);
    return 0;
}
""")

# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "GOTBITS"))

# format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        continue
    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))

Things to learn:

TRACEPOINT_PROBE(random, urandom_read): Instruments the kernel tracepoint random:urandom_read. Tracepoints have a stable API, so they are recommended over kprobes wherever possible. You can run perf list for a list of tracepoints. Linux >= 4.7 is required to attach BPF programs to tracepoints.
args->got_bits: args is auto-populated as a structure of the tracepoint arguments. The comment in the code above says where you can see that structure. For example:

# cat /sys/kernel/debug/tracing/events/random/urandom_read/format
name: urandom_read
ID: 972
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:int got_bits;	offset:8;	size:4;	signed:1;
	field:int pool_left;	offset:12;	size:4;	signed:1;
	field:int input_left;	offset:16;	size:4;	signed:1;

print fmt: "got_bits %d nonblocking_pool_entropy_left %d input_entropy_left %d", REC->got_bits, REC->pool_left, REC->input_left

In this case, we were printing the got_bits member.
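
If you want to list the members available through args programmatically, the format text shown above is easy to parse. The sketch below works on an embedded sample rather than reading from /sys, so it runs anywhere; parse_format() and its regular expression are assumptions written for this illustration and are not a bcc API.

```python
import re

# One "field:" line as it appears in the format file shown above.
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);\s*"
    r"size:(?P<size>\d+);\s*signed:(?P<signed>[01]);"
)

def parse_format(text):
    """Extract (name, offset, size, signed) for every field line."""
    fields = []
    for m in FIELD_RE.finditer(text):
        # The field name is the last word of the C declaration, e.g. "int got_bits".
        name = m.group("decl").split()[-1]
        fields.append((name, int(m.group("offset")), int(m.group("size")),
                       m.group("signed") == "1"))
    return fields

# A shortened copy of the format output from this lesson.
sample = "\n".join([
    "\tfield:int common_pid;\toffset:4;\tsize:4;\tsigned:1;",
    "",
    "\tfield:int got_bits;\toffset:8;\tsize:4;\tsigned:1;",
    "\tfield:int pool_left;\toffset:12;\tsize:4;\tsigned:1;",
])

for name, offset, size, signed in parse_format(sample):
    print("%-12s offset=%-3d size=%d signed=%s" % (name, offset, size, signed))
```

Fields whose names start with common_ are shared by all tracepoints; the ones after the blank line are specific to this tracepoint and are the ones of interest here.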

Lesson 13. disksnoop.py fixed

Convert disksnoop.py from Lesson 6 to use the block:block_rq_issue and block:block_rq_complete tracepoints.

Lesson 14. strlen_count.py

This program instruments a user-level function, the strlen() library function, and counts how often each string argument occurs. Sample output:

# ./strlen_count.py
Tracing strlen()... Hit Ctrl-C to end.
^C     COUNT STRING
         1 " "
         1 "/bin/ls"
         1 "."
         1 "cpudist.py.1"
         1 ".bashrc"
         1 "ls --color=auto"
         1 "key_t"
[...]
        10 "a7:~# "
        10 "/root"
        12 "LC_ALL"
        12 "en_US.UTF-8"
        13 "en_US.UTF-8"
        20 "~"
        70 "#%^,~:-=?+/}"
       340 "\x01\x1b]0;root@bgregg-test: ~\x07\x02root@bgregg-test:~# "

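These are the various strings processed by this library function while tracing, along with their frequency counts. For example, strlen() was called on "LC_ALL" 12 times.

The code is in examples/tracing/strlen_count.py: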

from __future__ import print_function
from bcc import BPF
from time import sleep

# load BPF program
b = BPF(text="""
#include <uapi/linux/ptrace.h>

struct key_t {
    char c[80];
};
BPF_HASH(counts, struct key_t);

int count(struct pt_regs *ctx) {
    if (!PT_REGS_PARM1(ctx))
        return 0;

    struct key_t key = {};
    u64 zero = 0, *val;

    bpf_probe_read_user(&key.c, sizeof(key.c), (void *)PT_REGS_PARM1(ctx));
    // could also use `counts.increment(key)`
    val = counts.lookup_or_try_init(&key, &zero);
    if (val) {
      (*val)++;
    }
    return 0;
};
""")
b.attach_uprobe(name="c", sym="strlen", fn_name="count")

# header
print("Tracing strlen()... Hit Ctrl-C to end.")

# sleep until Ctrl-C
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass

# print output
print("%10s %s" % ("COUNT", "STRING"))
counts = b.get_table("counts")
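for k, v in sorted(counts.items(), key=lambda kv: kv[1].value):
    # 'string-escape' exists only in Python 2; this escaping also works in Python 3
    s = k.c.decode('latin-1').encode('unicode_escape').decode('ascii')
    print("%10d \"%s\"" % (v.value, s))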

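Things to learn:

PT_REGS_PARM1(ctx): This macro fetches the first argument to strlen(), which is the string being processed.
b.attach_uprobe(name="c", sym="strlen", fn_name="count"): Attaches to library "c" (if the target is the main program, use its pathname instead), instruments the user-level function strlen(), and calls our C function count() whenever strlen() executes.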
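
The counting and the final sorted printout can be tried out in plain Python. This sketch is a simplified model under stated assumptions: a dict stands in for the BPF hash, keys are simply the first 80 bytes of each string, and count_strings() and escape() are hypothetical helpers that do not appear in strlen_count.py.

```python
KEY_LEN = 80  # matches char c[80] in struct key_t

def count_strings(strings):
    """Model of the counts hash: keys are the first KEY_LEN bytes of each string."""
    counts = {}
    for s in strings:
        key = s[:KEY_LEN]
        counts[key] = counts.get(key, 0) + 1   # look up or start at 0, then increment
    return counts

def escape(raw):
    """Render bytes printably; control and non-ASCII bytes become \\xNN escapes."""
    return raw.decode("latin-1").encode("unicode_escape").decode("ascii")

# Made-up strlen() arguments, including one with terminal control bytes.
calls = [b"LC_ALL", b"~", b"LC_ALL", b"\x01\x1b]0;root", b"~", b"~"]
counts = count_strings(calls)
print("%10s %s" % ("COUNT", "STRING"))
for key, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print('%10d "%s"' % (n, escape(key)))
```

Sorting in ascending order puts the most frequent strings at the bottom, next to the shell prompt, which is the same layout as the sample output in this lesson.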

Lesson 15. nodejs_http_server.py

This program instruments a user statically-defined tracing (USDT) probe, which is the user-level version of a kernel tracepoint. Sample output:

# ./nodejs_http_server.py 24728
TIME(s)            COMM             PID    ARGS
24653324.561322998 node             24728  path:/index.html
24653335.343401998 node             24728  path:/images/welcome.png
24653340.510164998 node             24728  path:/images/favicon.png

Relevant code from examples/tracing/nodejs_http_server.py:

from __future__ import print_function
from bcc import BPF, USDT
import sys

if len(sys.argv) < 2:
    print("USAGE: nodejs_http_server PID")
    exit()
pid = sys.argv[1]
debug = 0

# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
int do_trace(struct pt_regs *ctx) {
    uint64_t addr;
    char path[128]={0};
    bpf_usdt_readarg(6, ctx, &addr);
    bpf_probe_read_user(&path, sizeof(path), (void *)addr);
    bpf_trace_printk("path:%s\\n", path);
    return 0;
};
"""

# enable USDT probe from given PID
u = USDT(pid=int(pid))
u.enable_probe(probe="http__server__request", fn_name="do_trace")
if debug:
    print(u.get_text())
    print(bpf_text)

# initialize BPF
b = BPF(text=bpf_text, usdt_contexts=[u])

Things to learn:

bpf_usdt_readarg(6, ctx, &addr): Reads the address of argument 6 from the USDT probe into addr.
bpf_probe_read_user(&path, sizeof(path), (void *)addr): Copies the string that addr points to into our path variable.
u = USDT(pid=int(pid)): Initializes USDT tracing for the given PID.
u.enable_probe(probe="http__server__request", fn_name="do_trace"): Attaches our do_trace() BPF C function to the Node.js http__server__request USDT probe.
b = BPF(text=bpf_text, usdt_contexts=[u]): Our USDT object, u, must be passed in when the BPF object is created.

Lesson 16. task_switch.c

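This is an older tutorial included as a bonus lesson. Use it to recap and reinforce what you have already learned.

This is a slightly more complex tracing example than Hello World. The program is invoked on every task switch in the kernel and records the new and old PIDs in a BPF map.

The C program below introduces a new concept: the prev argument. This argument is treated specially by the BCC frontend, so that accesses to this variable are read from the saved context passed in by the kprobe infrastructure. The prototype of the arguments starting from position 1 should match the prototype of the kernel function being kprobed. If it does, the program has seamless access to the function parameters.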

#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct key_t {
    u32 prev_pid;
    u32 curr_pid;
};

BPF_HASH(stats, struct key_t, u64, 1024);
int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
    struct key_t key = {};
    u64 zero = 0, *val;

    key.curr_pid = bpf_get_current_pid_tgid();
    key.prev_pid = prev->pid;

    // could also use `stats.increment(key);`
    val = stats.lookup_or_try_init(&key, &zero);
    if (val) {
      (*val)++;
    }
    return 0;
}

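The user space component loads the file shown above and attaches it to the finish_task_switch() kernel function. The [] operator of the BPF object gives access to each BPF_HASH in the program, allowing pass-through access to the values residing in the kernel. Use the object as you would any other Python dict object: reads, updates, and deletes are all allowed.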

from bcc import BPF
from time import sleep

b = BPF(src_file="task_switch.c")
b.attach_kprobe(event="finish_task_switch", fn_name="count_sched")

# generate many schedule events
for i in range(0, 100): sleep(0.01)

for k, v in b["stats"].items():
    print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v.value))

These programs can be found in the files examples/tracing/task_switch.c and examples/tracing/task_switch.py respectively.

Lesson 17. Further Study

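For further study, see Sasha Goldshtein's linux-tracing-workshop, which contains additional labs. There are also many tools worth studying in the bcc /tools directory.

If you wish to contribute tools to bcc, please read CONTRIBUTING-SCRIPTS.md. At the bottom of the main README.md you will also find ways to contact us. Good luck, and happy tracing!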

Networking

To do.
