Debugging Deadlocks with pstack and gdb


1: Code

Below is a simple program that can deadlock:

#include <unistd.h>
#include <pthread.h>
#include <string.h>

typedef struct
{
    pthread_mutex_t mutex1;
    pthread_mutex_t mutex2;
    int sequence1;
    int sequence2;
}Counter;

/* thread1 takes mutex1 first, then mutex2. */
void* thread1(void* arg)
{
    Counter *cc = (Counter *)arg;
    while (1)
    {
        pthread_mutex_lock(&cc->mutex1);
        ++cc->sequence1;
        sleep(1);
        pthread_mutex_lock(&cc->mutex2);
        ++cc->sequence2;
        pthread_mutex_unlock(&cc->mutex2);
        pthread_mutex_unlock(&cc->mutex1);
    }
}

/* thread2 takes the locks in the opposite order: mutex2 first, then mutex1. */
void* thread2(void* arg)
{
    Counter *cc = (Counter *)arg;
    while (1)
    {
        pthread_mutex_lock(&cc->mutex2);
        ++cc->sequence2;
        sleep(1);
        pthread_mutex_lock(&cc->mutex1);
        ++cc->sequence1;
        pthread_mutex_unlock(&cc->mutex1);
        pthread_mutex_unlock(&cc->mutex2);
    }
}

int main()
{
    Counter pub_counter = {PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, 0, 0};
    pthread_t tid[2];

    if (pthread_create(&tid[0], NULL, &thread1, &pub_counter) != 0)
    {
        _exit(1);
    }
    if (pthread_create(&tid[1], NULL, &thread2, &pub_counter) != 0)
    {
        _exit(1);
    }

    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    return 0;
}

 

 

2: Compile and run

Compile with the -g option so that symbols can be mapped back to the source code:

gcc -o deadlock -g deadlock.c -pthread
./deadlock

 

 

3: Inspect the call stacks with pstack

The pstack command prints the call stacks of a running process:

# ps -ef|grep deadlock
root 9867  9032  0 14:13 pts/7    00:00:00 ./deadlock
root 9956  8991  0 14:16 pts/5    00:00:00 grep --color=auto deadlock
# pstack 9867
Thread 3 (Thread 0x7f6093bf6700 (LWP 9868)):
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6093fc1d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f6093fc1c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004007d8 in thread1 (arg=0x7fffad4cbeb0) at deadlock.c:26
#4  0x00007f6093fbfdc5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f6093cee76d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f60933f5700 (LWP 9869)):
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6093fc1d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f6093fc1c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400852 in thread2 (arg=0x7fffad4cbeb0) at deadlock.c:44
#4  0x00007f6093fbfdc5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f6093cee76d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f60943e1740 (LWP 9867)):
#0  0x00007f6093fc0ef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000400908 in main () at deadlock.c:66

 

Run pstack several times: in every output, threads 2 and 3 are stuck in __lll_lock_wait. That is a clear sign that a deadlock has occurred.

 

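4: gdb

4.1 Attach to the process

Use gdb to attach to the process and examine the state of the locks. (With the form used here, gdb first tries to open a file named attach, which is why the output contains "attach: No such file or directory"; gdb -p 9867 attaches without that message.)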

# gdb attach 9867
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
attach: No such file or directory.
Attaching to process 9867
Reading symbols from /root/devel/mycode/deadlock...done.
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 9869]
[New LWP 9868]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007f6093fc0ef7 in pthread_join () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.4.x86_64

 

 

4.2 List the threads in the process

(gdb) info thread
  Id   Target Id         Frame
  3    Thread 0x7f6093bf6700 (LWP 9868) "deadlock" 0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
  2    Thread 0x7f60933f5700 (LWP 9869) "deadlock" 0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
* 1    Thread 0x7f60943e1740 (LWP 9867) "deadlock" 0x00007f6093fc0ef7 in pthread_join () from /lib64/libpthread.so.0

 

The * marks the current thread, which is thread 1. To examine the lock state, switch to thread 2 and thread 3.

 

First switch to thread 2 and print its call stack:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f60933f5700 (LWP 9869))]
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6093fc1d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f6093fc1c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400852 in thread2 (arg=0x7fffad4cbeb0) at deadlock.c:44
#4  0x00007f6093fbfdc5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f6093cee76d in clone () from /lib64/libc.so.6

 

The "PID" of thread 2 (its LWP ID) is 9869. The call stack shows that the thread is blocked in pthread_mutex_lock. Try to look at the state of the locks:

(gdb) p cc
No symbol "cc" in current context.
(gdb) frame 3
#3  0x0000000000400852 in thread2 (arg=0x7fffad4cbeb0) at deadlock.c:44
44                      pthread_mutex_lock(&cc->mutex1);
(gdb) p cc
$1 = (Counter *) 0x7fffad4cbeb0
(gdb) p cc->mutex1
$2 = {__data = {__lock = 2, __count = 0, __owner = 9868, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\214&\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) p cc->mutex2
$3 = {__data = {__lock = 2, __count = 0, __owner = 9869, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\215&\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) p cc->sequence1
$4 = 1
(gdb) p cc->sequence2
$5 = 1

 

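Because the selected frame is frame 0, that is, inside __lll_lock_wait, the first attempt to print cc fails with: No symbol "cc" in current context. So first use the frame 3 command to switch to the frame of thread2, the function that called pthread_mutex_lock, and then print the members of cc.

As the output shows, cc->mutex1 is currently held by the thread whose "PID" is 9868 (__owner = 9868), while cc->mutex2 is held by the thread whose "PID" is 9869, which is the current thread.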
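The "PID" values compared above are kernel thread IDs (the LWP numbers gdb prints). As a small Linux-specific sketch, a program can log the same number for each of its threads so that a later __owner value can be matched to a thread. The helper name current_lwp is an assumption for illustration; it uses the raw system call because the gettid() wrapper only appears in newer glibc releases.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the kernel thread ID (LWP) of the calling
 * thread. This is the number shown as "LWP nnnn" in the gdb and pstack
 * output above. Linux-specific. */
pid_t current_lwp(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In the main thread the value equals getpid(), which matches the log above, where thread 1 has LWP 9867 and the process ID is also 9867.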

 

Next, switch to thread 3 and look at its call stack and the lock state:

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f6093bf6700 (LWP 9868))]
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6093fc1d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f6093fc1c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004007d8 in thread1 (arg=0x7fffad4cbeb0) at deadlock.c:26
#4  0x00007f6093fbfdc5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f6093cee76d in clone () from /lib64/libc.so.6
(gdb) f 3
#3  0x00000000004007d8 in thread1 (arg=0x7fffad4cbeb0) at deadlock.c:26
26                      pthread_mutex_lock(&cc->mutex2);
(gdb) p cc->mutex1
$7 = {__data = {__lock = 2, __count = 0, __owner = 9868, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\214&\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) p cc->mutex2
$8 = {__data = {__lock = 2, __count = 0, __owner = 9869, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\215&\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) p cc->sequence1
$9 = 1
(gdb) p cc->sequence2
$10 = 1

 

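As the output shows, thread 3 has "PID" 9868 and is the thread holding cc->mutex1, while the cc->mutex2 it is trying to lock is held by the thread with "PID" 9869, that is, thread 2. Each thread holds the lock the other one is waiting for, which confirms the deadlock.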
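The cycle found above exists because the two threads take the locks in opposite orders. One common remedy, shown here only as a sketch and not as the original author's fix, is to have every thread acquire mutex1 before mutex2. The iterations field and the names ordered_worker and run_ordered are assumptions added so that the example terminates and returns a checkable result.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t mutex1;
    pthread_mutex_t mutex2;
    int sequence1;
    int sequence2;
    int iterations;   /* hypothetical field: loop bound so the demo ends */
} Counter;

/* Both threads run this function, so both take mutex1 before mutex2.
 * With a single global lock order no wait cycle can form. */
static void *ordered_worker(void *arg)
{
    Counter *cc = (Counter *)arg;
    for (int i = 0; i < cc->iterations; ++i) {
        pthread_mutex_lock(&cc->mutex1);
        ++cc->sequence1;
        pthread_mutex_lock(&cc->mutex2);
        ++cc->sequence2;
        pthread_mutex_unlock(&cc->mutex2);
        pthread_mutex_unlock(&cc->mutex1);
    }
    return NULL;
}

/* Hypothetical driver: runs two workers and returns
 * sequence1 + sequence2, or -1 if a thread could not be created. */
int run_ordered(int iterations)
{
    Counter c = {PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
                 0, 0, iterations};
    pthread_t tid[2];

    if (pthread_create(&tid[0], NULL, ordered_worker, &c) != 0)
        return -1;
    if (pthread_create(&tid[1], NULL, ordered_worker, &c) != 0) {
        pthread_join(tid[0], NULL);
        return -1;
    }
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    return c.sequence1 + c.sequence2;
}
```

Calling run_ordered(1000) from main should return 4000, since each of the two threads increments both counters 1000 times while holding both locks.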

 

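5: Notes

After gdb attaches to a process, the process stops running (it is not killed, only stopped), so you can run the various GDB commands to examine call stacks, internal variables, and so on: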

The first thing gdb does after arranging to debug the specified process is to stop it. You can examine and modify an attached process with all the gdb commands that are ordinarily available when you start processes with run. You can insert breakpoints; you can step and continue; you can modify storage. If you would rather the process continue running, you may use the continue command after attaching gdb to the process.

http://sourceware.org/gdb/onlinedocs/gdb/Attach.html

 

When a process being debugged with GDB receives a signal, GDB reacts differently depending on the signal. For some signals GDB stops the process; others it passes straight to the process. Use the info signals or info handle command to see what GDB does when each signal arrives:

(gdb) info signals
Signal        Stop      Print   Pass to program Description
SIGHUP        Yes       Yes     Yes             Hangup
SIGINT        Yes       Yes     No              Interrupt
SIGQUIT       Yes       Yes     Yes             Quit
SIGILL        Yes       Yes     Yes             Illegal instruction
SIGTRAP       Yes       Yes     No              Trace/breakpoint trap
SIGABRT       Yes       Yes     Yes             Aborted
SIGEMT        Yes       Yes     Yes             Emulation trap
SIGFPE        Yes       Yes     Yes             Arithmetic exception
SIGKILL       Yes       Yes     Yes             Killed
…

 

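As the table shows, for SIGINT and SIGTRAP GDB by default stops the process and does not pass the signal on to it. These two signals can therefore be used to pause the process, print debugging information, and then resume it with the continue command.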
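Since GDB stops on SIGTRAP without passing it on, a program can raise that signal itself to act as a programmatic breakpoint. The following is a Linux-specific sketch under stated assumptions: the helper names tracer_pid and debug_break are hypothetical, and the check relies on the TracerPid line of /proc/self/status, which reports the PID of an attached tracer such as gdb, or 0 if there is none.

```c
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper: return the PID of the process tracing us (for
 * example gdb), 0 if there is none, or -1 if /proc/self/status cannot
 * be read or has no TracerPid line. Linux-specific. */
int tracer_pid(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    int pid = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* The line looks like "TracerPid:\t0"; other lines do not match. */
        if (sscanf(line, "TracerPid: %d", &pid) == 1)
            break;
    }
    fclose(f);
    return pid;
}

/* Hypothetical helper: stop in the debugger only when one is attached.
 * Without a tracer, the default action of SIGTRAP would terminate the
 * process, so the signal is raised only when a tracer is present. */
void debug_break(void)
{
    if (tracer_pid() > 0)
        raise(SIGTRAP);
}
```

Placing debug_break() at a point of interest makes an attached GDB stop there; after inspecting state, continue resumes the program.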

 

GDB has the ability to detect any occurrence of a signal in your program. You can tell GDB in advance what to do for each kind of signal.

 

Normally, GDB is set up to let the non-erroneous signals like SIGALRM be silently passed to your program (so as not to interfere with their role in the program’s functioning) but to stop your program immediately whenever an error signal happens. You can change these settings with the handle command.

 

info signals

info handle

Print a table of all the kinds of signals and how GDB has been told to handle each one. You can use this to see the signal numbers of all the defined types of signals.

 

info signals sig

Similar, but print information only about the specified signal number.

 

info handle is an alias for info signals.

 

catch signal [signal… | ‘all’]

Set a catchpoint for the indicated signals. See Set Catchpoints, for details about this command.

 

handle signal [keywords…]

Change the way GDB handles signal signal. The signal can be the number of a signal or its name (with or without the ‘SIG’ at the beginning); a list of signal numbers of the form ‘low-high’; or the word ‘all’, meaning all the known signals. Optional arguments keywords, described below, say what change to make.

 

The keywords allowed by the handle command can be abbreviated. Their full names are:

 

nostop

GDB should not stop your program when this signal happens. It may still print a message telling you that the signal has come in.

 

stop

GDB should stop your program when this signal happens. This implies the print keyword as well.

 

print

GDB should print a message when this signal happens.

 

noprint

GDB should not mention the occurrence of the signal at all. This implies the nostop keyword as well.

 

pass

noignore

GDB should allow your program to see this signal; your program can handle the signal, or else it may terminate if the signal is fatal and not handled. pass and noignore are synonyms.

 

nopass

ignore

GDB should not allow your program to see this signal. nopass and ignore are synonyms.

 

When a signal stops your program, the signal is not visible to the program until you continue. Your program sees the signal then, if pass is in effect for the signal in question at that time. In other words, after GDB reports a signal, you can use the handle command with pass or nopass to control whether your program sees that signal when you continue.

 

The default is set to nostop, noprint, pass for non-erroneous signals such as SIGALRM, SIGWINCH and SIGCHLD, and to stop, print, pass for the erroneous signals.

https://sourceware.org/gdb/current/onlinedocs/gdb/Signals.html#Signals

 

