http://blog.chinaunix.net/uid-25909722-id-3011815.html
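While implementing a thread pool with the pthread library, I ran into a few small problems:
(2) SIGSEGV/Segmentation fault caused by improper use of pthread_cancel
The situation was as follows:
The thread pool has two kinds of threads: work_thread and manager_thread. The former are worker threads and the latter is the manager thread. There is only one manager thread.
The manager thread calls a function, pool_delete_thread(), to periodically clean up idle threads in the pool, that is, to call pthread_cancel() on surplus idle threads. This cleanup generally runs when the load on the pool drops from high to low.
The thread pool also has a function for shutting it down, close_pool(), which is normally called only when the program exits.
close_pool() is implemented as follows: it wakes up all idle threads that are waiting in pthread_cond_wait, then calls pthread_cancel() to kill them.
This is where the problem arises:
Whenever the manager thread calls pool_delete_thread() again after close_pool() has been called, a SIGSEGV occurs: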
    digdeep@ubuntu:~/pthread/threadpool$ gdb -c core ./threadPoolTest
    GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
    Copyright (C) 2010 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "i686-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /home/digdeep/pthread/threadpool/threadPoolTest...(no debugging symbols found)...done.
    [New Thread 6499]
    [New Thread 6500]
    [New Thread 6492]
    [New Thread 6501]
    warning: Can't read pathname for load map: Input/output error.
    Reading symbols from /lib/i386-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/lib/i386-linux-gnu/libpthread-2.13.so...done.
    done.
    Loaded symbols for /lib/i386-linux-gnu/libpthread.so.0
    Reading symbols from /lib/i386-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug/lib/i386-linux-gnu/libc-2.13.so...done.
    done.
    Loaded symbols for /lib/i386-linux-gnu/libc.so.6
    Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib/ld-linux.so.2
    Core was generated by `./threadPoolTest'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x00d276f0 in pthread_cancel (th=3077909360) at pthread_cancel.c:35
    35      pthread_cancel.c: No such file or directory.
            in pthread_cancel.c
    (gdb) bt
    #0  0x00d276f0 in pthread_cancel (th=3077909360) at pthread_cancel.c:35
    #1  0x08048ffc in pool_delete_thread ()
    #2  0x08049223 in manage_thread ()
    #3  0x00d21e99 in start_thread (arg=0xb474cb70) at pthread_create.c:304
    #4  0x001e073e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
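The solution I settled on:
Set a stop_flag once close_pool() has been called, and have the manager thread check whether stop_flag is set. If it is, the manager thread does not call pool_delete_thread(). In fact, once the main thread has called close_pool(), there is no need for the manager thread to call pool_delete_thread() again anyway.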
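The stop_flag idea can be sketched roughly as follows. This is a minimal illustration, not the original pool's code: pool_delete_thread() and close_pool() are stubs that only count calls, and manager_tid, delete_calls and run_stop_flag_demo() are hypothetical names introduced here. The sketch also assumes that close_pool() sets the flag itself and joins the manager thread before it touches any worker, because a flag check alone still leaves a small window between the check and the cleanup call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int stop_flag = 0;     /* becomes 1 once close_pool() starts */
static atomic_int delete_calls = 0;  /* demo-only counter (hypothetical) */
static pthread_t manager_tid;        /* hypothetical: ID of the manager thread */

/* Stub: the real function would pthread_cancel() surplus idle workers. */
static void pool_delete_thread(void) {
    atomic_fetch_add(&delete_calls, 1);
}

/* Manager loop: stops cleaning up as soon as the flag is set. */
static void *manage_thread(void *arg) {
    (void)arg;
    while (!atomic_load(&stop_flag)) {
        pool_delete_thread();
        sched_yield();               /* stand-in for the periodic wait */
    }
    return NULL;
}

static void close_pool(void) {
    atomic_store(&stop_flag, 1);     /* 1. tell the manager to stop */
    pthread_join(manager_tid, NULL); /* 2. wait until it really has stopped */
    /* 3. the real function would now wake and cancel the idle workers */
}

/* Returns how many cleanup calls happen after close_pool() has returned. */
int run_stop_flag_demo(void) {
    int rc = pthread_create(&manager_tid, NULL, manage_thread, NULL);
    assert(rc == 0);
    (void)rc;
    while (atomic_load(&delete_calls) == 0)
        sched_yield();               /* let the manager run at least once */
    close_pool();
    int at_close = atomic_load(&delete_calls);
    sched_yield();
    return atomic_load(&delete_calls) - at_close;
}
```

Joining the manager inside close_pool() is a design choice of this sketch: it guarantees that no cleanup pass is still in flight when the workers are cancelled.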
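Related material I found by searching online:
According to that material, the Linux Native POSIX Thread Library (NPTL) implementation has a race condition: calling pthread_cancel() on a thread that is just about to exit randomly results in SIGSEGV. The problem is said not to occur on other UNIX systems such as Solaris, HP-UX and AIX. One alternative idea is to use pthread_kill to test whether the thread still exists and then act accordingly, which should avoid calling pthread_cancel() on a thread that is in the middle of exiting. Unfortunately, pthread_kill() also triggers SIGSEGV, for the same reason.
Solutions:
1. Use the pthread_mutex and pthread_cond family of functions for synchronization, to avoid this race condition in Linux NPTL.
2. Add a state mechanism: keep a global table holding the state of each thread, and when a thread exits, change its entry in the table from RUNNING to DEAD. The main thread then only needs to keep checking that state table. It is a bit dirty ;-)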
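The two suggestions above can be combined, roughly as in the sketch below. It is an illustration under assumptions, not code from the original pool: the table layout, the quit field, and the names pool_lock, pool_cond, mark_dead_and_unlock(), spawn_worker(), cancel_running_workers() and run_state_table_demo() are all hypothetical. The point is that a worker changes its slot to DEAD only while holding the mutex, and the cancelling thread checks the slot and calls pthread_cancel() under the same mutex, so it never cancels a thread that has already marked itself DEAD.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 4

enum worker_state { SLOT_FREE = 0, RUNNING, DEAD };

struct worker_slot {
    pthread_t tid;
    enum worker_state state;
    int quit;                       /* set to ask the worker to exit normally */
};

static struct worker_slot table[MAX_WORKERS];   /* global state table */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_cond = PTHREAD_COND_INITIALIZER;

/* Cleanup handler: runs with pool_lock held, on cancellation or normal exit. */
static void mark_dead_and_unlock(void *arg) {
    struct worker_slot *slot = arg;
    slot->state = DEAD;
    pthread_mutex_unlock(&pool_lock);
}

static void *work_thread(void *arg) {
    struct worker_slot *slot = arg;
    pthread_mutex_lock(&pool_lock);
    pthread_cleanup_push(mark_dead_and_unlock, slot);
    while (!slot->quit)
        pthread_cond_wait(&pool_cond, &pool_lock);  /* cancellation point */
    pthread_cleanup_pop(1);         /* normal exit also marks the slot DEAD */
    return NULL;
}

static void spawn_worker(int i) {
    pthread_mutex_lock(&pool_lock); /* the worker cannot run until we unlock */
    int rc = pthread_create(&table[i].tid, NULL, work_thread, &table[i]);
    assert(rc == 0);
    (void)rc;
    table[i].state = RUNNING;
    pthread_mutex_unlock(&pool_lock);
}

/* Cancels only threads still marked RUNNING; returns how many were cancelled. */
int cancel_running_workers(void) {
    int cancelled = 0;
    pthread_mutex_lock(&pool_lock);
    for (int i = 0; i < MAX_WORKERS; i++) {
        if (table[i].state == RUNNING) {
            pthread_cancel(table[i].tid);
            cancelled++;
        }
    }
    pthread_mutex_unlock(&pool_lock);
    return cancelled;
}

void run_state_table_demo(int *first_pass, int *second_pass) {
    for (int i = 0; i < 3; i++)
        spawn_worker(i);

    /* Worker 0 exits on its own, so its slot is DEAD before any cancel. */
    pthread_mutex_lock(&pool_lock);
    table[0].quit = 1;
    pthread_cond_broadcast(&pool_cond);
    pthread_mutex_unlock(&pool_lock);
    pthread_join(table[0].tid, NULL);

    *first_pass = cancel_running_workers();   /* workers 1 and 2 */
    pthread_join(table[1].tid, NULL);
    pthread_join(table[2].tid, NULL);
    *second_pass = cancel_running_workers();  /* every slot is DEAD or free */
}
```

In this sketch the workers stay joinable and are joined explicitly. That alone keeps each thread ID valid until the join, so the table mainly prevents a second cleanup pass from cancelling threads that are already gone.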
References:
http://blog.chinaunix.net/u/13667/showart_222280.html
http://linux.derkeiler.com/Newsgroups/comp.os.linux.development.apps/2004-04/0632.html
http://www.9php.com/FAQ/cxsjl/c/2008/01/6564294109336.html