How to debug a Python segmentation fault


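This afternoon at work, I moved a program that had been running fine on my local machine to a GPU cloud instance for training, and it failed with an unexpected error.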

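The script converts the Pascal VOC dataset into a training set in the format MXNet expects, and it does not touch the GPU at all. Even so, it failed with a non-zero exit status, specifically -11, as shown below: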

tbs@ubuntu:~/workspace/mxnet-ssd$ bash tools/prepare_pascal.sh 
saving list to disk...
List file /home/tbs/workspace/mxnet-ssd/tools/../data/train.lst generated...
Creating .rec file from /home/tbs/workspace/mxnet-ssd/data/train.lst in /home/tbs/workspace/mxnet-ssd/data
multiprocessing not available, fall back to single threaded encoding
Traceback (most recent call last):
  File "/home/tbs/workspace/mxnet-ssd/tools/prepare_dataset.py", line 111, in <module>
    "--shuffle", str(int(args.shuffle)), "--pack-label", "1"])
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '/home/tbs/workspace/mxnet-ssd/tools/../mxnet/tools/im2rec.py', '/home/tbs/workspace/mxnet-ssd/data/train.lst', '/home/tbs/workspace/mxnet-ssd/data/VOCdevkit', '--shuffle', '1', '--pack-label', '1']' returned non-zero exit status -11
saving list to disk...
List file /home/tbs/workspace/mxnet-ssd/tools/../data/val.lst generated...
Creating .rec file from /home/tbs/workspace/mxnet-ssd/data/val.lst in /home/tbs/workspace/mxnet-ssd/data
multiprocessing not available, fall back to single threaded encoding
Traceback (most recent call last):
  File "/home/tbs/workspace/mxnet-ssd/tools/prepare_dataset.py", line 111, in <module>
    "--shuffle", str(int(args.shuffle)), "--pack-label", "1"])
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '/home/tbs/workspace/mxnet-ssd/tools/../mxnet/tools/im2rec.py', '/home/tbs/workspace/mxnet-ssd/data/val.lst', '/home/tbs/workspace/mxnet-ssd/data/VOCdevkit', '--shuffle', '1', '--pack-label', '1']' returned non-zero exit status -11

What exactly does a non-zero exit status of -11 mean? Searching did not turn up a direct answer, but it did turn up a concrete way to debug the problem.
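For reference, the Python documentation for subprocess states that on POSIX a negative return code -N means the child was terminated by signal N. The sketch below decodes such a status. It is written for Python 3 (the post itself runs Python 2.7), and `describe_returncode` is a hypothetical helper name chosen for illustration, not part of any library.

```python
import signal


def describe_returncode(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode > 0:
        return "exited with status {}".format(returncode)
    # On POSIX, subprocess reports -N when the child was killed by signal N.
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = "unknown signal"
    return "killed by signal {} ({})".format(-returncode, name)


if __name__ == "__main__":
    # The status reported in the traceback above.
    print(describe_returncode(-11))
```

On Linux, signal 11 is SIGSEGV, so the status -11 from the traceback already points at a segmentation fault in the child process.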

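When a call fails, Python's subprocess.check_call raises a CalledProcessError, which carries the exit status in its returncode attribute and any captured output in its output attribute. So I wrapped the call in the try/except below to print the exception, hoping to find something useful in output. The result was disappointing: output was None and there was no further information. This is expected, because check_call does not capture the child's output; only check_output fills in that attribute.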

try:
    subprocess.check_call(["python",
        os.path.join(curr_path, "..", "mxnet/tools/im2rec.py"),
        os.path.abspath(args.target), os.path.abspath(args.root_path),
        "--shuffle", str(int(args.shuffle)), "--pack-label", "1"])
except subprocess.CalledProcessError as e:
    raise RuntimeError("command '{}' return with error (code {}): {}".format(
        e.cmd, e.returncode, e.output))
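If the goal is to see what the child printed, one option is to use check_output with stderr redirected into stdout, so that the exception's output attribute is populated. The following is a minimal sketch for Python 3; `run_and_capture` is a hypothetical helper, and the child command is a stand-in rather than the real im2rec.py invocation.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (returncode, combined stdout/stderr text)."""
    try:
        # check_output captures stdout; stderr=STDOUT merges stderr into it.
        output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
        return 0, output.decode(errors="replace")
    except subprocess.CalledProcessError as e:
        # Unlike check_call, check_output fills in e.output on failure.
        return e.returncode, e.output.decode(errors="replace")


if __name__ == "__main__":
    # Stand-in child process that writes to stderr and exits with status 3.
    child = [sys.executable, "-c",
             "import sys; sys.stderr.write('boom\\n'); sys.exit(3)"]
    print(run_and_capture(child))
```

For a crash inside native code the captured output may still contain nothing helpful, because the interpreter is killed before it can print a traceback. That is consistent with what the post finds next.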

Debugging this way still did not reveal the actual error, so I ran the command that subprocess was launching directly. The result was Segmentation fault (core dumped).

tbs@ubuntu:~/workspace/mxnet-ssd$ python /home/tbs/workspace/mxnet-ssd/tools/../mxnet/tools/im2rec.py /home/tbs/workspace/mxnet-ssd/data/train.lst /home/tbs/workspace/mxnet-ssd/data/VOCdevkit --shuffle 1 --pack-label 1
Creating .rec file from /home/tbs/workspace/mxnet-ssd/data/train.lst in /home/tbs/workspace/mxnet-ssd/data
multiprocessing not available, fall back to single threaded encoding
Segmentation fault (core dumped)

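Further searching showed that gdb can be used to debug this kind of error. Start gdb from the command line, load the Python interpreter with file python, then launch the script with run ***.py arg1 arg2 ... gdb stops the program when it receives a fatal signal such as SIGSEGV and reports where it happened. Looking at the debugger output (the full session is reproduced at the end of this post), the fault is located at: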

    0x00007ffff34f4865 in cv::Mat::copyTo(cv::_OutputArray const&) const () from /usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4

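Searching for this symbol shows that the crash happens because OpenCV 2.4 and OpenCV 3.0 are both installed on the system, or, put another way, because the OpenCV version used by the Python bindings does not match the OpenCV version installed through apt. In the end I uninstalled the Python OpenCV 3.0 package and installed version 2.4 instead. That solved the problem, and training could finally run normally.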
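A conflict like this can sometimes be spotted before reaching for gdb by listing the shared libraries mapped into the Python process and looking for one library present in two versions. The sketch below is a heuristic only, assuming Linux and the usual `libname.so.VERSION` file naming; `find_version_conflicts` and `mapped_libraries` are hypothetical helpers, and the `/usr/local/lib` path in the sample is made up for illustration.

```python
import re
from collections import defaultdict

# Matches e.g. "libopencv_core.so.2.4" and captures the name and the version.
_LIB_RE = re.compile(r"(lib[\w+-]+)\.so\.([\d.]+)$")


def find_version_conflicts(paths):
    """Return {library: sorted versions} for libraries seen in several versions."""
    versions = defaultdict(set)
    for path in paths:
        match = _LIB_RE.search(path.strip())
        if match:
            versions[match.group(1)].add(match.group(2))
    return {name: sorted(v) for name, v in versions.items() if len(v) > 1}


def mapped_libraries(maps_file="/proc/self/maps"):
    """List the distinct file paths mapped into a process (Linux only)."""
    paths = set()
    with open(maps_file) as handle:
        for line in handle:
            # Fields: address, perms, offset, dev, inode, pathname (optional).
            fields = line.split(None, 5)
            if len(fields) == 6 and fields[5].startswith("/"):
                paths.add(fields[5].strip())
    return sorted(paths)


if __name__ == "__main__":
    sample = [
        "/usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4",
        "/usr/local/lib/libopencv_core.so.3.0",  # hypothetical path
        "/lib/x86_64-linux-gnu/libc.so.6",
    ]
    print(find_version_conflicts(sample))
```

To apply it to a case like the one in this post, the idea would be to import the suspect modules first and then call `find_version_conflicts(mapped_libraries())` in the same process. A clean result does not rule out a mismatch, since a statically linked OpenCV inside a Python wheel would not appear as a separate shared library.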

tbs@ubuntu:~/workspace/mxnet-ssd$ gdb
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file python
Reading symbols from python...(no debugging symbols found)...done.
(gdb) run /home/tbs/workspace/mxnet-ssd/tools/../mxnet/tools/im2rec.py /home/tbs/workspace/mxnet-ssd/data/train.lst /home/tbs/workspace/mxnet-ssd/data/VOCdevkit --shuffle 1 --pack-label 1
Starting program: /usr/bin/python /home/tbs/workspace/mxnet-ssd/tools/../mxnet/tools/im2rec.py /home/tbs/workspace/mxnet-ssd/data/train.lst /home/tbs/workspace/mxnet-ssd/data/VOCdevkit --shuffle 1 --pack-label 1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3a29700 (LWP 7814)]
[New Thread 0x7ffff3228700 (LWP 7815)]
[New Thread 0x7ffff0a27700 (LWP 7816)]
[New Thread 0x7fffee226700 (LWP 7817)]
[New Thread 0x7fffe9a25700 (LWP 7818)]
[New Thread 0x7fffe7224700 (LWP 7819)]
[New Thread 0x7fffe4a23700 (LWP 7820)]
[New Thread 0x7fffe2222700 (LWP 7821)]
[New Thread 0x7fffdfa21700 (LWP 7822)]
[New Thread 0x7fffdd220700 (LWP 7823)]
[New Thread 0x7fffdaa1f700 (LWP 7824)]
[New Thread 0x7fffd821e700 (LWP 7825)]
[New Thread 0x7fffd5a1d700 (LWP 7826)]
[New Thread 0x7fffd321c700 (LWP 7827)]
[New Thread 0x7fffd0a1b700 (LWP 7828)]
[New Thread 0x7fffce21a700 (LWP 7829)]
[New Thread 0x7fffcba19700 (LWP 7830)]
[New Thread 0x7fffc9218700 (LWP 7831)]
[New Thread 0x7fffc6a17700 (LWP 7832)]
[New Thread 0x7fffc6216700 (LWP 7833)]
[New Thread 0x7fffc5a15700 (LWP 7834)]
[New Thread 0x7fffc5214700 (LWP 7835)]
[New Thread 0x7fffc4a13700 (LWP 7836)]
[New Thread 0x7fffc4212700 (LWP 7837)]
[New Thread 0x7fffc3a11700 (LWP 7838)]
[New Thread 0x7fffc1210700 (LWP 7839)]
[New Thread 0x7fffb2a0f700 (LWP 7840)]
[New Thread 0x7fffb020e700 (LWP 7841)]
[New Thread 0x7fffada0d700 (LWP 7842)]
[New Thread 0x7fffab20c700 (LWP 7843)]
[New Thread 0x7fffa8a0b700 (LWP 7844)]
[New Thread 0x7fffa620a700 (LWP 7845)]
[New Thread 0x7fffa3a09700 (LWP 7846)]
[New Thread 0x7fffa1208700 (LWP 7847)]
[New Thread 0x7fff9ea07700 (LWP 7848)]
[New Thread 0x7fff9c206700 (LWP 7849)]
[New Thread 0x7fff9ba05700 (LWP 7850)]
[New Thread 0x7fff99204700 (LWP 7851)]
[New Thread 0x7fff96a03700 (LWP 7852)]
[New Thread 0x7fff94202700 (LWP 7853)]
[New Thread 0x7fff91a01700 (LWP 7854)]
[New Thread 0x7fff8f200700 (LWP 7855)]
[New Thread 0x7fff8c9ff700 (LWP 7856)]
[New Thread 0x7fff8a1fe700 (LWP 7857)]
[New Thread 0x7fff879fd700 (LWP 7858)]
[New Thread 0x7fff851fc700 (LWP 7859)]
[New Thread 0x7fff829fb700 (LWP 7860)]
[New Thread 0x7fff801fa700 (LWP 7861)]
[New Thread 0x7fff7d9f9700 (LWP 7862)]
[New Thread 0x7fff7b1f8700 (LWP 7863)]
[New Thread 0x7fff789f7700 (LWP 7864)]
[New Thread 0x7fff761f6700 (LWP 7865)]
[New Thread 0x7fff739f5700 (LWP 7866)]
[New Thread 0x7fff711f4700 (LWP 7867)]
[New Thread 0x7fff6e9f3700 (LWP 7868)]
[Thread 0x7fffdfa21700 (LWP 7822) exited]
[Thread 0x7fffc1210700 (LWP 7839) exited]
[Thread 0x7fffa8a0b700 (LWP 7844) exited]
[Thread 0x7fff99204700 (LWP 7851) exited]
[Thread 0x7fff91a01700 (LWP 7854) exited]
[Thread 0x7fff7b1f8700 (LWP 7863) exited]
[Thread 0x7fff739f5700 (LWP 7866) exited]
[Thread 0x7fff6e9f3700 (LWP 7868) exited]
[Thread 0x7fff7d9f9700 (LWP 7862) exited]
[Thread 0x7fff801fa700 (LWP 7861) exited]
[Thread 0x7fff829fb700 (LWP 7860) exited]
[Thread 0x7fff851fc700 (LWP 7859) exited]
[Thread 0x7fff879fd700 (LWP 7858) exited]
[Thread 0x7fff8a1fe700 (LWP 7857) exited]
[Thread 0x7fff94202700 (LWP 7853) exited]
[Thread 0x7fff96a03700 (LWP 7852) exited]
[Thread 0x7fff9ba05700 (LWP 7850) exited]
[Thread 0x7fff9c206700 (LWP 7849) exited]
[Thread 0x7fff9ea07700 (LWP 7848) exited]
[Thread 0x7fffa1208700 (LWP 7847) exited]
[Thread 0x7fffa3a09700 (LWP 7846) exited]
[Thread 0x7fffa620a700 (LWP 7845) exited]
[Thread 0x7fffab20c700 (LWP 7843) exited]
[Thread 0x7fffada0d700 (LWP 7842) exited]
[Thread 0x7fffb020e700 (LWP 7841) exited]
[Thread 0x7fffb2a0f700 (LWP 7840) exited]
[Thread 0x7fffc3a11700 (LWP 7838) exited]
[Thread 0x7fffc4212700 (LWP 7837) exited]
[Thread 0x7fff711f4700 (LWP 7867) exited]
[Thread 0x7fff761f6700 (LWP 7865) exited]
[Thread 0x7fff789f7700 (LWP 7864) exited]
[Thread 0x7fff8c9ff700 (LWP 7856) exited]
[Thread 0x7fff8f200700 (LWP 7855) exited]
[Thread 0x7fffc4a13700 (LWP 7836) exited]
[Thread 0x7fffc5214700 (LWP 7835) exited]
[Thread 0x7fffc5a15700 (LWP 7834) exited]
[Thread 0x7fffc6216700 (LWP 7833) exited]
[Thread 0x7fffc6a17700 (LWP 7832) exited]
[Thread 0x7fffc9218700 (LWP 7831) exited]
[Thread 0x7fffcba19700 (LWP 7830) exited]
[Thread 0x7fffce21a700 (LWP 7829) exited]
[Thread 0x7fffd0a1b700 (LWP 7828) exited]
[Thread 0x7fffd321c700 (LWP 7827) exited]
[Thread 0x7fffd5a1d700 (LWP 7826) exited]
[Thread 0x7fffd821e700 (LWP 7825) exited]
[Thread 0x7fffdaa1f700 (LWP 7824) exited]
[Thread 0x7fffdd220700 (LWP 7823) exited]
[Thread 0x7fffe2222700 (LWP 7821) exited]
[Thread 0x7fffe4a23700 (LWP 7820) exited]
[Thread 0x7fffe7224700 (LWP 7819) exited]
[Thread 0x7fffe9a25700 (LWP 7818) exited]
[Thread 0x7fffee226700 (LWP 7817) exited]
[Thread 0x7ffff0a27700 (LWP 7816) exited]
[Thread 0x7ffff3228700 (LWP 7815) exited]
[Thread 0x7ffff3a29700 (LWP 7814) exited]
[New Thread 0x7fff6e9f3700 (LWP 7871)]
[New Thread 0x7fff711f4700 (LWP 7872)]
[New Thread 0x7fff739f5700 (LWP 7873)]
[New Thread 0x7fff761f6700 (LWP 7874)]
[New Thread 0x7fff462b4700 (LWP 7875)]
[New Thread 0x7fff41ab3700 (LWP 7876)]
[New Thread 0x7fff3f2b2700 (LWP 7877)]
[New Thread 0x7fff3cab1700 (LWP 7878)]
[New Thread 0x7fff3a2b0700 (LWP 7879)]
[New Thread 0x7fff37aaf700 (LWP 7880)]
[New Thread 0x7fff352ae700 (LWP 7881)]
[New Thread 0x7fff32aad700 (LWP 7882)]
[New Thread 0x7fff302ac700 (LWP 7883)]
[New Thread 0x7fff2daab700 (LWP 7884)]
[New Thread 0x7fff2b2aa700 (LWP 7885)]
[New Thread 0x7fff28aa9700 (LWP 7886)]
[New Thread 0x7fff282a8700 (LWP 7887)]
[New Thread 0x7fff25aa7700 (LWP 7888)]
[New Thread 0x7fff212a6700 (LWP 7889)]
[New Thread 0x7fff1eaa5700 (LWP 7890)]
[New Thread 0x7fff1c2a4700 (LWP 7891)]
[New Thread 0x7fff19aa3700 (LWP 7892)]
[New Thread 0x7fff172a2700 (LWP 7893)]
[New Thread 0x7fff14aa1700 (LWP 7894)]
[New Thread 0x7fff122a0700 (LWP 7895)]
[New Thread 0x7fff0fa9f700 (LWP 7896)]
[New Thread 0x7fff0d29e700 (LWP 7897)]
[New Thread 0x7fff0aa9d700 (LWP 7898)]
[New Thread 0x7fff0829c700 (LWP 7899)]
[New Thread 0x7fff05a9b700 (LWP 7900)]
[New Thread 0x7fff0329a700 (LWP 7901)]
[New Thread 0x7fff00a99700 (LWP 7902)]
[New Thread 0x7fff00298700 (LWP 7903)]
[New Thread 0x7ffefda97700 (LWP 7904)]
[New Thread 0x7ffefb296700 (LWP 7905)]
[New Thread 0x7ffef8a95700 (LWP 7906)]
[New Thread 0x7ffef6294700 (LWP 7907)]
[New Thread 0x7ffef3a93700 (LWP 7908)]
[New Thread 0x7ffef1292700 (LWP 7909)]
[New Thread 0x7ffeeea91700 (LWP 7910)]
[New Thread 0x7ffeec290700 (LWP 7911)]
[New Thread 0x7ffee9a8f700 (LWP 7912)]
[New Thread 0x7ffee728e700 (LWP 7913)]
[New Thread 0x7ffee4a8d700 (LWP 7914)]
[New Thread 0x7ffee228c700 (LWP 7915)]
[New Thread 0x7ffedfa8b700 (LWP 7916)]
[New Thread 0x7ffedd28a700 (LWP 7917)]
[New Thread 0x7ffedaa89700 (LWP 7918)]
[New Thread 0x7ffed8288700 (LWP 7919)]
[New Thread 0x7ffed5a87700 (LWP 7920)]
[New Thread 0x7ffed3286700 (LWP 7921)]
[New Thread 0x7ffed0a85700 (LWP 7922)]
[New Thread 0x7ffece284700 (LWP 7923)]
[New Thread 0x7ffecda83700 (LWP 7924)]
[New Thread 0x7ffecd282700 (LWP 7925)]
[New Thread 0x7ffecaa81700 (LWP 7926)]
[Thread 0x7fff00298700 (LWP 7903) exited]
[Thread 0x7ffed5a87700 (LWP 7920) exited]
[Thread 0x7ffed3286700 (LWP 7921) exited]
[Thread 0x7ffed8288700 (LWP 7919) exited]
[Thread 0x7ffecda83700 (LWP 7924) exited]
[Thread 0x7ffece284700 (LWP 7923) exited]
[Thread 0x7ffecaa81700 (LWP 7926) exited]
[Thread 0x7ffecd282700 (LWP 7925) exited]
[Thread 0x7ffed0a85700 (LWP 7922) exited]
[Thread 0x7ffedaa89700 (LWP 7918) exited]
[Thread 0x7ffedd28a700 (LWP 7917) exited]
[Thread 0x7ffedfa8b700 (LWP 7916) exited]
[Thread 0x7ffee228c700 (LWP 7915) exited]
[Thread 0x7ffee4a8d700 (LWP 7914) exited]
[Thread 0x7ffee728e700 (LWP 7913) exited]
[Thread 0x7ffee9a8f700 (LWP 7912) exited]
[Thread 0x7ffeec290700 (LWP 7911) exited]
[Thread 0x7ffeeea91700 (LWP 7910) exited]
[Thread 0x7ffef1292700 (LWP 7909) exited]
[Thread 0x7ffef3a93700 (LWP 7908) exited]
[Thread 0x7ffef6294700 (LWP 7907) exited]
[Thread 0x7ffef8a95700 (LWP 7906) exited]
[Thread 0x7ffefb296700 (LWP 7905) exited]
[Thread 0x7ffefda97700 (LWP 7904) exited]
[Thread 0x7fff00a99700 (LWP 7902) exited]
[Thread 0x7fff0329a700 (LWP 7901) exited]
[Thread 0x7fff05a9b700 (LWP 7900) exited]
[Thread 0x7fff0829c700 (LWP 7899) exited]
[Thread 0x7fff0aa9d700 (LWP 7898) exited]
[Thread 0x7fff0d29e700 (LWP 7897) exited]
[Thread 0x7fff0fa9f700 (LWP 7896) exited]
[Thread 0x7fff122a0700 (LWP 7895) exited]
[Thread 0x7fff14aa1700 (LWP 7894) exited]
[Thread 0x7fff172a2700 (LWP 7893) exited]
[Thread 0x7fff19aa3700 (LWP 7892) exited]
[Thread 0x7fff1c2a4700 (LWP 7891) exited]
[Thread 0x7fff1eaa5700 (LWP 7890) exited]
[Thread 0x7fff212a6700 (LWP 7889) exited]
[Thread 0x7fff25aa7700 (LWP 7888) exited]
[Thread 0x7fff282a8700 (LWP 7887) exited]
[Thread 0x7fff28aa9700 (LWP 7886) exited]
[Thread 0x7fff2b2aa700 (LWP 7885) exited]
[Thread 0x7fff2daab700 (LWP 7884) exited]
[Thread 0x7fff302ac700 (LWP 7883) exited]
[Thread 0x7fff32aad700 (LWP 7882) exited]
[Thread 0x7fff352ae700 (LWP 7881) exited]
[Thread 0x7fff37aaf700 (LWP 7880) exited]
[Thread 0x7fff3a2b0700 (LWP 7879) exited]
[Thread 0x7fff3cab1700 (LWP 7878) exited]
[Thread 0x7fff3f2b2700 (LWP 7877) exited]
[Thread 0x7fff41ab3700 (LWP 7876) exited]
[Thread 0x7fff462b4700 (LWP 7875) exited]
[Thread 0x7fff761f6700 (LWP 7874) exited]
[Thread 0x7fff739f5700 (LWP 7873) exited]
[Thread 0x7fff711f4700 (LWP 7872) exited]
Creating .rec file from /home/tbs/workspace/mxnet-ssd/data/train.lst in /home/tbs/workspace/mxnet-ssd/data
multiprocessing not available, fall back to single threaded encoding

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff34f4865 in cv::Mat::copyTo(cv::_OutputArray const&) const () from /usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4
(gdb)
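As a complement to gdb, Python 3.3 and later ship a standard-library module, faulthandler, that prints the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV. The sketch below demonstrates it on a stand-in child process that sends SIGSEGV to itself; it assumes a POSIX system and Python 3, and `run_crashing_child` is a hypothetical helper, not something from the original script.

```python
import subprocess
import sys

# Child script: enable faulthandler, then deliver SIGSEGV to itself
# to simulate a crash in native code.
CHILD = (
    "import faulthandler, os, signal\n"
    "faulthandler.enable()\n"
    "os.kill(os.getpid(), signal.SIGSEGV)\n"
)


def run_crashing_child():
    """Run the child and return (returncode, stderr text)."""
    result = subprocess.run([sys.executable, "-c", CHILD],
                            stderr=subprocess.PIPE)
    return result.returncode, result.stderr.decode(errors="replace")


if __name__ == "__main__":
    code, err = run_crashing_child()
    print("return code:", code)
    print(err)
```

The same handler can be enabled without editing a script by running it as `python -X faulthandler script.py`. For a crash inside a C++ library such as OpenCV, this shows which Python line triggered the call, while the native frame itself still has to come from gdb as in the session above.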

 

