How to Debug a Python Segmentation Fault


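This afternoon at work I moved a program that already ran fine on my local machine to a GPU cloud server for training, and it failed with an unexpected error.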

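The script converts the Pascal VOC dataset into a training set in the format MXNet expects. It does not use the GPU at all, yet it failed with a non-zero exit status of -11, as shown below: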

tbs@ubuntu:~/workspace/mxnet-ssd$ bash tools/prepare_pascal.sh 
saving list to disk...
List file /home/tbs/workspace/mxnet-ssd/tools/../data/train.lst generated...
Creating .rec file from /home/tbs/workspace/mxnet-ssd/data/train.lst in /home/tbs/workspace/mxnet-ssd/data
multiprocessing not available, fall back to single threaded encoding
Traceback (most recent call last):
  File "/home/tbs/workspace/mxnet-ssd/tools/prepare_dataset.py", line 111, in <module>
    "--shuffle", str(int(args.shuffle)), "--pack-label", "1"])
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '/home/tbs/workspace/mxnet-ssd/tools/../mxnet/tools/im2rec.py', '/home/tbs/workspace/mxnet-ssd/data/train.lst', '/home/tbs/workspace/mxnet-ssd/data/VOCdevkit', '--shuffle', '1', '--pack-label', '1']' returned non-zero exit status -11
saving list to disk...
List file /home/tbs/workspace/mxnet-ssd/tools/../data/val.lst generated...
Creating .rec file from /home/tbs/workspace/mxnet-ssd/data/val.lst in /home/tbs/workspace/mxnet-ssd/data
multiprocessing not available, fall back to single threaded encoding
Traceback (most recent call last):
  File "/home/tbs/workspace/mxnet-ssd/tools/prepare_dataset.py", line 111, in <module>
    "--shuffle", str(int(args.shuffle)), "--pack-label", "1"])
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '/home/tbs/workspace/mxnet-ssd/tools/../mxnet/tools/im2rec.py', '/home/tbs/workspace/mxnet-ssd/data/val.lst', '/home/tbs/workspace/mxnet-ssd/data/VOCdevkit', '--shuffle', '1', '--pack-label', '1']' returned non-zero exit status -11

What does a non-zero exit status of -11 actually mean? My searches did not turn up a direct answer, but they did turn up a way to debug it.
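As a side note on the status itself: the Python documentation states that a negative return code -N from subprocess means the child was terminated by signal N, and on Linux signal 11 is SIGSEGV. The sketch below decodes such a code. It assumes Python 3.5 or later (the log above is from Python 2.7), and describe_returncode is a hypothetical helper name, not part of any library.

```python
import signal


def describe_returncode(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N on POSIX systems.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal {} ({})".format(signum, name)
    return "exited with status {}".format(returncode)


print(describe_returncode(-11))
```

Seen this way, -11 already hints at a crash in native code rather than an ordinary Python exception.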

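When a call fails, check_call in Python's subprocess module raises a CalledProcessError, which carries the failure details in its returncode and output attributes. So I wrapped the call in the try/except shown below to print the exception, hoping that output would reveal something. The result was disappointing: output was None and there was no further information. (check_call does not capture the child's output, so output is only populated when the call is made through check_output.)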

try:
    subprocess.check_call(["python",
        os.path.join(curr_path, "..", "mxnet/tools/im2rec.py"),
        os.path.abspath(args.target), os.path.abspath(args.root_path),
        "--shuffle", str(int(args.shuffle)), "--pack-label", "1"])
except subprocess.CalledProcessError as e:
    # check_call does not capture the child's output, so e.output is None here.
    raise RuntimeError("command '{}' returned with error (code {}): {}".format(
        e.cmd, e.returncode, e.output))
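For the exception to carry any text, the child's output has to be captured explicitly. The following is a minimal sketch of one way to do that, assuming Python 3.5 or later where subprocess.run is available; run_captured is a hypothetical helper name. It merges stderr into stdout and includes the captured text in the error.

```python
import subprocess
import sys


def run_captured(cmd):
    """Run cmd and return its combined stdout/stderr text.

    Raises RuntimeError containing the captured text if the command fails.
    """
    proc = subprocess.run(cmd,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,  # merge stderr into stdout
                          universal_newlines=True)   # decode bytes to str
    if proc.returncode != 0:
        raise RuntimeError("command {!r} failed (code {}):\n{}".format(
            cmd, proc.returncode, proc.stdout))
    return proc.stdout


print(run_captured([sys.executable, "-c", "print('ok')"]))
```

Even with capturing in place, a segmentation fault may leave little to read: the "Segmentation fault (core dumped)" message is normally printed by the shell that launched the process, not by the crashing process itself, so the negative return code remains the main clue.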

So this approach still did not pinpoint the error. Next I ran the command that subprocess was invoking directly, and the result was Segmentation fault (core dumped).

tbs@ubuntu:~/workspace/mxnet-ssd$ python /home/tbs/workspace/mxnet-ssd/tools/../mxnet/tools/im2rec.py /home/tbs/workspace/mxnet-ssd/data/train.lst /home/tbs/workspace/mxnet-ssd/data/VOCdevkit --shuffle 1 --pack-label 1
Creating .rec file from /home/tbs/workspace/mxnet-ssd/data/train.lst in /home/tbs/workspace/mxnet-ssd/data
multiprocessing not available, fall back to single threaded encoding
Segmentation fault (core dumped)
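One option at this point, not used in the original investigation, is the faulthandler module from the Python 3 standard library (3.3 and later). It prints the Python-level traceback when the interpreter receives SIGSEGV. The sketch below assumes Python 3, whereas the log above is from Python 2.7. Both function names are hypothetical.

```python
import faulthandler
import sys


def faulthandler_command(script, *script_args):
    """Build a command that runs script with faulthandler enabled from startup."""
    return [sys.executable, "-X", "faulthandler", script] + list(script_args)


def enable_crash_log(path):
    """Enable faulthandler in this process, writing crash tracebacks to path.

    The returned file object must stay open while faulthandler is enabled.
    """
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


print(faulthandler_command("im2rec.py", "train.lst"))
```

faulthandler only shows which Python line was executing when the crash happened. It does not show the native frame, so a debugger is still needed to see where inside the C or C++ library the fault occurred.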

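Further searching showed that gdb can debug this kind of crash. Start gdb on the command line, load the Python interpreter with file python, then launch the script with run ***.py arg1 arg2 .... gdb stops at the point where the process receives the fatal signal. In the debugger output, the fault turned out to be located at: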

    0x00007ffff34f4865 in cv::Mat::copyTo(cv::_OutputArray const&) const () from /usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4

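Searching for that frame points to the cause: the system had both OpenCV 2.4 and OpenCV 3.0 installed, or, put another way, the OpenCV version used by Python did not match the one installed through the apt package manager. In the end I uninstalled the Python OpenCV 3.0 package and installed 2.4 instead. That solved the problem, and training finally ran normally. The complete gdb session is reproduced at the end of this post.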
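A quick way to see which OpenCV build Python actually imports is to print the version and file location of the cv2 module. This is a small sketch under the assumption that the Python bindings are importable as cv2. opencv_binding_info is a hypothetical helper name, and it returns None when the bindings are not installed.

```python
def opencv_binding_info():
    """Return (version, module path) of the cv2 bindings Python imports.

    Returns None if cv2 cannot be imported in this environment.
    """
    try:
        import cv2
    except ImportError:
        return None
    return cv2.__version__, getattr(cv2, "__file__", None)


print(opencv_binding_info())
```

Comparing the reported version with the library named in the gdb frame (libopencv_core.so.2.4 in this case) shows whether the two disagree.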

tbs@ubuntu:~/workspace/mxnet-ssd$ gdb
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file python
Reading symbols from python...(no debugging symbols found)...done.
(gdb) run /home/tbs/workspace/mxnet-ssd/tools/../mxnet/tools/im2rec.py /home/tbs/workspace/mxnet-ssd/data/train.lst /home/tbs/workspace/mxnet-ssd/data/VOCdevkit --shuffle 1 --pack-label 1
Starting program: /usr/bin/python /home/tbs/workspace/mxnet-ssd/tools/../mxnet/tools/im2rec.py /home/tbs/workspace/mxnet-ssd/data/train.lst /home/tbs/workspace/mxnet-ssd/data/VOCdevkit --shuffle 1 --pack-label 1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3a29700 (LWP 7814)]
[New Thread 0x7ffff3228700 (LWP 7815)]
[New Thread 0x7ffff0a27700 (LWP 7816)]
[New Thread 0x7fffee226700 (LWP 7817)]
[New Thread 0x7fffe9a25700 (LWP 7818)]
[New Thread 0x7fffe7224700 (LWP 7819)]
[New Thread 0x7fffe4a23700 (LWP 7820)]
[New Thread 0x7fffe2222700 (LWP 7821)]
[New Thread 0x7fffdfa21700 (LWP 7822)]
[New Thread 0x7fffdd220700 (LWP 7823)]
[New Thread 0x7fffdaa1f700 (LWP 7824)]
[New Thread 0x7fffd821e700 (LWP 7825)]
[New Thread 0x7fffd5a1d700 (LWP 7826)]
[New Thread 0x7fffd321c700 (LWP 7827)]
[New Thread 0x7fffd0a1b700 (LWP 7828)]
[New Thread 0x7fffce21a700 (LWP 7829)]
[New Thread 0x7fffcba19700 (LWP 7830)]
[New Thread 0x7fffc9218700 (LWP 7831)]
[New Thread 0x7fffc6a17700 (LWP 7832)]
[New Thread 0x7fffc6216700 (LWP 7833)]
[New Thread 0x7fffc5a15700 (LWP 7834)]
[New Thread 0x7fffc5214700 (LWP 7835)]
[New Thread 0x7fffc4a13700 (LWP 7836)]
[New Thread 0x7fffc4212700 (LWP 7837)]
[New Thread 0x7fffc3a11700 (LWP 7838)]
[New Thread 0x7fffc1210700 (LWP 7839)]
[New Thread 0x7fffb2a0f700 (LWP 7840)]
[New Thread 0x7fffb020e700 (LWP 7841)]
[New Thread 0x7fffada0d700 (LWP 7842)]
[New Thread 0x7fffab20c700 (LWP 7843)]
[New Thread 0x7fffa8a0b700 (LWP 7844)]
[New Thread 0x7fffa620a700 (LWP 7845)]
[New Thread 0x7fffa3a09700 (LWP 7846)]
[New Thread 0x7fffa1208700 (LWP 7847)]
[New Thread 0x7fff9ea07700 (LWP 7848)]
[New Thread 0x7fff9c206700 (LWP 7849)]
[New Thread 0x7fff9ba05700 (LWP 7850)]
[New Thread 0x7fff99204700 (LWP 7851)]
[New Thread 0x7fff96a03700 (LWP 7852)]
[New Thread 0x7fff94202700 (LWP 7853)]
[New Thread 0x7fff91a01700 (LWP 7854)]
[New Thread 0x7fff8f200700 (LWP 7855)]
[New Thread 0x7fff8c9ff700 (LWP 7856)]
[New Thread 0x7fff8a1fe700 (LWP 7857)]
[New Thread 0x7fff879fd700 (LWP 7858)]
[New Thread 0x7fff851fc700 (LWP 7859)]
[New Thread 0x7fff829fb700 (LWP 7860)]
[New Thread 0x7fff801fa700 (LWP 7861)]
[New Thread 0x7fff7d9f9700 (LWP 7862)]
[New Thread 0x7fff7b1f8700 (LWP 7863)]
[New Thread 0x7fff789f7700 (LWP 7864)]
[New Thread 0x7fff761f6700 (LWP 7865)]
[New Thread 0x7fff739f5700 (LWP 7866)]
[New Thread 0x7fff711f4700 (LWP 7867)]
[New Thread 0x7fff6e9f3700 (LWP 7868)]
[Thread 0x7fffdfa21700 (LWP 7822) exited]
[Thread 0x7fffc1210700 (LWP 7839) exited]
[Thread 0x7fffa8a0b700 (LWP 7844) exited]
[Thread 0x7fff99204700 (LWP 7851) exited]
[Thread 0x7fff91a01700 (LWP 7854) exited]
[Thread 0x7fff7b1f8700 (LWP 7863) exited]
[Thread 0x7fff739f5700 (LWP 7866) exited]
[Thread 0x7fff6e9f3700 (LWP 7868) exited]
[Thread 0x7fff7d9f9700 (LWP 7862) exited]
[Thread 0x7fff801fa700 (LWP 7861) exited]
[Thread 0x7fff829fb700 (LWP 7860) exited]
[Thread 0x7fff851fc700 (LWP 7859) exited]
[Thread 0x7fff879fd700 (LWP 7858) exited]
[Thread 0x7fff8a1fe700 (LWP 7857) exited]
[Thread 0x7fff94202700 (LWP 7853) exited]
[Thread 0x7fff96a03700 (LWP 7852) exited]
[Thread 0x7fff9ba05700 (LWP 7850) exited]
[Thread 0x7fff9c206700 (LWP 7849) exited]
[Thread 0x7fff9ea07700 (LWP 7848) exited]
[Thread 0x7fffa1208700 (LWP 7847) exited]
[Thread 0x7fffa3a09700 (LWP 7846) exited]
[Thread 0x7fffa620a700 (LWP 7845) exited]
[Thread 0x7fffab20c700 (LWP 7843) exited]
[Thread 0x7fffada0d700 (LWP 7842) exited]
[Thread 0x7fffb020e700 (LWP 7841) exited]
[Thread 0x7fffb2a0f700 (LWP 7840) exited]
[Thread 0x7fffc3a11700 (LWP 7838) exited]
[Thread 0x7fffc4212700 (LWP 7837) exited]
[Thread 0x7fff711f4700 (LWP 7867) exited]
[Thread 0x7fff761f6700 (LWP 7865) exited]
[Thread 0x7fff789f7700 (LWP 7864) exited]
[Thread 0x7fff8c9ff700 (LWP 7856) exited]
[Thread 0x7fff8f200700 (LWP 7855) exited]
[Thread 0x7fffc4a13700 (LWP 7836) exited]
[Thread 0x7fffc5214700 (LWP 7835) exited]
[Thread 0x7fffc5a15700 (LWP 7834) exited]
[Thread 0x7fffc6216700 (LWP 7833) exited]
[Thread 0x7fffc6a17700 (LWP 7832) exited]
[Thread 0x7fffc9218700 (LWP 7831) exited]
[Thread 0x7fffcba19700 (LWP 7830) exited]
[Thread 0x7fffce21a700 (LWP 7829) exited]
[Thread 0x7fffd0a1b700 (LWP 7828) exited]
[Thread 0x7fffd321c700 (LWP 7827) exited]
[Thread 0x7fffd5a1d700 (LWP 7826) exited]
[Thread 0x7fffd821e700 (LWP 7825) exited]
[Thread 0x7fffdaa1f700 (LWP 7824) exited]
[Thread 0x7fffdd220700 (LWP 7823) exited]
[Thread 0x7fffe2222700 (LWP 7821) exited]
[Thread 0x7fffe4a23700 (LWP 7820) exited]
[Thread 0x7fffe7224700 (LWP 7819) exited]
[Thread 0x7fffe9a25700 (LWP 7818) exited]
[Thread 0x7fffee226700 (LWP 7817) exited]
[Thread 0x7ffff0a27700 (LWP 7816) exited]
[Thread 0x7ffff3228700 (LWP 7815) exited]
[Thread 0x7ffff3a29700 (LWP 7814) exited]
[New Thread 0x7fff6e9f3700 (LWP 7871)]
[New Thread 0x7fff711f4700 (LWP 7872)]
[New Thread 0x7fff739f5700 (LWP 7873)]
[New Thread 0x7fff761f6700 (LWP 7874)]
[New Thread 0x7fff462b4700 (LWP 7875)]
[New Thread 0x7fff41ab3700 (LWP 7876)]
[New Thread 0x7fff3f2b2700 (LWP 7877)]
[New Thread 0x7fff3cab1700 (LWP 7878)]
[New Thread 0x7fff3a2b0700 (LWP 7879)]
[New Thread 0x7fff37aaf700 (LWP 7880)]
[New Thread 0x7fff352ae700 (LWP 7881)]
[New Thread 0x7fff32aad700 (LWP 7882)]
[New Thread 0x7fff302ac700 (LWP 7883)]
[New Thread 0x7fff2daab700 (LWP 7884)]
[New Thread 0x7fff2b2aa700 (LWP 7885)]
[New Thread 0x7fff28aa9700 (LWP 7886)]
[New Thread 0x7fff282a8700 (LWP 7887)]
[New Thread 0x7fff25aa7700 (LWP 7888)]
[New Thread 0x7fff212a6700 (LWP 7889)]
[New Thread 0x7fff1eaa5700 (LWP 7890)]
[New Thread 0x7fff1c2a4700 (LWP 7891)]
[New Thread 0x7fff19aa3700 (LWP 7892)]
[New Thread 0x7fff172a2700 (LWP 7893)]
[New Thread 0x7fff14aa1700 (LWP 7894)]
[New Thread 0x7fff122a0700 (LWP 7895)]
[New Thread 0x7fff0fa9f700 (LWP 7896)]
[New Thread 0x7fff0d29e700 (LWP 7897)]
[New Thread 0x7fff0aa9d700 (LWP 7898)]
[New Thread 0x7fff0829c700 (LWP 7899)]
[New Thread 0x7fff05a9b700 (LWP 7900)]
[New Thread 0x7fff0329a700 (LWP 7901)]
[New Thread 0x7fff00a99700 (LWP 7902)]
[New Thread 0x7fff00298700 (LWP 7903)]
[New Thread 0x7ffefda97700 (LWP 7904)]
[New Thread 0x7ffefb296700 (LWP 7905)]
[New Thread 0x7ffef8a95700 (LWP 7906)]
[New Thread 0x7ffef6294700 (LWP 7907)]
[New Thread 0x7ffef3a93700 (LWP 7908)]
[New Thread 0x7ffef1292700 (LWP 7909)]
[New Thread 0x7ffeeea91700 (LWP 7910)]
[New Thread 0x7ffeec290700 (LWP 7911)]
[New Thread 0x7ffee9a8f700 (LWP 7912)]
[New Thread 0x7ffee728e700 (LWP 7913)]
[New Thread 0x7ffee4a8d700 (LWP 7914)]
[New Thread 0x7ffee228c700 (LWP 7915)]
[New Thread 0x7ffedfa8b700 (LWP 7916)]
[New Thread 0x7ffedd28a700 (LWP 7917)]
[New Thread 0x7ffedaa89700 (LWP 7918)]
[New Thread 0x7ffed8288700 (LWP 7919)]
[New Thread 0x7ffed5a87700 (LWP 7920)]
[New Thread 0x7ffed3286700 (LWP 7921)]
[New Thread 0x7ffed0a85700 (LWP 7922)]
[New Thread 0x7ffece284700 (LWP 7923)]
[New Thread 0x7ffecda83700 (LWP 7924)]
[New Thread 0x7ffecd282700 (LWP 7925)]
[New Thread 0x7ffecaa81700 (LWP 7926)]
[Thread 0x7fff00298700 (LWP 7903) exited]
[Thread 0x7ffed5a87700 (LWP 7920) exited]
[Thread 0x7ffed3286700 (LWP 7921) exited]
[Thread 0x7ffed8288700 (LWP 7919) exited]
[Thread 0x7ffecda83700 (LWP 7924) exited]
[Thread 0x7ffece284700 (LWP 7923) exited]
[Thread 0x7ffecaa81700 (LWP 7926) exited]
[Thread 0x7ffecd282700 (LWP 7925) exited]
[Thread 0x7ffed0a85700 (LWP 7922) exited]
[Thread 0x7ffedaa89700 (LWP 7918) exited]
[Thread 0x7ffedd28a700 (LWP 7917) exited]
[Thread 0x7ffedfa8b700 (LWP 7916) exited]
[Thread 0x7ffee228c700 (LWP 7915) exited]
[Thread 0x7ffee4a8d700 (LWP 7914) exited]
[Thread 0x7ffee728e700 (LWP 7913) exited]
[Thread 0x7ffee9a8f700 (LWP 7912) exited]
[Thread 0x7ffeec290700 (LWP 7911) exited]
[Thread 0x7ffeeea91700 (LWP 7910) exited]
[Thread 0x7ffef1292700 (LWP 7909) exited]
[Thread 0x7ffef3a93700 (LWP 7908) exited]
[Thread 0x7ffef6294700 (LWP 7907) exited]
[Thread 0x7ffef8a95700 (LWP 7906) exited]
[Thread 0x7ffefb296700 (LWP 7905) exited]
[Thread 0x7ffefda97700 (LWP 7904) exited]
[Thread 0x7fff00a99700 (LWP 7902) exited]
[Thread 0x7fff0329a700 (LWP 7901) exited]
[Thread 0x7fff05a9b700 (LWP 7900) exited]
[Thread 0x7fff0829c700 (LWP 7899) exited]
[Thread 0x7fff0aa9d700 (LWP 7898) exited]
[Thread 0x7fff0d29e700 (LWP 7897) exited]
[Thread 0x7fff0fa9f700 (LWP 7896) exited]
[Thread 0x7fff122a0700 (LWP 7895) exited]
[Thread 0x7fff14aa1700 (LWP 7894) exited]
[Thread 0x7fff172a2700 (LWP 7893) exited]
[Thread 0x7fff19aa3700 (LWP 7892) exited]
[Thread 0x7fff1c2a4700 (LWP 7891) exited]
[Thread 0x7fff1eaa5700 (LWP 7890) exited]
[Thread 0x7fff212a6700 (LWP 7889) exited]
[Thread 0x7fff25aa7700 (LWP 7888) exited]
[Thread 0x7fff282a8700 (LWP 7887) exited]
[Thread 0x7fff28aa9700 (LWP 7886) exited]
[Thread 0x7fff2b2aa700 (LWP 7885) exited]
[Thread 0x7fff2daab700 (LWP 7884) exited]
[Thread 0x7fff302ac700 (LWP 7883) exited]
[Thread 0x7fff32aad700 (LWP 7882) exited]
[Thread 0x7fff352ae700 (LWP 7881) exited]
[Thread 0x7fff37aaf700 (LWP 7880) exited]
[Thread 0x7fff3a2b0700 (LWP 7879) exited]
[Thread 0x7fff3cab1700 (LWP 7878) exited]
[Thread 0x7fff3f2b2700 (LWP 7877) exited]
[Thread 0x7fff41ab3700 (LWP 7876) exited]
[Thread 0x7fff462b4700 (LWP 7875) exited]
[Thread 0x7fff761f6700 (LWP 7874) exited]
[Thread 0x7fff739f5700 (LWP 7873) exited]
[Thread 0x7fff711f4700 (LWP 7872) exited]
Creating .rec file from /home/tbs/workspace/mxnet-ssd/data/train.lst in /home/tbs/workspace/mxnet-ssd/data
multiprocessing not available, fall back to single threaded encoding

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff34f4865 in cv::Mat::copyTo(cv::_OutputArray const&) const () from /usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4
(gdb)
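The interactive session above can also be run non-interactively, which is convenient on a remote machine. The sketch below builds such a command with gdb's -batch, -ex and --args options. The function names are hypothetical, and the script is assumed to take the same arguments as in the session above.

```python
import subprocess


def gdb_backtrace_command(program, *program_args):
    """Build a gdb command that runs the program and prints a backtrace on a crash."""
    return (["gdb", "-batch",        # exit after executing the commands
             "-ex", "run",           # start the program
             "-ex", "bt",            # print the backtrace once it stops
             "--args", program] + list(program_args))


def run_under_gdb(program, *program_args):
    """Run the program under gdb and return gdb's exit status."""
    return subprocess.call(gdb_backtrace_command(program, *program_args))


print(" ".join(gdb_backtrace_command("python", "im2rec.py", "train.lst")))
```

Calling run_under_gdb("python", "/path/to/im2rec.py", ...) with the real arguments would print the crashing frame and the frames above it without any typing at the (gdb) prompt.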

 

