ImportError: Cannot load backend 'TkAgg' which requires the 'tk' interactive framework, as 'headless' is currently running

This article is reposted from the original source. 2021-12-28 15:40

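Two errors come up often when training MMDetection on multiple GPUs. Searching Baidu turned up nothing that worked, but searching the Issues section of the mmdetection repository on GitHub led to the right fix.
I'm recording it here for anyone who runs into the same problems.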

1️⃣ ImportError: Cannot load backend 'TkAgg' which requires the 'tk' interactive framework, as 'headless' is currently running

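This is caused by a matplotlib version that is too new. Uninstall the newer matplotlib from your environment and install version 3.2.1 instead. Tested and working on Ubuntu 18.04.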

pip uninstall matplotlib
pip install matplotlib==3.2.1
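The error message itself says what happened: something requested the Tk-based TkAgg backend while matplotlib had detected a headless session (no display, as on a typical GPU server). If that request comes from code you control, a complementary option to pinning the version is to ask for a GUI backend only when a display seems to be present. This is a minimal sketch, not part of the original fix: pick_backend is a hypothetical helper, and it assumes that a missing DISPLAY environment variable means the session is headless, which is a simplification of matplotlib's own detection.

```python
import os
from typing import Mapping


def pick_backend(environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: choose a matplotlib backend name.

    Assumption: on Linux, an empty or missing DISPLAY variable means there
    is no X server, i.e. the session is headless. This simplifies
    matplotlib's own detection (Wayland, for example, is ignored).
    """
    if environ.get("DISPLAY"):
        return "TkAgg"  # interactive Tk backend, needs a running display
    return "Agg"  # non-interactive raster backend, works without a display


if __name__ == "__main__":
    # On a typical headless training server this prints "Agg".
    print(pick_backend())
```

In your own script you would then call matplotlib.use(pick_backend()) before the first import of matplotlib.pyplot. Falling back to "Agg" instead of raising keeps batch jobs such as training runs from failing just because no screen is attached.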

2️⃣ RuntimeError: Address already in use

(mmdet) zdx@zdx-MS:/home/User/gaoying/cv/mmdetection$ bash tools/dist_train.sh work_dirs/mchar/cascade_rcnn_r50_fpn_1x_job1/cascade_rcnn_r50_fpn_1x_job1.py 2

*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Traceback (most recent call last):
  File "tools/train.py", line 185, in <module>
    main()
  File "tools/train.py", line 117, in main
    init_dist(args.launcher, **cfg.dist_params)
  File "/home/zdx/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 18, in init_dist
    _init_dist_pytorch(backend, **kwargs)
  File "/home/zdx/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 32, in _init_dist_pytorch
    dist.init_process_group(backend=backend, **kwargs)
  File "/home/zdx/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 423, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/home/zdx/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 179, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
RuntimeError: Address already in use
Traceback (most recent call last):
  File "/home/zdx/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zdx/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zdx/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/zdx/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zdx/anaconda3/envs/mmdet/bin/python', '-u', 'tools/train.py', '--local_rank=1', 'work_dirs/mchar/cascade_rcnn_r50_fpn_1x_job1/cascade_rcnn_r50_fpn_1x_job1.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
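The traceback shows where it fails: TCPStore(master_addr, master_port, ...) cannot bind its listening socket because another process already holds that address. As a diagnostic sketch (not from the original post), the snippet below checks whether a TCP port can still be bound on this machine. port_in_use is a hypothetical helper; 29500 is used as the example because it is the default --master_port of torch.distributed.launch, but your launcher may be configured with a different port.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port cannot be bound, typically because it is taken.

    Hypothetical diagnostic helper: it tries to bind a TCP socket to the
    address and releases the socket again right away.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return True  # e.g. EADDRINUSE: a leftover run still owns the port
    return False


if __name__ == "__main__":
    # 29500 is the default --master_port of torch.distributed.launch;
    # adjust it if your launcher uses a different port.
    print("port 29500 in use:", port_in_use(29500))
```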

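This error appears when multi-GPU training is launched more than once on the same machine: the processes from the earlier run are still alive and still hold the address the new launch wants to bind. Killing all of the earlier processes fixes it. Here is how:
First, run htop to find the PID of the command you ran before. If htop is not installed, install it first (on Ubuntu: sudo apt install htop).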

htop

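Click the Command column header to sort the processes by command. Here the earlier run has PID 5579. Kill every process that belongs to that command; kill -9 means a forced kill.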

kill -9 5579
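If htop is not available, the same two steps (find the PIDs of the earlier run, then kill -9 them) can be scripted. This is a Linux-only sketch under stated assumptions: it reads each process's command line from /proc/PID/cmdline, treats any process whose command line contains a given substring (for example tools/train.py) as part of the earlier run, and sends SIGKILL, the signal kill -9 sends. find_pids and kill_matching are hypothetical names. Check the PID list before killing anything, because a loose pattern can match unrelated processes.

```python
import os
import signal
from pathlib import Path
from typing import List


def find_pids(pattern: str, proc_root: str = "/proc") -> List[int]:
    """Return sorted PIDs whose command line contains `pattern` (Linux only)."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # the process exited or its cmdline is not readable
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace")
        if pattern in cmdline:
            pids.append(int(entry.name))
    return sorted(pids)


def kill_matching(pattern: str) -> List[int]:
    """Send SIGKILL (what kill -9 sends) to every matching process."""
    killed = []
    for pid in find_pids(pattern):
        if pid == os.getpid():
            continue  # never kill the script itself
        try:
            os.kill(pid, signal.SIGKILL)
            killed.append(pid)
        except ProcessLookupError:
            pass  # the process already exited
    return killed
```

For example, print find_pids("tools/train.py") first, and call kill_matching("tools/train.py") only once that list contains nothing but the stale training processes.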

⭐ And now training runs happily again!
