apex Installation Summary


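I recently used a library that depends on apex, and it took a whole morning of trial and error to get it installed. These notes are for anyone who runs into the same problems.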

Environment:
OS: Windows

Library: PyTorch 1.9.0
CUDA version: 11.1

Visual Studio: 2019

 

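A note on Visual Studio: apart from VS itself and the default recommended C++ components, I installed additional components ad hoc when problems came up, and did not restart the computer afterwards. In theory this should have nothing to do with installing apex, but since it happened along the way, I am recording it here as well.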

 

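1. CUDA version mismatch

The library recommends PyTorch 1.7.1 with CUDA 10.2. Installing apex according to the library's instructions produced a CUDA version mismatch error.

Opening "apex/setup.py" shows that the CUDA version PyTorch was built with (torch_binary_major, torch_binary_minor) must match the version of the locally installed CUDA toolkit as reported by nvcc (bare_metal_major, bare_metal_minor):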

def get_cuda_bare_metal_version(cuda_dir):
    raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
    output = raw_output.split()
    release_idx = output.index("release") + 1
    release = output[release_idx].split(".")
    bare_metal_major = release[0]
    bare_metal_minor = release[1][0]

    return raw_output, bare_metal_major, bare_metal_minor

def check_cuda_torch_binary_vs_bare_metal(cuda_dir):
    raw_output, bare_metal_major, bare_metal_minor = get_cuda_bare_metal_version(cuda_dir)
    torch_binary_major = torch.version.cuda.split(".")[0]
    torch_binary_minor = torch.version.cuda.split(".")[1]

    print("\nCompiling cuda extensions with")
    print(raw_output + "from " + cuda_dir + "/bin\n")

    if (bare_metal_major != torch_binary_major) or (bare_metal_minor != torch_binary_minor):
        raise RuntimeError("Cuda extensions are being compiled with a version of Cuda that does " +
                           "not match the version used to compile Pytorch binaries.  " +
                           "Pytorch binaries were compiled with Cuda {}.\n".format(torch.version.cuda) +
                           "In some cases, a minor-version mismatch will not cause later errors:  " +
                           "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
                           "You can try commenting out this check (at your own risk).")

Fix: make CUDA and PyTorch agree by changing one to match the other. In addition, setup.py indicates that the CUDA version should be higher than 10.0.
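The comparison that setup.py performs can also be run by hand before starting a build. The following is a minimal sketch, not apex's own code: `parse_nvcc_release` and `cuda_versions_match` are hypothetical helper names, and the sample string assumes the usual `nvcc -V` output format. In practice the first argument would be `torch.version.cuda` and the second the text printed by `nvcc -V`.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) as strings from the text printed by `nvcc -V`."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return match.group(1), match.group(2)


def cuda_versions_match(torch_cuda_version, nvcc_output):
    """Return True when the CUDA version PyTorch was built with equals nvcc's."""
    torch_major, torch_minor = torch_cuda_version.split(".")[:2]
    return (torch_major, torch_minor) == parse_nvcc_release(nvcc_output)


if __name__ == "__main__":
    # Sample text in the usual `nvcc -V` format (an assumption for illustration).
    sample = "Cuda compilation tools, release 11.1, V11.1.105"
    print(cuda_versions_match("11.1", sample))  # PyTorch built for CUDA 11.1
    print(cuda_versions_match("10.2", sample))  # PyTorch built for CUDA 10.2
```

Unlike the setup.py excerpt, which keeps only the first character of the minor version, this sketch compares the full minor number.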

What I finally settled on:

Python: 3.7

PyTorch install command: ""

 

 

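2. Installing nvcc

Typing "nvcc -V" in a cmd window reported that it is not a recognized command.

I reinstalled CUDA 11.1, chose the custom install, unchecked everything else, ticked nvcc, and installed.
Next I added the nvcc directory to the system PATH, then used a command found online to refresh PATH in the current session (a job was running and I did not want to reboot).
Typing "nvcc -V" in the cmd window now gave normal output.
This is probably where the trap was laid: I did not reboot after installing, which may be why the later install attempts kept failing until I finally restarted.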
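Whether the current process can actually see nvcc is easy to test from Python, which helps tell a stale PATH apart from a missing install. A minimal sketch: `find_nvcc` is a hypothetical helper, and the fallback assumes the CUDA installer has set the `CUDA_PATH` environment variable, which may not hold on every machine.

```python
import os
import shutil


def find_nvcc(env=None):
    """Return the full path of nvcc, or None if it cannot be found."""
    env = os.environ if env is None else env
    # First try the PATH as seen by this process.
    found = shutil.which("nvcc", path=env.get("PATH", ""))
    if found:
        return found
    # Fall back to CUDA_PATH/bin (assumes the installer set CUDA_PATH).
    cuda_path = env.get("CUDA_PATH")
    if cuda_path:
        return shutil.which("nvcc", path=os.path.join(cuda_path, "bin"))
    return None


if __name__ == "__main__":
    location = find_nvcc()
    if location is None:
        print("nvcc not found; check PATH or restart the shell")
    else:
        print("nvcc found at", location)
```

If nvcc turns up only through the fallback, the PATH of the running shell has most likely not been refreshed yet.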


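3. The "Given no hashes to check XXX links for project 'pip': discarding no candidates" error

The install kept getting stuck at this message.
1) First, I opened "apex/requirements.txt" and "apex/requirements_dev.txt", compared them against the output of conda list, and installed the missing packages.

2) Next, "https://blog.csdn.net/qq_33019383/article/details/103990248" says torch-scatter needs to be installed, so I installed it.
3) Advice online was to delete the previously downloaded "C:\Users\Administrator\apex" folder and rerun the following commands: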

git clone https://www.github.com/nvidia/apex
cd apex
python3 setup.py install

Unfortunately, none of the above worked.

4. Final fix

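Restart the computer. Since the library mentioned earlier has other dependencies as well, I installed them while I was at it: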

pip install opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8 diffdist

Then run:

cd apex
python3 setup.py install 

There were warnings, but the installation reported success.

torch.__version__  = 1.9.0


setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
running install
running bdist_egg
running egg_info
writing apex.egg-info\PKG-INFO
writing dependency_links to apex.egg-info\dependency_links.txt
writing top-level names to apex.egg-info\top_level.txt
reading manifest file 'apex.egg-info\SOURCES.txt'
writing manifest file 'apex.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
creating build\lib
creating build\lib\apex
copying apex\__init__.py -> build\lib\apex
creating build\lib\apex\amp
copying apex\amp\amp.py -> build\lib\apex\amp
copying apex\amp\compat.py -> build\lib\apex\amp
……
copying build\lib\apex\pyprof\nvtx\__init__.py -> build\bdist.win-amd64\egg\apex\pyprof\nvtx
creating build\bdist.win-amd64\egg\apex\pyprof\parse
copying build\lib\apex\pyprof\parse\db.py -> build\bdist.win-amd64\egg\apex\pyprof\parse
……
copying build\lib\apex\RNN\__init__.py -> build\bdist.win-amd64\egg\apex\RNN
copying build\lib\apex\__init__.py -> build\bdist.win-amd64\egg\apex
byte-compiling build\bdist.win-amd64\egg\apex\amp\amp.py to amp.cpython-37.pyc
……
byte-compiling build\bdist.win-amd64\egg\apex\RNN\RNNBackend.py to RNNBackend.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\apex\RNN\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\apex\__init__.py to __init__.cpython-37.pyc
creating build\bdist.win-amd64\egg\EGG-INFO
copying apex.egg-info\PKG-INFO -> build\bdist.win-amd64\egg\EGG-INFO
copying apex.egg-info\SOURCES.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying apex.egg-info\dependency_links.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying apex.egg-info\top_level.txt -> build\bdist.win-amd64\egg\EGG-INFO
zip_safe flag not set; analyzing archive contents...
apex.pyprof.nvtx.__pycache__.nvmarker.cpython-37: module references __file__
apex.pyprof.nvtx.__pycache__.nvmarker.cpython-37: module references __path__
creating dist
creating 'dist\apex-0.1-py3.7.egg' and adding 'build\bdist.win-amd64\egg' to it
removing 'build\bdist.win-amd64\egg' (and everything under it)
Processing apex-0.1-py3.7.egg
creating c:\programdata\anaconda3\envs\XXXX\lib\site-packages\apex-0.1-py3.7.egg
Extracting apex-0.1-py3.7.egg to c:\programdata\anaconda3\envs\XXXX\lib\site-packages
Adding apex 0.1 to easy-install.pth file

Installed c:\programdata\anaconda3\envs\XXXX\lib\site-packages\apex-0.1-py3.7.egg
Processing dependencies for apex==0.1
Finished processing dependencies for apex==0.1

 

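5. Follow-up

1)

Later I found that the statement that configures the precision setting raised an error, so the installation had not actually succeeded.

Running the command again

python setup.py install

returned straight to the prompt with no output at all.

So I switched to: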

python setup.py build
pip install -v --no-cache-dir ./

 

Output:

torch.__version__  = 1.9.0


setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

running bdist_wheel
running build
running build_py
installing to build\bdist.win-amd64\wheel
running install
running install_lib
………………………………………………………………………………………………………………………………………………
adding 'apex-0.1.dist-info/WHEEL'
adding 'apex-0.1.dist-info/top_level.txt'
adding 'apex-0.1.dist-info/RECORD'
removing build\bdist.win-amd64\wheel
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\pytorch1.8.1\lib\site-packages\colorama\ansitowin32.py", line 59, in closed
return stream.closed
ValueError: underlying buffer has been detached
done
Created wheel for apex: filename=apex-0.1-py3-none-any.whl size=206058 sha256=8761f64146164553df82742b07c5ef2cfe9da3a82a636b9457483cb95a9544ba
Stored in directory: C:\Users\Administrator\AppData\Local\Temp\pip-ephem-wheel-cache-8l21lyri\wheels\17\e2\d0\fbd642567ec1ec2e05aa8db3ae5d45c586c0f909da3f40de6e
Successfully built apex
Installing collected packages: apex

 
         

Successfully installed apex-0.1
1 location(s) to search for versions of pip:
* https://pypi.org/simple/pip/
Fetching project page and analyzing links: https://pypi.org/simple/pip/
Getting page https://pypi.org/simple/pip/
Found index url https://pypi.org/simple
Starting new HTTPS connection (1): pypi.org:443
https://pypi.org:443 "GET /simple/pip/ HTTP/1.1" 200 16538
……………………………………………………………………………………………………………………………………………………………………
Found link https://files.pythonhosted.org/packages/b1/44/6e26d5296b83c6aac166e48470d57a00d3ed1f642e89adc4a4e412a01643/pip-21.1.2.tar.gz#sha256=eb5df6b9ab0af50fe1098a52fd439b04730b6e066887ff7497357b9ebd19f79b (from https://pypi.org/simple/pip/) (requires-python:>=3.6), version: 21.1.2
Skipping link: not a file: https://pypi.org/simple/pip/
Given no hashes to check 167 links for project 'pip': discarding no candidates
Removed build tracker: 'C:\\Users\\Administrator\\AppData\\Local\\Temp\\pip-req-tracker-hs8z7jdp'

 
        

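The message "Successfully installed apex-0.1" says the installation succeeded. Note, however, that this command does not build the CUDA and C++ extensions, so any code that touches the parts depending on them will run into problems.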

For example, running the Swin Transformer example raises a warning that "amp_C" cannot be found. This has a knock-on effect on the following call:

torch.distributed.init_process_group(backend='nccl', init_method='env://', world_size=world_size, rank=rank)

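This call emits a warning and in fact fails, so distributed initialization never completes. As a result, all later code related to distributed training has to be commented out by hand (the sampler, and setting the epoch during training).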
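Because a "successful" install can still lack the compiled extensions, it may be worth checking for them directly before running any training code. A minimal sketch: `missing_modules` is a hypothetical helper, "amp_C" is the module named in the warning above, and "apex_C" is assumed here to be the companion C++ extension; adjust the list to whatever your code imports.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "amp_C" comes from the warning; "apex_C" is an assumed companion module.
    missing = missing_modules(["apex", "amp_C", "apex_C"])
    if missing:
        print("not importable:", ", ".join(missing))
    else:
        print("apex and its compiled extensions were found")
```

`find_spec` only locates a module without importing it, so the check is cheap and does not trigger CUDA initialization.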

2)

For other installation methods, see codebrid's post "apex 安裝/使用 記錄" (apex installation/usage notes).

For testing, refer to the same post.

 

