Compiling and Linking with NVCC

This article is a repost; the original is linked below. 2020-02-13 21:30

https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/

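1. Compiling:

Each source file is compiled to relocatable device code with -dc; the -x cu flag makes nvcc treat the .cpp files as CUDA sources.

objects = main.o particle.o v3.o

all: $(objects)
	nvcc -arch=sm_20 $(objects) -o app

%.o: %.cpp
	nvcc -x cu -arch=sm_20 -I. -dc $< -o $@

clean:
	rm -f *.o app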

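2. Linking:

To do the final link with a host linker instead of nvcc, first device-link the relocatable objects into one object file:

nvcc -arch=sm_20 -dlink v3.o particle.o main.o -o gpuCode.o

Then link that file together with the original objects and the CUDA runtime using g++:

g++ gpuCode.o main.o particle.o v3.o -lcudart -o app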
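
Relocatable device code (-dc) and the device-link step matter when device code in one translation unit calls device code defined in another. The sketch below collapses such a layout into a single listing to show that dependency. The `v3` and `particle` types, their members, and the `HD` macro are illustrative assumptions, not code taken from the original article; the comments mark where each piece would live in a multi-file project.

```cpp
// HD expands to CUDA qualifiers under nvcc and to nothing under a plain
// C++ compiler, so this listing builds either way. (Hypothetical macro.)
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// --- would live in v3.h / v3.cpp (hypothetical) ---
struct v3 {
    float x, y, z;
    HD void add(const v3& o);
};

HD void v3::add(const v3& o) {
    x += o.x;
    y += o.y;
    z += o.z;
}

// --- would live in particle.h / particle.cpp (hypothetical) ---
struct particle {
    v3 position;
    v3 velocity;
    HD void advance(float dt);
};

// In a multi-file build, this function calls v3::add from a different
// object file. On the device, that cross-file call is what -dc and the
// device-link step resolve.
HD void particle::advance(float dt) {
    v3 step = { velocity.x * dt, velocity.y * dt, velocity.z * dt };
    position.add(step);
}
```

Without -dc, each file's device code would have to be complete on its own, so `particle::advance` could not call a `v3` method defined elsewhere.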




NVCC compiler options that control precision

--use_fast_math (-use_fast_math)
Make use of fast math library. '--use_fast_math' implies '--ftz=true --prec-div=false
--prec-sqrt=false --fmad=true'.

--ftz {true|false} (-ftz)
This option controls single-precision denormals support. '--ftz=true' flushes
denormal values to zero and '--ftz=false' preserves denormal values. '--use_fast_math'
implies '--ftz=true'.
Default value: false.
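
To make "flushes denormal values to zero" concrete, here is a minimal host-side sketch that emulates the effect on a single value. It is an illustration only: `flush_to_zero` and the two helper functions are hypothetical names, and the code does not reproduce how the GPU hardware implements the mode. It assumes the flush preserves the sign of the input.

```cpp
#include <cmath>
#include <limits>

// Host-side emulation of flush-to-zero: a subnormal (denormal) input becomes
// a zero of the same sign; every other value passes through unchanged.
float flush_to_zero(float x) {
    if (std::fpclassify(x) == FP_SUBNORMAL) {
        return std::copysign(0.0f, x);
    }
    return x;
}

// Smallest positive denormal float (about 1.4e-45).
float smallest_denormal() {
    return std::numeric_limits<float>::denorm_min();
}

// Smallest positive normal float (about 1.18e-38).
float smallest_normal() {
    return std::numeric_limits<float>::min();
}
```

Calling `flush_to_zero(smallest_denormal())` yields zero, while `flush_to_zero(smallest_normal())` returns its argument unchanged. With '--ftz=false', values in that gap between zero and the smallest normal float remain representable.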

--prec-div {true|false} (-prec-div)
This option controls single-precision floating-point division and reciprocals.
'--prec-div=true' enables the IEEE round-to-nearest mode and '--prec-div=false'
enables the fast approximation mode. '--use_fast_math' implies '--prec-div=false'.
Default value: true.
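
The difference between a correctly rounded quotient and an approximate one can be seen on the host. The sketch below uses reciprocal-then-multiply as a stand-in approximation; this is an assumption made for illustration and is not claimed to be the algorithm the GPU uses in fast mode. Both function names are hypothetical.

```cpp
// Correctly rounded single-precision division: one rounding step.
float divide_exact(float x, float y) {
    volatile float q = x / y;
    return q;
}

// Approximation: round the reciprocal first, then multiply.
// Two rounding steps instead of one, so the last bit can differ.
float divide_via_reciprocal(float x, float y) {
    volatile float r = 1.0f / y;  // volatile keeps the intermediate rounding
    volatile float q = x * r;
    return q;
}
```

For x = 10 and y = 3 the two functions return adjacent floats, with the approximate result one unit in the last place above the correctly rounded one. For x = 8 and y = 2 they agree, because the reciprocal 0.5 is exact.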

--prec-sqrt {true|false} (-prec-sqrt)
This option controls single-precision floating-point square root. '--prec-sqrt=true'
enables the IEEE round-to-nearest mode and '--prec-sqrt=false' enables the
fast approximation mode. '--use_fast_math' implies '--prec-sqrt=false'.
Default value: true.

--fmad {true|false} (-fmad)
This option enables (disables) the contraction of floating-point multiplies
and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA,
or DFMA). '--use_fast_math' implies '--fmad=true'.
Default value: true.
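
Contraction changes results because a fused multiply-add rounds once, while a separate multiply and add round twice. The host-side sketch below shows the numerical effect using `std::fma` from the C++ standard library. The function names are hypothetical, and the code only illustrates the arithmetic; it does not show what nvcc generates.

```cpp
#include <cmath>

// Multiply, round to float, then add: two roundings,
// analogous to what '--fmad=false' preserves.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;  // volatile forces the intermediate rounding
    return product + c;
}

// Fused multiply-add: a * b + c is computed exactly and rounded once.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

Take a = 1 + 2^-12 and c = -(1 + 2^-11). The exact product a * a is 1 + 2^-11 + 2^-24, whose last term is lost when the product is rounded to float, so `mul_then_add(a, a, c)` returns 0. `fused_mul_add(a, a, c)` keeps that term and returns 2^-24. Code that depends on bit-identical results across builds may therefore want '--fmad=false'.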
