0. Preface
Environment: Ubuntu 18.04 (16.04 is much the same, but 18.04 is a real pleasure to use) and Python 3 (I forget the exact version; probably 3.6).
1. CUDA must be installed before installing PyCUDA
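Before building PyCUDA it is worth confirming that the CUDA toolkit is actually visible. The sketch below is an illustration added here, not part of the tested procedure: it looks for `nvcc` on `PATH` and parses the release number out of `nvcc --version`. The `/usr/local/cuda/bin` fallback is an assumption about where a default toolkit install puts `nvcc`.

```python
import os
import re
import shutil
import subprocess


def find_nvcc(fallback_dir="/usr/local/cuda/bin"):
    """Return the path to nvcc, or None if it cannot be found.

    fallback_dir is an assumed default install location, not something
    the CUDA installer guarantees.
    """
    path = shutil.which("nvcc")
    if path is None:
        candidate = os.path.join(fallback_dir, "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            path = candidate
    return path


def parse_nvcc_release(text):
    """Extract the 'X.Y' release string from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def cuda_toolkit_release():
    """Run `nvcc --version` and return the release, or None if nvcc is missing."""
    nvcc = find_nvcc()
    if nvcc is None:
        return None
    # universal_newlines keeps this compatible with Python 3.6.
    result = subprocess.run([nvcc, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = cuda_toolkit_release()
    if release is None:
        print("nvcc not found: install CUDA (and add it to PATH) before PyCUDA")
    else:
        print("Found CUDA toolkit release", release)
```

If this reports that `nvcc` is missing even though CUDA is installed, the toolkit's `bin` directory is probably just not on `PATH` yet.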
2. Installing PyCUDA
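Installing with pip3 from the default server usually times out; in that case you can install from the Tsinghua mirror or another mirror inside China instead. The standard command is "pip3 install pycuda", but I have not tried installing it this way myself.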
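Since the mirror route was not tested, treat the following purely as a sketch. It builds the pip command for installing PyCUDA from a mirror, assuming Tsinghua's PyPI mirror is served at `https://pypi.tuna.tsinghua.edu.cn/simple`. It calls pip as `sys.executable -m pip` so the package lands in the same Python 3 interpreter that runs the script, and raises pip's `--timeout` to cope with slow connections (the 120 seconds is an arbitrary choice).

```python
import subprocess
import sys

# Assumed mirror URL; any PyPI-compatible index can be substituted.
TSINGHUA_INDEX = "https://pypi.tuna.tsinghua.edu.cn/simple"


def pip_install_command(package="pycuda", index_url=TSINGHUA_INDEX, timeout=120):
    """Build a pip command that installs `package` into the current interpreter."""
    cmd = [sys.executable, "-m", "pip", "install", package]
    if index_url:
        # -i / --index-url replaces the default PyPI server.
        cmd += ["-i", index_url]
    if timeout:
        # Socket timeout in seconds for each network operation.
        cmd += ["--timeout", str(timeout)]
    return cmd


def install(package="pycuda", dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    cmd = pip_install_command(package)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    install(dry_run=True)
```

Call `install(dry_run=False)` to actually run pip; depending on where Python 3 is installed you may also need `sudo` or pip's `--user` flag.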
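If you use "sudo apt install python3-pycuda" instead, you will find that the package is not the latest release, and using it would mean downgrading the graphics driver, which I was not willing to do.
Below is the method I have actually tested.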
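For the changes between PyCUDA versions, see this page. The latest package is posted here and is also available on python.org, and GitHub has archives of every version. For the installation itself, follow steps 1 and 3 of this guide (see the figure below) and skip step 2 (NumPy is normally already installed). Note that for Python 3, "python configure.py ..." in step 3 must be changed to "python3 configure.py ...".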
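The python-versus-python3 pitfall in step 3 can be avoided by invoking `configure.py` through `sys.executable`, so it always runs under the Python 3 interpreter that will later import PyCUDA rather than whatever `python` happens to resolve to. This sketch assumes that step 3 of the guide is `configure.py --cuda-root=...` followed by `make install`, that CUDA lives in `/usr/local/cuda`, and that you pass the unpacked source directory as `src_dir`; check all three against the version you downloaded.

```python
import os
import subprocess
import sys


def build_commands(cuda_root="/usr/local/cuda"):
    """Return the step-3 commands, with `python` replaced by this interpreter."""
    return [
        [sys.executable, "configure.py", "--cuda-root=" + cuda_root],
        # Installing into the system site-packages may need root (e.g. sudo).
        ["make", "install"],
    ]


def build_pycuda(src_dir, cuda_root="/usr/local/cuda", dry_run=True):
    """Print each command and, unless dry_run is True, run it inside src_dir."""
    if not os.path.isfile(os.path.join(src_dir, "configure.py")):
        raise FileNotFoundError("configure.py not found in " + src_dir)
    for cmd in build_commands(cuda_root):
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=src_dir, check=True)
```

For example, `build_pycuda("pycuda-VERSION", dry_run=False)`, where `pycuda-VERSION` is a placeholder for the directory you unpacked in step 1. If `make install` fails with a permissions error, run the script with `sudo` under the same Python 3 interpreter.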
3. Verifying the PyCUDA installation
Create a Python script with the following contents:
```python
import pycuda
import pycuda.driver as drv

drv.init()
print('CUDA device query (PyCUDA version) \n')
print('Detected {} CUDA Capable device(s) \n'.format(drv.Device.count()))

for i in range(drv.Device.count()):
    gpu_device = drv.Device(i)
    print('Device {}: {}'.format(i, gpu_device.name()))
    compute_capability = float('%d.%d' % gpu_device.compute_capability())
    print('\t Compute Capability: {}'.format(compute_capability))
    print('\t Total Memory: {} megabytes'.format(gpu_device.total_memory() // (1024**2)))

    # The following will give us all remaining device attributes as seen
    # in the original deviceQuery.
    # We set up a dictionary as such so that we can easily index
    # the values using a string descriptor.
    device_attributes_tuples = gpu_device.get_attributes().items()
    device_attributes = {}
    for k, v in device_attributes_tuples:
        device_attributes[str(k)] = v

    num_mp = device_attributes['MULTIPROCESSOR_COUNT']

    # Cores per multiprocessor is not reported by the GPU!
    # We must use a lookup table based on compute capability.
    # See the following:
    # http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
    # .get() avoids a KeyError on GPUs whose compute capability is not listed.
    cuda_cores_per_mp = {5.0: 128, 5.1: 128, 5.2: 128, 6.0: 64, 6.1: 128, 6.2: 128}.get(compute_capability)
    if cuda_cores_per_mp is None:
        print('\t ({}) Multiprocessors, CUDA Cores / Multiprocessor: not in lookup table'.format(num_mp))
    else:
        print('\t ({}) Multiprocessors, ({}) CUDA Cores / Multiprocessor: {} CUDA Cores'.format(
            num_mp, cuda_cores_per_mp, num_mp * cuda_cores_per_mp))

    device_attributes.pop('MULTIPROCESSOR_COUNT')
    for k in device_attributes.keys():
        print('\t {}: {}'.format(k, device_attributes[k]))
```
You should get output like the following (here for a GeForce GTX 1060):
```
CUDA device query (PyCUDA version)

Detected 1 CUDA Capable device(s)

Device 0: GeForce GTX 1060
     Compute Capability: 6.1
     Total Memory: 6078 megabytes
     (10) Multiprocessors, (128) CUDA Cores / Multiprocessor: 1280 CUDA Cores
     ASYNC_ENGINE_COUNT: 2
     CAN_MAP_HOST_MEMORY: 1
     CLOCK_RATE: 1733000
     COMPUTE_CAPABILITY_MAJOR: 6
     COMPUTE_CAPABILITY_MINOR: 1
     COMPUTE_MODE: DEFAULT
     CONCURRENT_KERNELS: 1
     ....
     ....
     TEXTURE_PITCH_ALIGNMENT: 32
     TOTAL_CONSTANT_MEMORY: 65536
     UNIFIED_ADDRESSING: 1
     WARP_SIZE: 32
```
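The lookup table in the verification script only covers compute capabilities 5.x and 6.x, so newer or older cards cannot report a core count. A fuller, GPU-independent version of the same lookup is sketched below. The core counts are assumed to match the `_ConvertSMVer2Cores` table in `helper_cuda.h` from NVIDIA's CUDA samples; double-check them against the samples shipped with your CUDA version. Keying on the `(major, minor)` tuple that `Device.compute_capability()` returns also avoids using floats as dictionary keys.

```python
# CUDA cores per streaming multiprocessor, keyed by (major, minor) compute
# capability. Values assumed to match _ConvertSMVer2Cores in NVIDIA's
# helper_cuda.h; verify against the CUDA samples for your toolkit version.
CORES_PER_SM = {
    (3, 0): 192, (3, 2): 192, (3, 5): 192, (3, 7): 192,  # Kepler
    (5, 0): 128, (5, 2): 128, (5, 3): 128,                # Maxwell
    (6, 0): 64, (6, 1): 128, (6, 2): 128,                 # Pascal
    (7, 0): 64, (7, 2): 64,                               # Volta
    (7, 5): 64,                                           # Turing
    (8, 0): 64, (8, 6): 128, (8, 7): 128,                 # Ampere
    (8, 9): 128,                                          # Ada Lovelace
    (9, 0): 128,                                          # Hopper
}


def total_cuda_cores(compute_capability, multiprocessor_count):
    """Return total CUDA cores, or None if the architecture is not in the table."""
    cores_per_sm = CORES_PER_SM.get(tuple(compute_capability))
    if cores_per_sm is None:
        return None
    return cores_per_sm * multiprocessor_count
```

Inside the device loop of the verification script this would be called as `total_cuda_cores(gpu_device.compute_capability(), device_attributes['MULTIPROCESSOR_COUNT'])`; for the GTX 1060 above (compute capability 6.1, 10 multiprocessors) it gives 1280, matching the output shown.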