1. PyCUDA

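Currently, PyCUDA and Theano use different objects to store GPU data, and the two implementations support different feature sets. Theano's implementation is called CudaNdarray; it supports strides but only the float32 dtype. PyCUDA's implementation is called GPUArray; it does not support strides, but it can handle all NumPy and CUDA dtypes.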

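Work is under way on a common base object for both libraries that will also mimic NumPy arrays. Until that is ready, here is some information on how to use the two objects in the same script.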

1.1 Transfer

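You can use the theano.misc.pycuda_utils module to convert between GPUArray and CudaNdarray. The functions to_gpuarray(x, copyif=False) and to_cudandarray(x) return a new object that occupies the same memory as the original. When that is not possible, they raise a ValueError. Because GPUArrays do not support strides, a strided CudaNdarray can only be converted by first making a non-strided copy of it, and the resulting GPUArray then no longer shares memory with the original. If you want this behavior, set copyif=True in to_gpuarray.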
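To make the "strided versus non-strided" distinction concrete, here is a minimal pure-Python sketch. It does not call Theano or PyCUDA; the helper names c_contiguous_strides and needs_copy are hypothetical and only illustrate the kind of layout check that decides whether memory can be shared or a copy is required.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a densely packed, row-major array (hypothetical helper)."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def needs_copy(shape, strides, itemsize):
    """True when the layout is strided, i.e. not densely packed row-major.

    Simplification: dimensions of length 1 are not treated specially.
    """
    return tuple(strides) != c_contiguous_strides(shape, itemsize)


FLOAT32_BYTES = 4

# A full 4x6 float32 array is densely packed.
full_shape = (4, 6)
full_strides = c_contiguous_strides(full_shape, FLOAT32_BYTES)  # (24, 4)

# Taking every second column keeps the row stride but doubles the column stride.
view_shape = (4, 3)
view_strides = (full_strides[0], full_strides[1] * 2)  # (24, 8)

print(needs_copy(full_shape, full_strides, FLOAT32_BYTES))  # False
print(needs_copy(view_shape, view_strides, FLOAT32_BYTES))  # True
```

The second case corresponds to a strided CudaNdarray: a stride-unaware container cannot describe that layout, so the data has to be copied into a packed buffer first.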

1.2 Compiling with PyCUDA

You can use PyCUDA to compile CUDA functions that work directly on CudaNdarrays. Here is an example from the file theano/misc/tests/test_pycuda_theano_simple.py:

import sys
import numpy
import theano
import theano.sandbox.cuda as cuda_ndarray
import theano.misc.pycuda_init
import pycuda
import pycuda.driver as drv
import pycuda.gpuarray


def test_pycuda_theano():
    """Simple example with pycuda function and Theano CudaNdarray object."""
    from pycuda.compiler import SourceModule
    mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")

    multiply_them = mod.get_function("multiply_them")

    a = numpy.random.randn(100).astype(numpy.float32)
    b = numpy.random.randn(100).astype(numpy.float32)

    # Test with Theano object
    ga = cuda_ndarray.CudaNdarray(a)
    gb = cuda_ndarray.CudaNdarray(b)
    dest = cuda_ndarray.CudaNdarray.zeros(a.shape)
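    # Launch one thread per element: the kernel has no bounds check,
    # so the block must not be larger than the arrays.
    multiply_them(dest, ga, gb,
                  block=(a.size, 1, 1), grid=(1, 1))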
    assert (numpy.asarray(dest) == a * b).all()
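The multiply_them kernel indexes by threadIdx.x without a bounds check, so it is only safe when the number of launched threads does not exceed the array length. The following CPU stand-in is a rough sketch of that constraint; multiply_them_cpu is a hypothetical name, and Python raises IndexError where a real GPU would silently read and write out of bounds.

```python
def multiply_them_cpu(dest, a, b, block_x):
    """CPU stand-in for the unguarded kernel: one loop iteration per thread."""
    for thread_idx in range(block_x):
        dest[thread_idx] = a[thread_idx] * b[thread_idx]


a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 2.0, 2.0]

# One thread per element: every index is valid.
dest = [0.0] * len(a)
multiply_them_cpu(dest, a, b, block_x=len(a))
print(dest)  # [0.5, 1.0, 6.0, 8.0]

# One thread too many: the extra thread indexes past the end of the arrays.
try:
    multiply_them_cpu([0.0] * len(a), a, b, block_x=len(a) + 1)
    overran = False
except IndexError:
    overran = True
print(overran)  # True
```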

1.3 Theano Op using a PyCUDA function

You can use a GPU function compiled with PyCUDA inside a Theano op:

import numpy, theano
import theano.misc.pycuda_init
from pycuda.compiler import SourceModule
import theano.sandbox.cuda as cuda

class PyCUDADoubleOp(theano.Op):
    def __eq__(self, other):
        return type(self) == type(other)
    def __hash__(self):
        return hash(type(self))
    def __str__(self):
        return self.__class__.__name__
    def make_node(self, inp):
        inp = cuda.basic_ops.gpu_contiguous(
           cuda.basic_ops.as_cuda_ndarray_variable(inp))
        assert inp.dtype == "float32"
        return theano.Apply(self, [inp], [inp.type()])
    def make_thunk(self, node, storage_map, _, _2):
        mod = SourceModule("""
    __global__ void my_fct(float * i0, float * o0, int size) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if(i<size){
        o0[i] = i0[i] * 2;
    }
  }""")
        pycuda_fct = mod.get_function("my_fct")
        inputs = [ storage_map[v] for v in node.inputs]
        outputs = [ storage_map[v] for v in node.outputs]
        def thunk():
            z = outputs[0]
            if z[0] is None or z[0].shape!=inputs[0][0].shape:
                z[0] = cuda.CudaNdarray.zeros(inputs[0][0].shape)
            grid = (int(numpy.ceil(inputs[0][0].size / 512.)),1)
            pycuda_fct(inputs[0][0], z[0], numpy.intc(inputs[0][0].size),
                       block=(512, 1, 1), grid=grid)
        thunk.lazy = False
        return thunk
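The thunk above launches ceil(size / 512) blocks of 512 threads and relies on the `i < size` guard in the kernel to skip the surplus threads of the last block. As an illustration only, the same indexing can be emulated on the CPU; grid_size and my_fct_cpu are hypothetical names, and the nested loops stand in for CUDA's block and thread indices.

```python
def grid_size(n, block_x=512):
    """Number of blocks needed to cover n elements (integer ceiling division)."""
    return (n + block_x - 1) // block_x


def my_fct_cpu(i0, o0, size, block_x, grid_x):
    """CPU emulation of the guarded kernel: doubles each input element."""
    for block_idx in range(grid_x):
        for thread_idx in range(block_x):
            i = block_idx * block_x + thread_idx
            if i < size:  # same guard as in the CUDA kernel
                o0[i] = i0[i] * 2


n = 1000
inp = [float(k) for k in range(n)]
out = [0.0] * n

blocks = grid_size(n)  # 2 blocks of 512 threads cover 1000 elements
my_fct_cpu(inp, out, n, block_x=512, grid_x=blocks)

print(blocks)   # 2
print(out[:3])  # [0.0, 2.0, 4.0]
```

With 1000 elements, 1024 threads are launched in total and the last 24 do nothing because of the guard.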

2. CUDAMat

There are functions for converting between CUDAMat objects and Theano's CudaNdarray objects. They follow the same principles as Theano's PyCUDA functions and can be found in theano.misc.cudamat_utils.py.

WARNING: These converters have a peculiar stride/shape problem. To make them work, a transpose and a reshape are needed.

3. Gnumpy

There are conversion functions between Gnumpy garray objects and Theano CudaNdarray objects. They are also similar to Theano's PyCUDA functions and can be found in theano.misc.gnumpy_utils.py.

References:

[1] Official site: http://deeplearning.net/software/theano/tutorial/gpu_data_convert.html
