I recently deployed a neural network using the TensorFlow C++ API. The network's input is produced by GPU processing, so the input data already resides in GPU memory. The usual way to create an input tensor, Tensor inputTensor(DT_FLOAT, TensorShape({1, 2, 3, 1})); allocates the tensor's buffer in CPU (host) memory by default. To copy the input GPU-to-GPU, rather than GPU-to-CPU and then CPU-to-GPU (the round trip over the PCIe bus is expensive), we need to allocate inputTensor's buffer on the GPU. The following code does this.
#include "tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.h"
#include "tensorflow/core/common_runtime/gpu/gpu_cudamalloc_allocator.h"

// BFC allocator on GPU 0, sized to hold the input buffer.
tensorflow::GPUBFCAllocator* allocator =
    new tensorflow::GPUBFCAllocator(0, sizeof(float) * Col_num * tempfftsize);
// Wrap the BFC allocator in a cudaMalloc-based allocator for GPU 0.
tensorflow::GPUcudaMallocAllocator* gpu_allocator =
    new tensorflow::GPUcudaMallocAllocator(allocator, 0);
// Construct the tensor with the GPU allocator so its buffer is device memory.
tensorflow::Tensor inputTensor(gpu_allocator, DT_FLOAT,
                               tensorflow::TensorShape({1, Col_num, tempfftsize, 1}));
auto inputTensor_flat = inputTensor.flat<float>();
// d_LogSpec is the GPU address of the input data; copy device to device.
cudaMemcpy(&inputTensor_flat(0), d_LogSpec,
           tempfftsize * Col_num * sizeof(float), cudaMemcpyDeviceToDevice);
For a more detailed discussion, see https://github.com/tensorflow/tensorflow/issues/19283
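Allocating the tensor on the GPU is only half of the picture: when feeding it to a session, TensorFlow also has to be told that the feed already lives on the device, otherwise it will treat it as a host tensor. The approach discussed in the linked issue (available since TF 1.8) is the Session::MakeCallable API together with CallableOptions::feed_devices. Below is a minimal sketch, not a definitive implementation; the node names "input" and "output" and the session variable are placeholder assumptions, and error handling is reduced to status checks.

```cpp
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/protobuf/config.pb.h"

// `session` is assumed to be an already-created tensorflow::Session with the
// graph loaded; `inputTensor` is the GPU-resident tensor built above.
tensorflow::CallableOptions opts;
opts.add_feed("input:0");    // placeholder name, adjust to your graph
opts.add_fetch("output:0");  // placeholder name, adjust to your graph
// Declare that the feed is already on GPU 0, so no host-to-device copy is made.
(*opts.mutable_feed_devices())["input:0"] =
    "/job:localhost/replica:0/task:0/device:GPU:0";

tensorflow::Session::CallableHandle handle;
TF_CHECK_OK(session->MakeCallable(opts, &handle));

std::vector<tensorflow::Tensor> outputs;
TF_CHECK_OK(session->RunCallable(handle, {inputTensor}, &outputs, nullptr));
TF_CHECK_OK(session->ReleaseCallable(handle));
```

Compared with the plain Session::Run feed mechanism, this callable-based path is what lets the GPU-allocated inputTensor actually bypass the PCIe copy end to end.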