https://codeyarns.com/2011/03/02/how-to-do-error-checking-in-cuda/
Error checks in CUDA code help catch errors at their source. There are two sources of errors in CUDA source code:
- Errors from CUDA API calls. For example, a call to cudaMalloc() might fail.
- Errors from CUDA kernel calls. For example, there might be an invalid memory access inside a kernel.
Almost all CUDA runtime API calls return a cudaError value (also available under the typedef cudaError_t), so these calls are easy to check:
    if ( cudaSuccess != cudaMalloc( &fooPtr, fooSize ) )
        printf( "Error!\n" );
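The check above reports only that something failed, not why. A minimal sketch of keeping the returned value and printing the runtime's own description with cudaGetErrorString() could look like the following. The helper name `tryAlloc` is hypothetical and not part of the original post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (an assumption for illustration): allocates `bytes` of
// device memory and prints the reason if the allocation fails.
// Returns true on success, false otherwise.
bool tryAlloc( void **devPtr, size_t bytes )
{
    // Keep the return value instead of comparing it inline, so that the
    // error code can be turned into a readable message.
    cudaError_t err = cudaMalloc( devPtr, bytes );

    if ( cudaSuccess != err )
    {
        fprintf( stderr, "cudaMalloc of %zu bytes failed: %s\n",
                 bytes, cudaGetErrorString( err ) );
        return false;
    }

    return true;
}
```

Returning a bool rather than exiting leaves the decision about how to recover to the caller, which may be preferable in library code.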
CUDA kernel invocations do not return any value. Errors from a CUDA kernel call can be checked after the call with cudaGetLastError():
    fooKernel<<< x, y >>>(); // Kernel call
    if ( cudaSuccess != cudaGetLastError() )
        printf( "Error!\n" );
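Kernel launches are asynchronous, so a check placed right after the launch sees launch-time problems (such as an invalid block size), while errors raised during execution show up only once the host waits for the kernel. A sketch of checking in both stages follows. The kernel `fillKernel` and the function `launchAndCheck` are hypothetical names introduced for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: writes each thread's
// global index into the output array.
__global__ void fillKernel( int *out, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < n ) out[i] = i;
}

// Launches fillKernel and checks for errors in two stages.
// Returns the first error encountered, or cudaSuccess.
cudaError_t launchAndCheck( int *devOut, int n, int threadsPerBlock )
{
    // Guard against a division by zero in the grid-size computation below.
    if ( threadsPerBlock <= 0 ) return cudaErrorInvalidValue;

    int blocks = ( n + threadsPerBlock - 1 ) / threadsPerBlock;
    fillKernel<<< blocks, threadsPerBlock >>>( devOut, n );

    // Stage 1: errors in the launch itself are available immediately.
    cudaError_t err = cudaGetLastError();
    if ( cudaSuccess != err )
    {
        fprintf( stderr, "Launch failed: %s\n", cudaGetErrorString( err ) );
        return err;
    }

    // Stage 2: wait for the kernel to finish so that errors raised while it
    // was running are reported here rather than at some later API call.
    err = cudaDeviceSynchronize();
    if ( cudaSuccess != err )
        fprintf( stderr, "Execution failed: %s\n", cudaGetErrorString( err ) );

    return err;
}
```

Calling `launchAndCheck` with a block size of 2048 would be expected to fail at stage 1 on current GPUs, which limit a block to 1024 threads. The synchronization in stage 2 stalls the host, so it is usually reserved for debugging builds.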
These two types of checks can be wrapped up in two simple error-checking functions like this:
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Define this to turn on error checking
    #define CUDA_ERROR_CHECK

    #define CudaSafeCall( err ) __cudaSafeCall( err, __FILE__, __LINE__ )
    #define CudaCheckError()    __cudaCheckError( __FILE__, __LINE__ )

    // Checks the value returned by a CUDA API call and exits on failure.
    inline void __cudaSafeCall( cudaError err, const char *file, const int line )
    {
    #ifdef CUDA_ERROR_CHECK
        if ( cudaSuccess != err )
        {
            fprintf( stderr, "cudaSafeCall() failed at %s:%i : %s\n",
                     file, line, cudaGetErrorString( err ) );
            exit( -1 );
        }
    #endif

        return;
    }

    // Checks for errors from the most recent kernel launch and exits on failure.
    inline void __cudaCheckError( const char *file, const int line )
    {
    #ifdef CUDA_ERROR_CHECK
        cudaError err = cudaGetLastError();
        if ( cudaSuccess != err )
        {
            fprintf( stderr, "cudaCheckError() failed at %s:%i : %s\n",
                     file, line, cudaGetErrorString( err ) );
            exit( -1 );
        }

        // More careful checking. However, this will affect performance.
        // Comment away if needed.
        err = cudaDeviceSynchronize();
        if ( cudaSuccess != err )
        {
            fprintf( stderr, "cudaCheckError() with sync failed at %s:%i : %s\n",
                     file, line, cudaGetErrorString( err ) );
            exit( -1 );
        }
    #endif

        return;
    }
Using these error-checking functions is easy:
    CudaSafeCall( cudaMalloc( &fooPtr, fooSize ) );

    fooKernel<<< x, y >>>(); // Kernel call
    CudaCheckError();
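To show how the two checks fit around a full allocate, copy, launch, copy back sequence, here is a sketch under a few assumptions. The wrappers are condensed stand-ins for the ones defined earlier (repeated only so the example compiles on its own, with checking always enabled), and `doubleKernel` and `doubleOnDevice` are hypothetical names. The function assumes `n` is greater than zero.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Condensed stand-ins for the wrappers in the post (an assumption made so
// that this sketch is self-contained).
#define CudaSafeCall( err ) safeCall( ( err ), __FILE__, __LINE__ )
#define CudaCheckError()    safeCall( checkKernel(), __FILE__, __LINE__ )

// Returns a launch error if there is one, otherwise the result of waiting
// for the kernel to finish.
inline cudaError_t checkKernel()
{
    cudaError_t err = cudaGetLastError();
    return ( cudaSuccess != err ) ? err : cudaDeviceSynchronize();
}

// Prints the error with its source location and exits on failure.
inline void safeCall( cudaError_t err, const char *file, int line )
{
    if ( cudaSuccess != err )
    {
        fprintf( stderr, "CUDA error at %s:%i : %s\n",
                 file, line, cudaGetErrorString( err ) );
        exit( -1 );
    }
}

// Hypothetical kernel: doubles every element in place.
__global__ void doubleKernel( float *data, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < n ) data[i] *= 2.0f;
}

// Copies n floats to the device, doubles them, and copies them back.
void doubleOnDevice( float *host, int n )
{
    float *dev = NULL;
    size_t bytes = n * sizeof( float );

    // Every API call is wrapped, including the copies and the free.
    CudaSafeCall( cudaMalloc( &dev, bytes ) );
    CudaSafeCall( cudaMemcpy( dev, host, bytes, cudaMemcpyHostToDevice ) );

    doubleKernel<<< ( n + 255 ) / 256, 256 >>>( dev, n );
    CudaCheckError();

    CudaSafeCall( cudaMemcpy( host, dev, bytes, cudaMemcpyDeviceToHost ) );
    CudaSafeCall( cudaFree( dev ) );
}
```

Wrapping cudaMemcpy() and cudaFree() as well as cudaMalloc() means a failure is reported at the line where it first occurs, not at a later, unrelated call.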
These functions are derived from similar functions that used to be available in cutil.h in old CUDA SDKs.