Check failed: error == cudaSuccess (2 vs. 0) out of memory


I am trying to train a network in Caffe. My images are 512x640 and the batch size is 1. I'm trying to implement FCN-8s.

I am currently running this on an Amazon EC2 instance (g2.2xlarge) with 4GB of GPU memory. But when I run the solver, it immediately fails with this error:

Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Aborted (core dumped)

Can someone help me proceed from here?

1 Answer

3 votes, accepted

The error you get is indeed an out-of-memory error, but it is GPU memory that has run out, not RAM (note that the error comes from CUDA).
Usually, when Caffe runs out of memory, the first thing to do is reduce the batch size (at the cost of gradient accuracy), but since you are already at batch size = 1...
Are you sure the batch size is 1 for both the TRAIN and TEST phases?
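One quick way to check that last point is to scan the net definition for every `batch_size` setting. The following is only a minimal sketch: it assumes the data layers declare `batch_size` directly in the prototxt (a Python data layer would not), it does not handle braces inside strings or comments, and the `SAMPLE` text and file names in it are hypothetical.

```python
import re

def batch_sizes_by_phase(prototxt_text):
    """Return (layer_name, phase, batch_size) for each top-level block that sets batch_size."""
    results = []
    depth = 0
    start = None
    for i, ch in enumerate(prototxt_text):
        if ch == "{":
            if depth == 0:
                start = i          # a new top-level block begins here
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and start is not None:
                block = prototxt_text[start:i + 1]
                size = re.search(r"batch_size\s*:\s*(\d+)", block)
                if size:
                    name = re.search(r'name\s*:\s*"([^"]*)"', block)
                    phase = re.search(r"phase\s*:\s*(\w+)", block)
                    results.append((
                        name.group(1) if name else "?",
                        phase.group(1) if phase else "ALL",  # no include rule: used in every phase
                        int(size.group(1)),
                    ))
    return results

# Hypothetical net definition, for illustration only.
SAMPLE = '''
layer {
  name: "data"
  type: "ImageData"
  include { phase: TRAIN }
  image_data_param { source: "train.txt" batch_size: 1 }
}
layer {
  name: "data"
  type: "ImageData"
  include { phase: TEST }
  image_data_param { source: "val.txt" batch_size: 4 }
}
'''

if __name__ == "__main__":
    for name, phase, size in batch_sizes_by_phase(SAMPLE):
        print(name, phase, size)
```

In practice you would pass in the contents of your own train/val prototxt instead of `SAMPLE`.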

I guessed so. And yes, the batch size is 1 for both the train and test phases. I think I have to resize the training images to something smaller and try again. But why is 4GB of GPU memory not enough? It says "The total number of bytes read was 537399810", which is much smaller than 4GB. – Abhilash Panigrahi Nov 19 '15 at 8:11
    
@AbhilashPanigrahi Is it possible that some other processes are using the GPU at the same time? Try running nvidia-smi from the command line to see what's going on on your GPU. – Shai Nov 19 '15 at 8:18
    
I did. No other process is running apart from this (which automatically quits after a few seconds because of the error). – Abhilash Panigrahi Nov 19 '15 at 8:21
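For a process that exits within seconds, it can help to read GPU memory from a script rather than by eye. The sketch below wraps the `nvidia-smi` query mode; the helper names `parse_memory` and `gpu_memory` are made up for illustration, and it assumes `nvidia-smi` is on the PATH and reports values in MiB.

```python
import subprocess

# Query mode of nvidia-smi: one CSV line per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_memory(csv_text):
    """Parse lines like '3750, 4095' into (used_mib, total_mib) tuples, one per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows

def gpu_memory():
    """Run nvidia-smi and return per-GPU (used, total) memory in MiB."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory(out)

if __name__ == "__main__":
    for index, (used, total) in enumerate(gpu_memory()):
        print(f"GPU {index}: {used} MiB used of {total} MiB")
```

Calling `gpu_memory()` in a loop while the solver starts up would show how quickly usage climbs before the crash.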
I just reduced the image and label size to about 256x320. It runs successfully. I saw it is using around 3.75 GB of GPU memory. Thanks for the help. – Abhilash Panigrahi Nov 19 '15 at 8:47
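As for why 4GB fills up when the "bytes read" figure is only about 537MB: that number presumably refers to the weights file being loaded, while training also has to hold every intermediate activation and its gradient, and those grow with image area. A back-of-the-envelope sketch follows. It assumes a VGG-16-style conv stack, a padding of 100 on the first convolution, float32 blobs, and that each blob keeps both a data and a diff array. It ignores weights, solver history, the fully convolutional and upsampling layers, and cuDNN workspace, so it will not reproduce the 3.75 GB figure reported above.

```python
import math

BYTES_PER_FLOAT = 4  # assumption: float32 blobs

def blob_bytes(n, c, h, w):
    """Bytes for one blob, assuming both data and diff arrays are allocated."""
    return 2 * n * c * h * w * BYTES_PER_FLOAT

def conv_stack_bytes(height, width, batch=1, pad=100):
    """Rough activation memory for a VGG-16-style conv stack (assumed layout)."""
    # (number of 3x3 conv layers, output channels) per stage; each stage ends in a 2x2 pool.
    stages = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
    # Assumption: the first 3x3 conv pads by `pad`; later convs preserve spatial size.
    h = height + 2 * pad - 2
    w = width + 2 * pad - 2
    total = 0
    for n_convs, channels in stages:
        total += n_convs * blob_bytes(batch, channels, h, w)
        # 2x2 pooling with stride 2, rounding up.
        h, w = math.ceil(h / 2), math.ceil(w / 2)
        total += blob_bytes(batch, channels, h, w)
    return total

if __name__ == "__main__":
    for size in [(512, 640), (256, 320)]:
        gib = conv_stack_bytes(*size) / 2**30
        print(f"{size[0]}x{size[1]}: about {gib:.2f} GiB for conv-stack activations")
```

The point of the estimate is the trend rather than the exact number: activation memory scales with batch size times image height times width, so shrinking the input is the main lever once the batch size is already 1.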

