Check failed: error == cudaSuccess (2 vs. 0) out of memory


I am trying to train a network on Caffe. I have image size of 512x640. Batch size is 1. I'm trying to implement FCN-8s.

I am currently running this on an Amazon EC2 instance (g2.2xlarge) with 4GB of GPU memory. But when I run the solver, it immediately throws an error:

Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Aborted (core dumped)

Can someone help me proceed from here?


1 Answer

The error you get is indeed an out-of-memory error, but it refers to GPU memory rather than RAM (note that the error comes from CUDA).
Usually, when Caffe runs out of memory, the first thing to do is reduce the batch size (at the cost of gradient accuracy), but since you are already at batch size = 1...
Are you sure the batch size is 1 for both the TRAIN and TEST phases?
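
One quick way to double-check the batch size in every phase is to scan the net definition for each batch_size field. The following is only a minimal sketch: the prototxt excerpt, its layer names and its file paths are made up for illustration, and a data layer that loads images from a custom Python layer may not have a batch_size field at all.

```python
import re

# Hypothetical excerpt of a train/val prototxt; names and paths are invented.
SAMPLE_PROTOTXT = """
layer {
  name: "data"
  type: "ImageData"
  include { phase: TRAIN }
  image_data_param { source: "train.txt" batch_size: 1 }
}
layer {
  name: "data"
  type: "ImageData"
  include { phase: TEST }
  image_data_param { source: "val.txt" batch_size: 4 }
}
"""


def find_batch_sizes(prototxt_text):
    """Return (line_number, batch_size) for every batch_size field found."""
    pattern = re.compile(r"\bbatch_size\s*:\s*(\d+)")
    results = []
    for line_number, line in enumerate(prototxt_text.splitlines(), start=1):
        for match in pattern.finditer(line):
            results.append((line_number, int(match.group(1))))
    return results


batch_sizes = find_batch_sizes(SAMPLE_PROTOTXT)
for line_number, size in batch_sizes:
    print(f"line {line_number}: batch_size = {size}")
```

To use it on a real net, read the file contents with open(path).read() and pass the text to find_batch_sizes; any value larger than 1 in a TEST-phase layer is worth lowering first.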

    
I guessed so. And yes, the batch size is 1 for both the train and test phases. I think I have to resize the training images to something smaller and try it out. But why is 4GB of GPU memory not enough? It says "The total number of bytes read was 537399810", which is much smaller than 4GB. – Abhilash Panigrahi Nov 19 '15 at 8:11
    
@AbhilashPanigrahi is it possible that some other processes are using the GPU at the same time? Try the command-line tool nvidia-smi to see what's going on on your GPU. – Shai Nov 19 '15 at 8:18
    
I did. No other process is running apart from this (which automatically quits after a few seconds because of the error). – Abhilash Panigrahi Nov 19 '15 at 8:21
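
For anyone who wants to log GPU memory while the solver starts up instead of watching nvidia-smi by hand, here is a rough sketch that wraps the nvidia-smi query interface. The function names are hypothetical; the only external piece is the nvidia-smi CLI itself, called with its --query-gpu and --format options, and it needs an NVIDIA driver to be installed.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' rows (in MiB) as printed by the query below."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"used_mib": used, "total_mib": total,
                     "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return one dict per GPU (requires an NVIDIA driver)."""
    output = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_gpu_memory(output)
```

Calling query_gpu_memory() once before launching Caffe and again while it is running shows whether the memory is already taken by something else or is consumed by the training process itself.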
I just reduced the image and label size to about 256x320. It runs successfully. I saw it is using around 3.75 GB of GPU memory. Thanks for the help. – Abhilash Panigrahi Nov 19 '15 at 8:47
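
As for why 4GB fills up even at batch size 1: the byte count quoted earlier most likely describes the weights read from disk, while the GPU also has to hold the intermediate blobs of every layer, and those grow with the image area. The sketch below is a back-of-the-envelope estimate only. It assumes a VGG-16-style convolution stack (which FCN-8s is built on), float32 blobs, and one gradient copy per blob during training; it ignores input padding, the fully convolutional fc layers, the upsampling layers, weights, solver history and convolution workspace, so real usage will be noticeably higher.

```python
import math

# Assumed VGG-16-style layout: (number of conv layers, output channels) per
# block, each block followed by a 2x2 max pool with stride 2.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
BYTES_PER_FLOAT = 4


def estimate_activation_bytes(height, width, blocks=VGG16_BLOCKS, training=True):
    """Rough lower bound on blob memory for one image through the conv stack."""
    elements = 3 * height * width  # input image blob (3 channels assumed)
    h, w = height, width
    for num_convs, channels in blocks:
        elements += num_convs * channels * h * w   # conv outputs (ReLU in place)
        h, w = math.ceil(h / 2), math.ceil(w / 2)  # pooling halves each side
        elements += channels * h * w               # pool output
    copies = 2 if training else 1  # assume one gradient copy per blob in training
    return elements * BYTES_PER_FLOAT * copies


for size in [(512, 640), (256, 320)]:
    mib = estimate_activation_bytes(*size) / 2**20
    print(f"{size[0]}x{size[1]}: ~{mib:.0f} MiB of activations")
```

Under these assumptions, halving both image dimensions cuts the activation memory to a quarter, which is consistent with the smaller 256x320 input fitting on the same card.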

