Caffe | Check failed: error == cudaSuccess (2 vs. 0) out of memory

Question

I am trying to train a network in Caffe. My images are 512x640 and the batch size is 1. I'm trying to implement FCN-8s.

I am currently running this on an Amazon EC2 instance (g2.2xlarge) with 4GB of GPU memory. But when I run the solver, it immediately throws an error:

Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Aborted (core dumped)

Can someone help me proceed from here?

Shai · Accepted Answer · 2015-11-19 08:17:58Z


The error you get is indeed an out-of-memory error, but it is not the RAM that ran out; it is the GPU memory (note that the error comes from CUDA).
Usually, when Caffe runs out of memory, the first thing to do is reduce the batch size (at the cost of gradient accuracy), but since you are already at batch size = 1...
Are you sure the batch size is 1 for both the TRAIN and TEST phases?

edited Nov 19 '15 at 8:17

answered Nov 19 '15 at 6:00

Shai


I guessed so. And yes, the batch size is 1 for both the train and test phases. I think I have to resize the training images to something smaller and try again. But why is 4GB of GPU memory not enough? It says "The total number of bytes read was 537399810", which is much smaller than 4GB. – Abhilash Panigrahi Nov 19 '15 at 8:11

@AbhilashPanigrahi Is it possible that some other processes are using the GPU at the same time? Try the command-line tool nvidia-smi to see what's going on on your GPU. – Shai Nov 19 '15 at 8:18
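For anyone who wants to script the nvidia-smi check suggested above, here is a minimal sketch in Python. It assumes nvidia-smi is on the PATH and supports the --query-gpu CSV output; the helper names parse_gpu_memory and query_gpu_memory are made up for this example and are not part of Caffe or any NVIDIA library.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines such as '3750, 4095' into (used_mib, total_mib) tuples.

    Expects the output of nvidia-smi with --format=csv,noheader,nounits,
    one line per GPU.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory (in MiB) of every GPU."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return parse_gpu_memory(output)
```

Calling query_gpu_memory() shortly before starting the solver shows whether something else already occupies part of the GPU memory. If the used figure is close to zero, the training job alone is exhausting the card.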

I did. No other process is running apart from this (which automatically quits after a few seconds because of the error). – Abhilash Panigrahi Nov 19 '15 at 8:21


I just reduced the image and label size to about 256x320. It runs successfully. I saw it is using around 3.75 GB of GPU memory. Thanks for the help. – Abhilash Panigrahi Nov 19 '15 at 8:47
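A rough way to see why halving each image dimension helps so much: the number of elements in a convolutional feature map grows with height times width, so going from 512x640 to 256x320 cuts each such blob to about a quarter. The sketch below is only a back-of-the-envelope estimate. It assumes float32 blobs that each keep a data buffer and a gradient (diff) buffer during training, and it uses a hypothetical 64-channel feature map at full input resolution rather than the real FCN-8s layer shapes. Weights, solver history, padding and any cuDNN workspace are ignored.

```python
def blob_bytes(shape, bytes_per_element=4, with_diff=True):
    """Estimate the memory of one blob with shape (N, C, H, W).

    bytes_per_element=4 assumes float32. with_diff=True doubles the
    figure to account for a gradient buffer of the same size.
    """
    count = 1
    for dim in shape:
        count *= dim
    buffers = 2 if with_diff else 1
    return count * bytes_per_element * buffers


# Hypothetical 64-channel feature map at the two input sizes from the thread.
full_size = blob_bytes((1, 64, 512, 640))
half_size = blob_bytes((1, 64, 256, 320))

print(full_size / 2.0**20, "MiB at 512x640")  # 160.0 MiB
print(half_size / 2.0**20, "MiB at 256x320")  # 40.0 MiB
```

A deep network holds many such blobs at once, which is why the total can be far larger than the size of the weight file read at startup.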
