Following the instructions at https://github.com/fizyr/keras-retinanet, I started training on the Pascal VOC 2007 dataset and hit the following error:
D:\JupyterWorkSpace\keras-retinanet>python D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py pascal D:\\JupyterWorkSpace\\VOCdevkit\\VOC2007 --steps 100
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 35, in <module>
    from .. import layers # noqa: F401
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\layers\__init__.py", line 1, in <module>
    from ._misc import RegressBoxes, UpsampleLike, Anchors, ClipBoxes # noqa: F401
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\layers\_misc.py", line 19, in <module>
    from ..utils import anchors as utils_anchors
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\utils\anchors.py", line 20, in <module>
    from ..utils.compute_overlap import compute_overlap
ModuleNotFoundError: No module named 'keras_retinanet.utils.compute_overlap'
In anchors.py, I added the following two lines before the line from ..utils.compute_overlap import compute_overlap:
import pyximport
pyximport.install()
Running the script again produced the following error:
D:\JupyterWorkSpace\keras-retinanet>python D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py pascal D:\\JupyterWorkSpace\\VOCdevkit\\VOC2007 --steps 100
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
compute_overlap.c
C:\Users\Administrator\.pyxbld\temp.win-amd64-3.6\Release\pyrex\keras_retinanet\utils\compute_overlap.c(567): fatal error C1083: Cannot open include file: 'numpy/arrayobject.h': No such file or directory
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\distutils\_msvccompiler.py", line 423, in compile
    self.spawn(args)
  File "C:\ProgramData\Anaconda3\lib\distutils\_msvccompiler.py", line 542, in spawn
    return super().spawn(cmd)
  File "C:\ProgramData\Anaconda3\lib\distutils\ccompiler.py", line 909, in spawn
    spawn(cmd, dry_run=self.dry_run)
  File "C:\ProgramData\Anaconda3\lib\distutils\spawn.py", line 38, in spawn
    _spawn_nt(cmd, search_path, dry_run=dry_run)
  File "C:\ProgramData\Anaconda3\lib\distutils\spawn.py", line 81, in _spawn_nt
    "command %r failed with exit status %d" % (cmd, rc))
distutils.errors.DistutilsExecError: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 215, in load_module
    inplace=build_inplace, language_level=language_level)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 191, in build_module
    reload_support=pyxargs.reload_support)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyxbuild.py", line 102, in pyx_to_dll
    dist.run_commands()
  File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "C:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 339, in run
    self.build_extensions()
  File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 194, in build_extensions
    self.build_extension(ext)
  File "C:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "C:\ProgramData\Anaconda3\lib\distutils\_msvccompiler.py", line 425, in compile
    raise CompileError(msg)
distutils.errors.CompileError: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 35, in <module>
    from .. import layers # noqa: F401
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\layers\__init__.py", line 1, in <module>
    from ._misc import RegressBoxes, UpsampleLike, Anchors, ClipBoxes # noqa: F401
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\layers\_misc.py", line 19, in <module>
    from ..utils import anchors as utils_anchors
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\utils\anchors.py", line 22, in <module>
    from ..utils.compute_overlap import compute_overlap
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 458, in load_module
    language_level=self.language_level)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 231, in load_module
    raise exc.with_traceback(tb)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 215, in load_module
    inplace=build_inplace, language_level=language_level)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 191, in build_module
    reload_support=pyxargs.reload_support)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyxbuild.py", line 102, in pyx_to_dll
    dist.run_commands()
  File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "C:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 339, in run
    self.build_extensions()
  File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 194, in build_extensions
    self.build_extension(ext)
  File "C:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "C:\ProgramData\Anaconda3\lib\distutils\_msvccompiler.py", line 425, in compile
    raise CompileError(msg)
ImportError: Building module keras_retinanet.utils.compute_overlap failed: ["distutils.errors.CompileError: command 'C:\\\\Program Files (x86)\\\\Microsoft Visual Studio 14.0\\\\VC\\\\BIN\\\\x86_amd64\\\\cl.exe' failed with exit status 2\n"]
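My guess at the time was that calling C from Python is bug-prone on Windows, although the compiler output above points more specifically at a missing include path for the NumPy header numpy/arrayobject.h. Fortunately, older versions of the Keras RetinaNet GitHub project do not call any C code, so I simply switched to an older version.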
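For readers who would rather stay on the current version: the C1083 message in the log says cl.exe could not find numpy/arrayobject.h, which suggests that pyximport was compiling the extension without NumPy's include directory. The following is only a sketch of one possible workaround, not something tested in this post. It assumes NumPy and Cython are installed; the helper names numpy_include_setup_args and install_pyximport are made up for illustration.

```python
import importlib.util
import os


def numpy_include_setup_args():
    """Build pyximport setup_args that point the C compiler at NumPy's headers.

    Returns None when NumPy is not installed (hypothetical helper).
    """
    if importlib.util.find_spec("numpy") is None:
        return None
    import numpy

    # numpy.get_include() returns the directory that contains numpy/arrayobject.h.
    return {"include_dirs": numpy.get_include()}


def install_pyximport():
    """Register the pyximport import hook with NumPy's include directory.

    This would replace the bare pyximport.install() call added to anchors.py.
    """
    import pyximport  # ships with Cython

    pyximport.install(setup_args=numpy_include_setup_args() or {})


if __name__ == "__main__":
    args = numpy_include_setup_args()
    if args is not None:
        # Check that the header the compiler complained about is really there.
        header = os.path.join(args["include_dirs"], "numpy", "arrayobject.h")
        print(header, "exists:", os.path.exists(header))
```

The repository also ships a setup.py that declares the compute_overlap extension, so building it ahead of time with python setup.py build_ext --inplace is another route that avoids compiling at import time.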
With the old version, however, another problem appeared:
Limit: 1551050342
InUse: 1548747008
MaxInUse: 1549328640
NumAllocs: 1403
MaxAllocSize: 119565824

2018-07-31 13:42:42.065436: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:277] ****************************************************************************************************
2018-07-31 13:42:42.081028: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[1,512,100,101]
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
    return fn(*args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
    status, run_metadata)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,512,100,101]
  [[Node: bn3a_branch2c/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](res3a_branch2c/convolution, bn3a_branch2c/gamma/read, bn3a_branch2c/beta/read, bn3a_branch2c/moving_mean/read, bn3a_branch2c/moving_variance/read)]]
  [[Node: loss/add/_2253 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_8646_loss/add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 443, in <module>
    main()
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 438, in main
    callbacks=callbacks,
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1415, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training_generator.py", line 213, in fit_generator
    class_weight=class_weight)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1215, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 2672, in __call__
    return self._legacy_call(inputs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 2654, in _legacy_call
    **self.session_kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
    run_metadata_ptr)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
    options, run_metadata)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,512,100,101]
  [[Node: bn3a_branch2c/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](res3a_branch2c/convolution, bn3a_branch2c/gamma/read, bn3a_branch2c/beta/read, bn3a_branch2c/moving_mean/read, bn3a_branch2c/moving_variance/read)]]
  [[Node: loss/add/_2253 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_8646_loss/add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'bn3a_branch2c/FusedBatchNorm', defined at:
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 443, in <module>
    main()
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 410, in main
    freeze_backbone=args.freeze_backbone
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 87, in create_models
    model = model_with_weights(backbone_retinanet(num_classes, modifier=modifier), weights=weights, skip_mismatch=True)
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\models\resnet.py", line 33, in retinanet
    return resnet_retinanet(*args, backbone=self.backbone, **kwargs)
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\models\resnet.py", line 75, in resnet_retinanet
    resnet = keras_resnet.models.ResNet50(inputs, include_top=False, freeze_bn=True)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python36\site-packages\keras_resnet\models\_2d.py", line 188, in ResNet50
    return ResNet(inputs, blocks, numerical_names=numerical_names, block=keras_resnet.blocks.bottleneck_2d, include_top=include_top, classes=classes, *args, **kwargs)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python36\site-packages\keras_resnet\models\_2d.py", line 76, in ResNet
    x = block(features, stage_id, block_id, numerical_name=(block_id > 0 and numerical_names[stage_id]), freeze_bn=freeze_bn)(x)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python36\site-packages\keras_resnet\blocks\_2d.py", line 139, in f
    y = keras_resnet.layers.BatchNormalization(axis=axis, epsilon=1e-5, freeze=freeze_bn, name="bn{}{}_branch2c".format(stage_char, block_char))(y)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python36\site-packages\keras_resnet\layers\_batch_normalization.py", line 17, in call
    return super(BatchNormalization, self).call(training=(not self.freeze), *args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\layers\normalization.py", line 178, in call
    return normalize_inference()
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\layers\normalization.py", line 174, in normalize_inference
    epsilon=self.epsilon)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 1905, in batch_normalization
    is_training=False
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_impl.py", line 831, in fused_batch_norm
    name=name)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 2033, in _fused_batch_norm
    is_training=is_training, name=name)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
    op_def=op_def)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,512,100,101]
  [[Node: bn3a_branch2c/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](res3a_branch2c/convolution, bn3a_branch2c/gamma/read, bn3a_branch2c/beta/read, bn3a_branch2c/moving_mean/read, bn3a_branch2c/moving_variance/read)]]
  [[Node: loss/add/_2253 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_8646_loss/add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
The hardware information printed in the cmd window is as follows:
2018-07-31 13:41:32.294261: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2018-07-31 13:41:32.924335: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GT 740 major: 3 minor: 0 memoryClockRate(GHz): 1.0585
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.66GiB
2018-07-31 13:41:32.931511: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GT 740, pci bus id: 0000:01:00.0, compute capability: 3.0)
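The error is most likely caused by insufficient GPU memory: the GeForce GT 740 reports only 2.00 GiB in total, of which 1.66 GiB was free at startup. Training this model will therefore need a GPU with more memory.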
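The out-of-memory diagnosis can be sanity-checked with a little arithmetic on the numbers in the log. The sketch below assumes that Limit and InUse in the allocator summary are byte counts for TensorFlow's GPU memory pool and that the failing tensor is float32 (the node reports T=DT_FLOAT), so each element takes 4 bytes. The constant and function names are chosen only for illustration.

```python
# Values copied from the allocator summary and the failing tensor in the log.
LIMIT_BYTES = 1551050342           # "Limit": memory pool available to TensorFlow
IN_USE_BYTES = 1548747008          # "InUse": bytes already allocated
TENSOR_SHAPE = (1, 512, 100, 101)  # shape from the OOM message
BYTES_PER_FLOAT32 = 4              # DT_FLOAT is a 32-bit float

MIB = 1024 ** 2
GIB = 1024 ** 3


def tensor_bytes(shape, itemsize=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize


needed = tensor_bytes(TENSOR_SHAPE)
remaining = LIMIT_BYTES - IN_USE_BYTES

print(f"pool limit: {LIMIT_BYTES / GIB:.2f} GiB")
print(f"requested:  {needed / MIB:.1f} MiB")
print(f"remaining:  {remaining / MIB:.1f} MiB")
print("fits" if needed <= remaining else "does not fit")
```

Under these assumptions the requested tensor is roughly 20 MiB while only about 2 MiB of the pool is left, which is consistent with the ResourceExhaustedError. The batch dimension in the shape is already 1, so a smaller batch cannot help here; smaller input images or a GPU with more memory are the remaining options.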