Recently, while using Cascade R-CNN for an object detection task, I hit the following error when training the model on my own dataset. The full traceback is:
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    train()
  File "train.py", line 175, in train
    _, global_stepnp, summary_str = sess.run([train_op, global_step, summary_op])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: attempt to get argmax of an empty sequence
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)
  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 49, in anchor_target_layer
    argmax_overlaps = overlaps.argmax(axis=1)
ValueError: attempt to get argmax of an empty sequence
  [[node sample_anchors_minibatch/PyFunc (defined at ../libs/networks/build_whole_network.py:433) = PyFunc[Tin=[DT_FLOAT, DT_INT32, DT_FLOAT], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](Cast/_1175, postprocess_RPN/Shape_2, make_anchors_forRPN/concat/_1177)]]
  [[{{node sample_RCNN_minibatch_stage2/Shape_1/_1383}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_3164_sample_RCNN_minibatch_stage2/Shape_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Caused by op 'sample_anchors_minibatch/PyFunc', defined at:
  File "train.py", line 195, in <module>
    train()
  File "train.py", line 46, in train
    gtboxes_batch=gtboxes_and_label)
  File "../libs/networks/build_whole_network.py", line 433, in build_whole_detection_network
    [tf.float32, tf.float32])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 457, in py_func
    func=func, inp=inp, Tout=Tout, stateful=stateful, eager=False, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 281, in _internal_py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 129, in py_func
    "PyFunc", input=input, token=token, Tout=Tout, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): ValueError: attempt to get argmax of an empty sequence
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)
  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 49, in anchor_target_layer
    argmax_overlaps = overlaps.argmax(axis=1)
ValueError: attempt to get argmax of an empty sequence
  [[node sample_anchors_minibatch/PyFunc (defined at ../libs/networks/build_whole_network.py:433) = PyFunc[Tin=[DT_FLOAT, DT_INT32, DT_FLOAT], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](Cast/_1175, postprocess_RPN/Shape_2, make_anchors_forRPN/concat/_1177)]]
  [[{{node sample_RCNN_minibatch_stage2/Shape_1/_1383}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_3164_sample_RCNN_minibatch_stage2/Shape_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
The tutorial I followed is this one: Cascade R-CNN training and testing (TensorFlow framework)
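I had run into this error before. At the time, my workaround was to wrap the step in try/except and simply pass over the data that triggered it.
During this training run the error came back, and this time it could not be passed over, because it aborts the program outright.
Cause: an empty annotation file triggers this error. While checking my annotation files, I happened to find one that looks like this: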
<annotation>
  <folder>********</folder>
  <filename>**********</filename>
  <path>******************</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>219</width>
    <height>167</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
</annotation>
This annotation XML file contains no bounding-box coordinates for any object, and that is the main cause of the error.
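The link between a box-free annotation and this exact message can be reproduced in isolation. The sketch below assumes that overlaps holds one row per anchor and one column per ground-truth box; that layout is my reading of the traceback line overlaps.argmax(axis=1), not something confirmed from the library code. The anchor count is made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the overlaps matrix in anchor_target_layer:
# one row per anchor, one column per ground-truth box (assumed layout).
num_anchors = 4
num_gt_boxes = 0  # what an annotation file without any box would give
overlaps = np.zeros((num_anchors, num_gt_boxes), dtype=np.float32)

try:
    # Same call as the failing line in the traceback.
    overlaps.argmax(axis=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

With zero columns there is nothing to take the maximum over along axis 1, so NumPy raises a ValueError instead of returning an index.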
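Fix:
Somewhere in the process of building the dataset, the coordinates in some XML files may have been lost. There are two ways to fix this. The first, if only a few such files exist, is to delete the XML files that have no coordinates. The second is to inspect those XML files and add the missing coordinates back in.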
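Either fix starts with finding the affected files. Here is a minimal sketch of such a check, assuming the annotations are Pascal VOC style XML in which each labeled object is an <object> element containing a <bndbox>. The function names, the sample XML strings, and the "cat" label are all hypothetical and only there to make the example runnable.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def count_boxes(xml_path):
    """Count <object> elements that carry a <bndbox> (assumed VOC layout)."""
    root = ET.parse(xml_path).getroot()
    return sum(1 for obj in root.iter("object") if obj.find("bndbox") is not None)


def find_empty_annotations(annotation_dir):
    """Return the paths of XML files in annotation_dir that have no box."""
    empty_paths = []
    for file_name in sorted(os.listdir(annotation_dir)):
        if not file_name.lower().endswith(".xml"):
            continue
        xml_path = os.path.join(annotation_dir, file_name)
        if count_boxes(xml_path) == 0:
            empty_paths.append(xml_path)
    return empty_paths


# Made-up sample files: one without any box, one with a single box.
EMPTY_XML = (
    "<annotation><size><width>219</width><height>167</height>"
    "<depth>3</depth></size><segmented>0</segmented></annotation>"
)
LABELED_XML = (
    "<annotation><object><name>cat</name><bndbox><xmin>1</xmin>"
    "<ymin>2</ymin><xmax>30</xmax><ymax>40</ymax></bndbox></object></annotation>"
)

with tempfile.TemporaryDirectory() as demo_dir:
    for file_name, content in (("empty.xml", EMPTY_XML), ("labeled.xml", LABELED_XML)):
        with open(os.path.join(demo_dir, file_name), "w", encoding="utf-8") as handle:
            handle.write(content)
    empty_names = [os.path.basename(p) for p in find_empty_annotations(demo_dir)]

print(empty_names)
```

The script only reports the files and leaves the choice between deleting and re-labeling to you. To use it on a real dataset, call find_empty_annotations on your own annotation directory.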
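Summary:
Your fix for this error may not be the same as mine, so treat what I describe here as one reference point only. The error can have many different causes, but if you run into it, be sure to check your dataset carefully. If I come across another solution later, I will update this post.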