Problems encountered while training SSD-Tensorflow

Reposted article, dated 2018-07-16 21:16. Tag: Tensorflow

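I ran into several problems while training on my own data with the open-source SSD-Tensorflow code. This post records them.

The first problem

This bug has nothing to do with SSD-Tensorflow itself.

The first step is to convert the data to tfrecords format, using the command from the tutorial: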

DATASET_DIR=./VOC2007/test/
OUTPUT_DIR=./tfrecords
python tf_convert_data.py \
    --dataset_name=pascalvoc \
    --dataset_dir=${DATASET_DIR} \
    --output_name=voc_2007_train \
    --output_dir=${OUTPUT_DIR}

Following the tutorial, I put these commands into a script file named change.sh and ran it with sh change.sh. It failed with this error:

matthew@DL:~/SSD-Tensorflow$ sh change.sh 
Traceback (most recent call last):
  File "tf_convert_data.py", line 59, in <module>
    tf.app.run()
  File "/home/matthew/tensorflow_5/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "tf_convert_data.py", line 49, in main
    raise ValueError('You must supply the dataset directory with --dataset_dir')
ValueError: You must supply the dataset directory with --dataset_dir
change.sh: 4: change.sh: --dataset_name=pascalvoc: not found
: not found5: change.sh: --dataset_dir=./VOC2007/test/
change.sh: 6: change.sh: --output_name=voc_2007_train: not found
: not found7: change.sh: --output_dir=./tfrecords

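This is not an error in the script itself. It comes from the operating system. My local machine has no GPU (too broke, sigh) and runs Windows, so I write the code there and upload it to an Ubuntu server to run it.

The default line ending on Windows is \r\n, while on Linux it is \n. In a Linux shell, the line-continuation character \ must be followed immediately by the newline. Nothing else may come after it, not even a space. In a file with Windows line endings, each \ is followed by \r, so the continuation does not take effect and every following line is run as a separate command. That is why the shell reports "not found" for each option.

So the error above is caused by the line endings. The fix is: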

sed -i 's/\r$//g'  change.sh

This uses the sed stream editor to replace the \r at the end of every line of change.sh with nothing.
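The same normalization can also be done from Python, which may be handy when sed is not available on the machine where the files are edited. The following is only a minimal sketch: the helper names strip_carriage_returns and fix_script are made up for illustration and are not part of SSD-Tensorflow, and it assumes the script is small enough to read into memory. Unlike the sed command, it only rewrites \r\n pairs and leaves a lone trailing \r alone.

```python
from pathlib import Path


def strip_carriage_returns(data: bytes) -> bytes:
    """Convert Windows CRLF line endings to Unix LF line endings."""
    return data.replace(b"\r\n", b"\n")


def fix_script(path: str) -> bool:
    """Rewrite the file at `path` with LF endings.

    Returns True if the file contained CRLF endings and was rewritten.
    """
    script = Path(path)
    original = script.read_bytes()
    fixed = strip_carriage_returns(original)
    if fixed != original:
        script.write_bytes(fixed)
    return fixed != original
```

Calling fix_script("change.sh") before uploading the script should leave the \ continuation characters directly in front of the newline.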

The second problem

After the tfrecords data files were created, I ran train_ssd_network.py as instructed. The code ran for a few seconds and then failed with this error:

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, All bounding box coordinates must be in [0.0, 1.0]: 1.002
	 [[Node: ssd_preprocessing_train/distorted_bounding_box_crop/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2 = SampleDistortedBoundingBoxV2[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.6, 1.67], max_attempts=200, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ssd_preprocessing_train/distorted_bounding_box_crop/Shape, ssd_preprocessing_train/distorted_bounding_box_crop/ExpandDims, ssd_preprocessing_train/distorted_bounding_box_crop/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2/min_object_covered)]]
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "train_ssd_network.py", line 390, in <module>
    tf.app.run()
  File "/home/matthew/tensorflow_5/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train_ssd_network.py", line 386, in main
    sync_optimizer=None)
  File "/home/matthew/tensorflow_5/lib/python3.5/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 775, in train
    sv.stop(threads, close_summary_writer=True)
  File "/home/matthew/tensorflow_5/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/home/matthew/tensorflow_5/lib/python3.5/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
    raise value
  File "/home/matthew/tensorflow_5/lib/python3.5/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "/home/matthew/tensorflow_5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1231, in _single_operation_run
    target_list_as_strings, status, None)
  File "/home/matthew/tensorflow_5/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: All bounding box coordinates must be in [0.0, 1.0]: 1.002
	 [[Node: ssd_preprocessing_train/distorted_bounding_box_crop/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2 = SampleDistortedBoundingBoxV2[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.6, 1.67], max_attempts=200, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ssd_preprocessing_train/distorted_bounding_box_crop/Shape, ssd_preprocessing_train/distorted_bounding_box_crop/ExpandDims, ssd_preprocessing_train/distorted_bounding_box_crop/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2/min_object_covered)]]

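The main cause of the problem is:

The annotations in the dataset are not clean. Some bounding boxes have coordinates that fall outside the image.

There are two ways to fix this:

1. Write a script that finds the images with bad annotations, delete those images, and rebuild the dataset.
2. Change the way the tfrecords are generated.

The first approach works, but it is tedious: the original images have to be located and deleted, and the tfrecord files generated again. I wrote a simple script to check, and found that of the roughly 10,000 images in my dataset, more than 200 had bad annotations. In every case xmax or ymax exceeded the image boundary by a few pixels.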
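A check of this kind could look roughly like the sketch below. This is not the script used in this post, only an illustration. It assumes standard Pascal VOC annotation files, where each XML file has a size element with width and height and each object has a bndbox with xmin, ymin, xmax and ymax. The function names find_bad_boxes and scan_annotations are hypothetical.

```python
import glob
import os
import xml.etree.ElementTree as ET


def find_bad_boxes(root):
    """Return (object name, box) pairs whose box leaves the image.

    `root` is the root element of one Pascal VOC annotation file.
    Each box is reported as (xmin, ymin, xmax, ymax) in pixels.
    """
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)

    bad = []
    for obj in root.findall("object"):
        bbox = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(bbox.find(tag).text)
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        if xmin < 0 or ymin < 0 or xmax > width or ymax > height:
            bad.append((obj.find("name").text, (xmin, ymin, xmax, ymax)))
    return bad


def scan_annotations(annotation_dir):
    """Map each annotation file name to its list of out-of-range boxes."""
    report = {}
    for path in sorted(glob.glob(os.path.join(annotation_dir, "*.xml"))):
        bad = find_bad_boxes(ET.parse(path).getroot())
        if bad:
            report[os.path.basename(path)] = bad
    return report
```

Running scan_annotations on the Annotations directory of the dataset and printing the result would list the files that need attention.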

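For certain reasons (well, mostly laziness, lol), I did not want to build the dataset all over again. I decided to look for a more elegant way around the problem, which led to the second approach:

Tracing through tf_convert_data.py shows that the actual format conversion is done by datasets/pascalvoc_to_tfrecords.py. Find lines 114-119: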

bboxes.append((float(bbox.find('ymin').text) / shape[0],
               float(bbox.find('xmin').text) / shape[1],
               float(bbox.find('ymax').text) / shape[0],
               float(bbox.find('xmax').text) / shape[1]
               ))

Change them to:

bboxes.append((max(float(bbox.find('ymin').text) / shape[0], 0.0),
               max(float(bbox.find('xmin').text) / shape[1], 0.0),
               min(float(bbox.find('ymax').text) / shape[0], 1.0),
               min(float(bbox.find('xmax').text) / shape[1], 1.0)
               ))

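Then run the conversion command again to regenerate the tfrecords files, and the problem is solved.

This is a reasonable fix, for the following reason:

When an annotator places a coordinate outside the image, the target object is most likely right at the edge of the image, so using the image boundary as the boundary of the object is a sensible choice.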
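To make the effect of the clipping concrete, here is a small standalone sketch. The function normalize_and_clip and the numbers in the example are hypothetical and chosen only for illustration. Note that it clips all four values on both sides, which is slightly stricter than the edit above, where only the lower bound is applied to ymin and xmin and only the upper bound to ymax and xmax.

```python
def normalize_and_clip(ymin, xmin, ymax, xmax, height, width):
    """Scale pixel coordinates to [0, 1] and clip them to the image."""
    def clip(value):
        return min(max(value, 0.0), 1.0)

    return (clip(ymin / height), clip(xmin / width),
            clip(ymax / height), clip(xmax / width))


# An xmax of 501 on a 500-pixel-wide image normalizes to slightly more
# than 1.0, the kind of value the error message rejects. After clipping,
# the right edge of the box sits exactly on the image boundary.
box = normalize_and_clip(ymin=20, xmin=10, ymax=300, xmax=501,
                         height=375, width=500)
print(box)
```

A box that overshoots by a pixel or two changes very little under this clipping, which is why the training labels remain usable.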

Main references

https://github.com/balancap/SSD-Tensorflow/issues/37

https://blog.csdn.net/lin_bingfeng/article/details/53750516

