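1、"Training an Object Detector with TensorFlow (Hands-on Tutorial)" - Chen Maolin's Tech Blog - CSDN Blog (https://blog.csdn.net/chenmaolin88/article/details/79357263)
ZC: This is the tutorial I followed, step by step. When I reached the training step, my laptop at home threw the error below (the machine at work did not).
2、The error (shown as a screenshot in the original post):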

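2.1、Looking at this error, it never occurred to me that it meant a file could not be found. My first impression was: the file is there, so some operation must have failed.
2.2、When I first saw _pre??? I took it for a threading error. On a closer look it turned out to be an I/O operation, so I searched online.
There was very little material. The related reports showed basically the same symptom, but the suggested fixes were to downgrade, or to switch from the CPU build to the GPU build. One foreign-language site did say the file name or path might be wrong. I checked that too, but because my first impression was that the file was not missing, I only compared them roughly, saw nothing wrong, and ruled the possibility out.
Following the "a function is failing" theory: using the error output above, I stepped through in the debugger into _preread_check(...), found that it calls pywrap_tensorflow.CreateBufferedInputStream(...), and stepping in once more reached _pywrap_tensorflow_internal.CreateBufferedInputStream(...), beyond which I could not step. Searching Baidu for "_pywrap_tensorflow_internal.CreateBufferedInputStream" did turn up related posts, but their problem was a missing DLL, fixed by installing the matching Visual C++ Redistributable Package (vc_redist???.exe).
But I had no idea which DLL this function lives in. With nothing better to go on, I tried things blindly. My TensorFlow wheel was built with VS2019, so I downloaded the VS2019 redistributable package: still no luck. Then I removed every redistributable package on the machine, and VS2017 as well (to eliminate interference; my thinking was that an older DLL might be getting picked up instead of the VS2019 one, and that the older DLL might lack something and cause the failure). I deleted everything I could, reinstalled the VS2019 redistributable package, and rebooted: same error. Next, with all the related redistributable packages and Visual Studio software removed, I downloaded and installed VS2019 Community: same error. Dead end for now.
With that route closed, the other variant of the function-failure theory was: the DLLs are all fine, so where else could it go wrong? It must be failing when TensorFlow makes the call. I could not step any further into the source, but a comment in the source showed that it is built through SWIG, so I looked up and downloaded SWIG, intending to build TensorFlow myself and see which Windows DLL is actually used and where exactly it fails. From what I read, though, people say this is fairly troublesome, and I figured it might not succeed anyway, so I set this route aside for now.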
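A cheap check fits at this point, before rebuilding anything: open the same path with Python's built-in open(). If plain Python cannot open the file either, the fault is in the path and not in TensorFlow or its DLLs. The sketch below is only an illustration; probe_path and the demo file names are hypothetical and are not part of the tutorial or of TensorFlow.

```python
import os
import tempfile


def probe_path(path):
    """Try to open `path` with plain Python I/O and return a short status.

    No TensorFlow code is involved, so a failure here points at the path
    (or at file permissions), not at TensorFlow or a missing DLL.
    """
    try:
        with open(path, "rb") as f:
            f.read(1)  # reading one byte is enough to prove the file opens
    except FileNotFoundError:
        return "not found"
    except PermissionError:
        return "permission denied"
    except OSError as exc:
        return "os error: {}".format(exc)
    return "ok"


if __name__ == "__main__":
    # Hypothetical demo files; substitute the real --pipeline_config_path value.
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "pipeline.config")
        with open(real, "w") as f:
            f.write("model {}\n")
        print(probe_path(real))
        print(probe_path(os.path.join(tmp, "pipelin.config")))
```

If the probe reports "ok" and TensorFlow still fails on the same path, the DLL and version theories become more plausible.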
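Out of options, I came back to the downgrade suggestion. My laptop has no NVIDIA card, so the GPU build is out, but could different TensorFlow versions call the Windows API differently? The key point was that I had reinstalled TensorFlow before and it was quick and painless. So I tried it: I went from the 1.14.0 Py37 CPU (AVX2) build down to the 1.13.0 Py37 CPU (AVX2) build and ran the script again. It still failed (despair? No! The error message was different!), as shown below: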
C:\Python\Python37\python.exe G:/Tensorflow/models_copy/research/object_detection_zz/model_main_zz.py --logtostderr --pipeline_config_path=G:\Tensorflow_dataset\raccoon_dataset\ssd_mobilenet_v1_raccoon.config --train_dir=G:/Tensorflow_dataset/raccoon_dataset/train
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.
Traceback (most recent call last):
  File "G:/Tensorflow/models_copy/research/object_detection_zz/model_main_zz.py", line 139, in <module>
    tf.app.run()
  File "C:\Python\Python37\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "G:/Tensorflow/models_copy/research/object_detection_zz/model_main_zz.py", line 71, in main
    FLAGS.sample_1_of_n_eval_on_train_examples))
  File "C:\Python\Python37\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\model_lib.py", line 605, in create_estimator_and_inputs
    pipeline_config_path, config_override=config_override)
  File "C:\Python\Python37\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\utils\config_util.py", line 103, in get_configs_from_pipeline_file
    proto_str = f.read()
  File "C:\Python\Python37\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 125, in read
    self._preread_check()
  File "C:\Python\Python37\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 85, in _preread_check
    compat.as_bytes(self.__name), 1024 * 512, status)
  File "C:\Python\Python37\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: NewRandomAccessFile failed to Create/Open: G:\Tensorflow_dataset\raccoon_dataset\ssd_mobilenet_v1_raccoon.config : ϵͳ\udcd5Ҳ\udcbb\udcb5\udcbdָ\udcb6\udca8\udcb5\udcc4\udcceļ\udcfe\udca1\udca3
; No such file or directory
Process finished with exit code 1
ZC: As you can see, the failure is still reported in _preread_check(...), but this time the message says so explicitly: "No such file or directory".
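The unreadable fragment before "; No such file or directory" in the NotFoundError line appears to be the Windows system message in GBK bytes, mis-decoded as UTF-8 with Python's "surrogateescape" error handler. That is my assumption, not something the post states. Under it, the round trip below reproduces the garbled text and recovers the original message, which reads "The system cannot find the file specified." The helper names garble and recover are made up for this illustration.

```python
# Assumption: the console text was GBK bytes (Simplified-Chinese Windows)
# that were decoded as UTF-8 with the "surrogateescape" error handler.
WINDOWS_MESSAGE = "系统找不到指定的文件。"  # "The system cannot find the file specified."


def garble(text):
    """Reproduce the mojibake: GBK bytes read as if they were UTF-8."""
    return text.encode("gbk").decode("utf-8", errors="surrogateescape")


def recover(garbled):
    """Undo the mojibake: get the raw bytes back, then decode them as GBK."""
    return garbled.encode("utf-8", errors="surrogateescape").decode("gbk")


if __name__ == "__main__":
    broken = garble(WINDOWS_MESSAGE)
    # ascii() is used because lone surrogates cannot be printed directly.
    print(ascii(broken))
    print(recover(broken) == WINDOWS_MESSAGE)
```

The escaped form printed by ascii() matches the \udc.. sequences visible in the log above, which supports the GBK assumption.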
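ZC: Only then was I sure the file really could not be found. Checking the file name and path again, I found that the command line uses ssd_mobilenet_v1_raccoon.config, while the file in my folder is named ssd_mobilenet_v1_reccoon.config ... I fixed it and everything worked. (The file at work is named raccoon; I never expected the one at home to be reccoon, and both machines use the same command line.) (Had the comparison still turned up nothing, my next step would have been to check for upper/lower-case differences and to enumerate all folder and file names in code, to look for odd characters that Explorer does not display.)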
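The fallback check mentioned above (enumerate the names in code and look for near-misses or invisible characters) can be sketched as follows. This is a minimal illustration using only the standard library; explain_missing is a hypothetical helper name, and the 0.8 similarity cutoff is an arbitrary choice.

```python
import difflib
import os
import tempfile


def explain_missing(path):
    """Return [] if `path` exists, else ascii() forms of similar sibling names.

    ascii() shows quotes and escapes, so trailing spaces, look-alike letters
    and other characters that a file manager hides become visible.
    """
    if os.path.exists(path):
        return []
    folder, wanted = os.path.split(path)
    folder = folder or "."
    if not os.path.isdir(folder):
        return ["(folder does not exist: {})".format(ascii(folder))]
    names = os.listdir(folder)
    close = difflib.get_close_matches(wanted, names, n=3, cutoff=0.8)
    return [ascii(name) for name in close]


if __name__ == "__main__":
    # Recreate the situation from this post in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        on_disk = os.path.join(tmp, "ssd_mobilenet_v1_reccoon.config")
        with open(on_disk, "w") as f:
            f.write("")
        wanted = os.path.join(tmp, "ssd_mobilenet_v1_raccoon.config")
        print(explain_missing(wanted))
```

Run against the real dataset folder, a check like this would have listed the reccoon file as the closest match to the requested raccoon name.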
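PS: Along the way someone also suggested the error came from protobuf touching a virtual memory address it should not have, so I uninstalled my protobuf and reinstalled it with "pip install protobuf". It made no difference.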
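When swapping TensorFlow builds or reinstalling protobuf as described here, it can help to confirm which versions the interpreter actually sees. The sketch below assumes Python 3.8 or newer, where importlib.metadata is in the standard library; on the Python 3.7 used in this post the importlib_metadata backport would be needed instead. installed_version is a hypothetical helper name.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("tensorflow", "protobuf"):
        print(name, installed_version(name))
```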
3、
4、
5、The script used for testing was "G:\Tensorflow\models_copy\research\object_detection_zz\model_main_zz.py" (the folder "object_detection_zz" is a direct copy of the folder "object_detection", renamed; the file "model_main_zz.py" is likewise a direct copy of the file "model_main.py", renamed)
5.1、With the folder "object_detection_zz" in place, open it directly in PyCharm, then use PyCharm to run/debug model_main_zz.py.
5.2、Contents of model_main_zz.py:
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Binary to run train and evaluation on object detection model."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from absl import flags

import tensorflow as tf

from object_detection import model_hparams
from object_detection import model_lib

flags.DEFINE_string(
    'model_dir', None, 'Path to output model directory '
    'where event and checkpoint files will be written.')
flags.DEFINE_string('pipeline_config_path', None, 'Path to pipeline config '
                    'file.')
flags.DEFINE_integer('num_train_steps', None, 'Number of train steps.')
flags.DEFINE_boolean('eval_training_data', False,
                     'If training data should be evaluated for this job. Note '
                     'that one call only use this in eval-only mode, and '
                     '`checkpoint_dir` must be supplied.')
flags.DEFINE_integer('sample_1_of_n_eval_examples', 1, 'Will sample one of '
                     'every n eval input examples, where n is provided.')
flags.DEFINE_integer('sample_1_of_n_eval_on_train_examples', 5, 'Will sample '
                     'one of every n train input examples for evaluation, '
                     'where n is provided. This is only used if '
                     '`eval_training_data` is True.')
flags.DEFINE_string(
    'hparams_overrides', None, 'Hyperparameter overrides, '
    'represented as a string containing comma-separated '
    'hparam_name=value pairs.')
flags.DEFINE_string(
    'checkpoint_dir', None, 'Path to directory holding a checkpoint. If '
    '`checkpoint_dir` is provided, this binary operates in eval-only mode, '
    'writing resulting metrics to `model_dir`.')
flags.DEFINE_boolean(
    'run_once', False, 'If running in eval-only mode, whether to run just '
    'one round of eval vs running continuously (default).'
)
FLAGS = flags.FLAGS


def main(unused_argv):
  flags.mark_flag_as_required('model_dir')
  flags.mark_flag_as_required('pipeline_config_path')
  config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir)

  train_and_eval_dict = model_lib.create_estimator_and_inputs(
      run_config=config,
      hparams=model_hparams.create_hparams(FLAGS.hparams_overrides),
      pipeline_config_path=FLAGS.pipeline_config_path,
      train_steps=FLAGS.num_train_steps,
      sample_1_of_n_eval_examples=FLAGS.sample_1_of_n_eval_examples,
      sample_1_of_n_eval_on_train_examples=(
          FLAGS.sample_1_of_n_eval_on_train_examples))
  estimator = train_and_eval_dict['estimator']
  train_input_fn = train_and_eval_dict['train_input_fn']
  eval_input_fns = train_and_eval_dict['eval_input_fns']
  eval_on_train_input_fn = train_and_eval_dict['eval_on_train_input_fn']
  predict_input_fn = train_and_eval_dict['predict_input_fn']
  train_steps = train_and_eval_dict['train_steps']

  with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

  print("estimator :\t", estimator)
  print("train_input_fn :\t", train_input_fn)
  print("eval_input_fns :\t", eval_input_fns)
  print("eval_on_train_input_fn :\t", eval_on_train_input_fn)
  print("predict_input_fn :\t", predict_input_fn)
  print("train_steps :\t", train_steps)
  print()
  print("FLAGS.model_dir :\t", FLAGS.model_dir)
  print("FLAGS.pipeline_config_path :\t", FLAGS.pipeline_config_path)
  print("FLAGS.num_train_steps :\t", FLAGS.num_train_steps)
  print("FLAGS.eval_training_data :\t", FLAGS.eval_training_data)
  print("FLAGS.sample_1_of_n_eval_examples :\t", FLAGS.sample_1_of_n_eval_examples)
  print("FLAGS.sample_1_of_n_eval_on_train_examples :\t", FLAGS.sample_1_of_n_eval_on_train_examples)
  print("FLAGS.hparams_overrides :\t", FLAGS.hparams_overrides)
  print("FLAGS.checkpoint_dir :\t", FLAGS.checkpoint_dir)
  print("FLAGS.run_once :\t", FLAGS.run_once)
  # print("FLAGS. :\t", FLAGS.)
  print()
  print("FLAGS.model_dir :\t", FLAGS.model_dir)
  print("FLAGS.hparams_overrides :\t", FLAGS.hparams_overrides)
  print("FLAGS.pipeline_config_path :\t", FLAGS.pipeline_config_path)
  print("FLAGS.num_train_steps :\t", FLAGS.num_train_steps)
  print("FLAGS.sample_1_of_n_eval_examples :\t", FLAGS.sample_1_of_n_eval_examples)
  print("FLAGS.sample_1_of_n_eval_on_train_examples :\t", FLAGS.sample_1_of_n_eval_on_train_examples)

  # if FLAGS.checkpoint_dir:
  #   if FLAGS.eval_training_data:
  #     name = 'training_data'
  #     input_fn = eval_on_train_input_fn
  #   else:
  #     name = 'validation_data'
  #     # The first eval input will be evaluated.
  #     input_fn = eval_input_fns[0]
  #   if FLAGS.run_once:
  #     estimator.evaluate(input_fn,
  #                        steps=None,
  #                        checkpoint_path=tf.train.latest_checkpoint(
  #                            FLAGS.checkpoint_dir))
  #   else:
  #     model_lib.continuous_eval(estimator, FLAGS.checkpoint_dir, input_fn,
  #                               train_steps, name)
  # else:
  #   train_spec, eval_specs = model_lib.create_train_and_eval_specs(
  #       train_input_fn,
  #       eval_input_fns,
  #       eval_on_train_input_fn,
  #       predict_input_fn,
  #       train_steps,
  #       eval_on_train_data=False)
  #
  #   # Currently only a single Eval Spec is allowed.
  #   tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])


if __name__ == '__main__':
  tf.app.run()
