Using the Python interface; another option is TF-TRT, where the optimized model is still a pb file. The optimization mainly consists of fusing layers and similar graph rewrites, so the speedup is not dramatic: on the two networks I tested, it was about 10%. Since the output is still a pb, it can continue to be served with TF Serving.
keras/tf model -> pb model -> (TRT-optimized model)
Alternatively, if the model is already a SavedModel, it can be converted directly with saved_model_cli and then used with TF Serving as before.
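The conversion step above can be sketched with the TF-TRT Python API. This is a minimal sketch, assuming TensorFlow 1.14+ with TensorRT installed and a GPU available; the input/output directory names are placeholders, not paths from the original notes.

```python
# Sketch: optimize a SavedModel with TF-TRT (TF 1.14+ API, TensorRT required).
# "input_savedmodel" / "output_savedmodel" are placeholder directory names.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_saved_model_dir="input_savedmodel",
    precision_mode="FP16")           # or "FP32" / "INT8"
converter.convert()                  # fuses supported subgraphs into TRT engines
converter.save("output_savedmodel")  # output is still a SavedModel, so TF Serving can load it
```

The same conversion is also exposed on the command line as a `tensorrt` subcommand of `saved_model_cli convert` (e.g. `saved_model_cli convert --dir input_savedmodel --output_dir output_savedmodel --tag_set serve tensorrt`), which matches the workflow described above.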
References:
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usage-example
https://github.com/srihari-humbarwadi/TensorRT-for-keras
https://github.com/jeng1220/KerasToTensorRT
https://github.com/NVIDIA-AI-IOT/tf_trt_models
https://github.com/WeJay/TensorRTkeras
https://github.com/tensorflow/tensorrt/tree/master/tftrt/examples/image-classification
https://github.com/NVIDIA-AI-IOT/tf_trt_models/blob/master/examples/classification/classification.ipynb
https://developer.ibm.com/linuxonpower/2019/08/05/using-tensorrt-models-with-tensorflow-serving-on-wml-ce/
Discussion forum:
https://devtalk.nvidia.com/default/board/304/tensorrt/
There is also a C++ API, which I have not used yet:
https://zhuanlan.zhihu.com/p/85365075
https://zhuanlan.zhihu.com/p/86827710
http://manaai.cn/aicodes_detail3.html?id=48