TensorFlow GraphDefs to TensorFlow Lite's FlatBuffer format
TF and TFLite differ in both storage format and numeric precision.
Quantization
The benefits of quantization are well known: smaller model size, lower memory footprint, and faster inference; moreover, some hardware architectures only support int8, in which case quantization is mandatory. The downside is a drop in model accuracy.
Quantization itself comes in several varieties.
Different hardware requires different quantization schemes; the Edge TPU supports only the last two, both of which require training data.
Judging from the comparison chart, the post-training quantization techniques give only a limited speedup at inference time.
Post-training quantization
Post-training quantization also comes in several flavors.
The Edge TPU can only use full integer quantization.
Float16 quantization of weights
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_quant_model = converter.convert()
Not much to say here; it is rarely used. You might as well stay with plain fp32, since the only real gain is a somewhat smaller model file.
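To make the size claim concrete, here is a minimal NumPy sketch (the array is just a stand-in for model weights, nothing TFLite-specific): storing weights in fp16 halves their size while the round-trip error stays small for typical weight magnitudes.

```python
import numpy as np

# Pretend these are model weights stored in fp32.
w_fp32 = np.random.randn(1000).astype(np.float32)

# Float16 quantization simply stores them in half precision.
w_fp16 = w_fp32.astype(np.float16)

# Size is halved.
print(w_fp16.nbytes / w_fp32.nbytes)  # 0.5

# The round-trip error is small for weights of typical magnitude.
err = np.max(np.abs(w_fp32 - w_fp16.astype(np.float32)))
print(err)
```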
Dynamic range quantization
This only accelerates inference on the CPU.
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
Only the weights are converted from fp32 to int8. At inference time the int8 weights are converted back to fp32 and the computation is done with floating-point kernels.
Note: "However, the outputs are still stored using floating point, so that the speedup with dynamic-range ops is less than a full fixed-point computation."
In the resulting model only the weights are int8; the activations are still fp32. Activation quantization happens on the fly at inference, so different parts of the graph are handled by different kernels.
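A minimal NumPy sketch of the weight-only scheme described above (per-tensor symmetric scaling is used here for illustration; TFLite actually uses per-axis scales for conv weights): the weights are quantized to int8 once at conversion time, then dequantized back to fp32 at inference.

```python
import numpy as np

def quantize_weights(w):
    """Per-tensor symmetric int8 quantization: scale = max|w| / 127."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """At inference, dynamic-range kernels convert int8 back to fp32."""
    return q.astype(np.float32) * scale

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale = quantize_weights(w)
w_hat = dequantize(q, scale)
print(q)      # int8 values in [-127, 127]
print(w_hat)  # close to the original weights
```

The reconstruction error is bounded by half a quantization step, which is why accuracy only degrades slightly.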
Full integer quantization of weights and activations
You need to measure the dynamic range of activations and inputs by supplying a representative dataset.
All of the model's weights and activations are converted to int8. This gives roughly a 3-4x speedup on CPU and allows the model to run on the Edge TPU.
The quantized model's input and output are still fp32, and any activation op that has no quantized implementation is left in fp32.
import tensorflow as tf

def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        # Get sample input data as a numpy array in a method of your choosing.
        yield [input]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()
Ops that have no quantized implementation are not quantized and stay in fp32.
For hardware such as the Edge TPU that only supports int8 ops, every op in the model must be int8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
Setting these converter attributes requires every op to be converted to int8; otherwise the conversion fails with an error.
int8 <--> fp32
How the mapping between int8 and fp32 is determined is specified in the official documentation:
https://www.tensorflow.org/lite/performance/quantization_spec
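The core of that spec is an affine mapping: real_value = (int8_value - zero_point) * scale. A small NumPy sketch of how such parameters can be derived from an observed activation range (the helper names here are my own, not from the spec):

```python
import numpy as np

def choose_qparams(rmin, rmax):
    """Affine (asymmetric) int8 parameters:
    real_value = (int8_value - zero_point) * scale, int8 range [-128, 127]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # range must contain 0
    scale = (rmax - rmin) / 255.0
    zero_point = int(np.round(-128 - rmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Example: a ReLU-like activation with observed range [0, 6].
scale, zp = choose_qparams(0.0, 6.0)
x = np.array([0.0, 1.5, 3.0, 6.0], dtype=np.float32)
q = quantize(x, scale, zp)
x_hat = dequantize(q, scale, zp)
```

Because the range is all non-negative, the zero point lands at -128, so the full int8 range covers [0, 6]; the round-trip error is at most half a quantization step. (Per the spec, weights use the symmetric case with zero_point fixed at 0.)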
Worth reading:
https://zhuanlan.zhihu.com/p/99424468
https://blog.csdn.net/qq_19784349/article/details/82883271
https://murphypei.github.io/blog/2019/11/neural-network-quantization