1. Introduction
TensorFlow has two main packages related to model quantization and compression: tensorflow/tensorflow/lite and tensorflow/tensorflow/contrib/quantize. Both used to live under contrib; only in a recent update was the lite package moved out into the main directory, so the current lite package should be fairly official by now.
2. Differences
As described in tensorflow/tensorflow/lite/tutorials/post_training_quant.ipynb (where "quantization aware training" refers to the quantize package):
In contrast to quantization aware training, the weights are quantized post training and the activations are quantized dynamically at inference in this method. Therefore, the model weights are not retrained to compensate for quantization induced errors. It is important to check the accuracy of the quantized model to ensure that the degradation is acceptable.
The differences between the two are:
- lite: quantizes after training is finished; the quantized model cannot be fine-tuned, so you need to check whether the accuracy degradation is acceptable (see the conversion sketch after this list);
- quantize: allows fine-tuning after quantization.
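A minimal sketch of the lite post-training route, assuming a trained model already exported as a SavedModel at a hypothetical saved_model_dir; the exact converter flags differ slightly across TF 1.x releases (older ones use converter.post_training_quantize = True instead of the optimizations list):

```python
import tensorflow as tf

# Post-training quantization with the lite converter: the trained weights are
# quantized during conversion, and no retraining/fine-tuning happens afterwards.
# "saved_model_dir" and "model_quant.tflite" are illustrative paths.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable weight quantization
tflite_quant_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_quant_model)
```

Since the weights are not retrained to compensate for quantization error, it is worth re-evaluating the resulting .tflite model before deploying it.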
3. The quantize package
Documentation for the quantize package is rather sparse; the only material is the description of 4 functions on the API page (defined in quantize_graph.py):
create_eval_graph(input_graph=None)
Rewrites the eval input_graph in place to simulate quantization.

create_training_graph(input_graph=None, quant_delay=0)
Rewrites the training input_graph in place to simulate quantization. This function must be called before gradient ops are inserted into the graph. For a model that has already been trained, the default quant_delay is recommended; for a model trained from scratch, quant_delay should be set to the number of steps the model needs to converge, so that quantization starts at that step and the model is then fine-tuned. If quant_delay is not provided, training will very likely fail. A usage sketch for these two functions is given after this list.

experimental_create_eval_graph(input_graph=None, weight_bits=8, activation_bits=8, quant_delay=None, scope=None)
experimental_create_training_graph(input_graph=None, weight_bits=8, activation_bits=8, quant_delay=0, freeze_bn_delay=None, scope=None)
For now I am not sure how the experimental variants differ from the ones above, beyond exposing weight_bits, activation_bits, and scope in their signatures... to be filled in later.
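A minimal usage sketch of create_training_graph / create_eval_graph; the toy model, the quant_delay value, and the export comment are illustrative assumptions, not taken from the API docs:

```python
import tensorflow as tf

# 1) Build the forward pass first (toy single-layer classifier for illustration).
x = tf.placeholder(tf.float32, [None, 784], name="input")
logits = tf.layers.dense(x, 10, name="logits")
labels = tf.placeholder(tf.int64, [None], name="labels")
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

# 2) Rewrite the graph in place with fake-quantization ops BEFORE any gradient
#    ops exist. quant_delay=20000 is an arbitrary example: quantization only
#    kicks in once the float model has roughly converged, then fine-tuning runs.
tf.contrib.quantize.create_training_graph(
    input_graph=tf.get_default_graph(), quant_delay=20000)

# 3) Only now add the optimizer, which inserts the gradient ops.
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# For inference/export, rebuild the same forward pass in a fresh graph and call
#   tf.contrib.quantize.create_eval_graph(input_graph=tf.get_default_graph())
# before freezing the graph and handing it to the TFLite converter.
```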
4. To be added
What needs to come will come eventually.
RTFS.