PaddleHub: PaddlePaddle's Pre-trained Model Toolkit
- This article shows how to use PaddleHub, PaddlePaddle's pre-trained model management tool, to quickly try out models and perform transfer learning. A GPU environment is recommended for running the sample programs; when launching the environment, select the "Advanced" edition as shown in the figure below.

If you have no GPU compute quota, you can click the link to apply for resources.

Overview
Let's start with a question: what can you do with ten lines of Python? Build a small calendar, perhaps, or a simple chatbot. With PaddlePaddle's PaddleHub, ten lines of code are enough to train a deep learning model.
PaddleHub is the pre-trained model management tool in the PaddlePaddle ecosystem. It aims to give developers convenient access to the value of large-scale pre-trained models. With PaddleHub, users can easily obtain pre-trained models from the PaddlePaddle ecosystem and, combined with the Fine-tune API, complete the full workflow from transfer learning to deployment, so that pre-trained models better serve each user's specific applications.
PaddleHub currently covers five major areas (text, image, video, speech, and industrial applications) and provides a large number of high-quality pre-trained models for tasks including, but not limited to, lexical analysis, sentiment analysis, image classification, image segmentation, object detection, keypoint detection, and video classification. As one of the most active parts of the PaddlePaddle ecosystem, PaddleHub also tracks current events: as Figure 1 shows, it promptly open-sources practical models such as masked-face detection and classification and pneumonia CT image analysis, helping developers build applications quickly.

Figure 1. Pneumonia CT image analysis and masked-face detection and classification
Normally, running inference with a model requires collecting and labeling training data, developing the algorithm, training the model, and deploying it for prediction, and each of these steps costs significant time and labor. To solve this, PaddlePaddle provides the PaddleHub pre-trained model management tool: users can apply PaddleHub's pre-trained models directly, or train the model they need via transfer learning, and bring up an inference service quickly.
So what is transfer learning? Informally, it is applying existing knowledge to learn something new: a person who can ride a bicycle learns to ride an electric scooter faster. A common approach is fine-tuning a pre-trained model: based on the task at hand, the user selects from PaddleHub a successfully trained model whose original training data resembles the new scenario's data, and then only needs to fine-tune the model parameters with the new data during training to complete the task.
In short, PaddleHub simplifies data collection, algorithm development, model training, and deployment, giving an out-of-the-box experience; adding a modest amount of high-quality domain data is enough to quickly improve model quality.
PaddleHub provides three main capabilities:
- Quick inference from the command line: following the "model as software" design philosophy, PaddleHub runs predictions through a Python API or the command line, making the PaddlePaddle model library easier to use.
- Transfer learning with pre-trained models: combine a high-quality pre-trained model with the Fine-tune API to train a model in a short time.
- One-click serving with PaddleHub Serving: build an API service for your own model with a simple command.
Prerequisites
Before using PaddleHub, complete the following:
- Install Python: version 3.5 or later on Linux or macOS; version 3.6 or later on Windows.
- Install PaddlePaddle 2.0. For instructions, see the Quick Install guide.
- Install PaddleHub 2.0 or later.
!pip install paddlehub==2.0.0rc
Looking in indexes: https://mirror.baidu.com/pypi/simple/
Collecting paddlehub==2.0.0rc
Downloading https://mirror.baidu.com/pypi/packages/df/7f/47008ee77d31f317616112c5a817222caa089fd0760807296775ab811910/paddlehub-2.0.0rc0-py3-none-any.whl (190kB)
Found existing installation: paddlehub 1.6.0
Uninstalling paddlehub-1.6.0:
Successfully uninstalled paddlehub-1.6.0
Successfully installed easydict-1.9 filelock-3.0.12 gitdb-4.0.5 gitpython-3.1.12 packaging-20.8 paddlehub-2.0.0rc0 paddlenlp-2.0.0b3 seqeval-1.2.2 smmap-3.0.4
Note:
Downloading datasets and pre-trained models with PaddleHub requires internet access. You can call server_check() to check connectivity between your machine and the remote PaddleHub server, as shown below. If the server is reachable, it prints "Request Hub-Server successfully"; otherwise it prints "Request Hub-Server unsuccessfully".
import paddlehub
paddlehub.server_check()
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import Sized
[2021-01-14 18:04:26,869] [ INFO] - Request Hub-Server successfully.
True
Pre-trained models
PaddleHub's pre-trained models span more than 200 mainstream models across image classification, keypoint detection, object detection, text recognition, image generation, face detection, image editing, image segmentation, video classification, video restoration, lexical analysis, semantic models, sentiment analysis, text moderation, text generation, speech synthesis, industrial quality inspection, and more.
On the official website, click the "All Models" link in the models section of the home page to browse every pre-trained model PaddleHub supports. As Figure 2 shows, the left navigation bar lists the model categories, and within each category there are nearly two hundred pre-trained models organized by network architecture, pre-training dataset, and other attributes. To the right of the navigation bar, each supported model is summarized on a card showing the model name, application category (image, text, video, speech, industrial), network type, pre-training dataset, and a brief description. To see the full details of a pre-trained model, click its card.

Figure 2. The All Models page
After choosing a pre-trained model, install it following the "Select a model version to install" section on the model's detail page. For the lac model, the install command is:
!hub install lac
You are using Paddle compiled with TensorRT, but TensorRT dynamic library is not found. Ignore this if TensorRT is not needed.
Download https://bj.bcebos.com/paddlehub/paddlehub_dev/lac_2.2.0.tar.gz
[##################################################] 100.00%
Decompress /home/aistudio/.paddlehub/tmp/tmpzb35zm2v/lac_2.2.0.tar.gz
[##################################################] 100.00%
[2021-01-14 18:04:42,668] [ INFO] - Successfully installed lac-2.2.0
Quick inference from the command line
To let users quickly try out PaddlePaddle model inference, PaddleHub supports running predictions directly from the command line. For example, the following command uses the lexical analysis model LAC (Lexical Analysis of Chinese) to segment a sentence.
Note: LAC is a joint lexical analysis model that performs Chinese word segmentation, part-of-speech tagging, and named entity recognition in a single pass.
!hub run lac --input_text "現在,慕尼黑再保險公司不僅是此類行動的倡議者,更是將其大量氣候數據整合進保險產品中,並與公眾共享大量天氣信息,參與到新能源領域的保障中。"
[2021-01-14 18:04:56,552] [ WARNING] - The _initialize method in HubModule will soon be deprecated, you can use the __init__() to handle the initialization of the object
W0114 18:04:56.589459 519 analysis_predictor.cc:1058] Deprecated. Please use CreatePredictor instead.
[{'word': ['現在', ',', '慕尼黑再保險公司', '不僅', '是', '此類', '行動', '的', '倡議者', ',', '更是', '將', '其', '大量', '氣候', '數據', '整合', '進', '保險', '產品', '中', ',', '並', '與', '公眾', '共享', '大量', '天氣', '信息', ',', '參與', '到', '新能源', '領域', '的', '保障', '中', '。'], 'tag': ['TIME', 'w', 'ORG', 'c', 'v', 'r', 'n', 'u', 'n', 'w', 'd', 'p', 'r', 'a', 'n', 'n', 'v', 'v', 'n', 'n', 'f', 'w', 'c', 'p', 'n', 'v', 'a', 'n', 'n', 'w', 'v', 'v', 'n', 'n', 'u', 'vn', 'f', 'w']}]
The command line for quick inference has the format shown below, where the parameters are:
- module-name: the model name.
- input-parameter: the name of the input argument, i.e. "--input_text" in the example above.
- input-value: the input to run inference on, i.e. the sentence about Munich Re in the example above.
The command format and parameter values vary from model to model; see the "Command-line prediction example" section of each model's documentation.
hub run ${module-name} ${input-parameter} ${input-value}
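As a small illustration of how the three placeholders fit together (the helper function below is hypothetical, not part of PaddleHub), the command can be assembled like this:

```python
# Hypothetical helper: assemble a `hub run` command string from its three parts.
def build_hub_run_command(module_name: str, input_parameter: str, input_value: str) -> str:
    # Quote the input value so sentences containing spaces survive the shell.
    return f'hub run {module_name} {input_parameter} "{input_value}"'

cmd = build_hub_run_command("lac", "--input_text", "今天是個好日子")
print(cmd)  # hub run lac --input_text "今天是個好日子"
```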
Currently only some of the pre-trained models in PaddleHub support command-line inference. To check whether a given model supports it, see whether its page on the official website includes a command-line prediction and serving section.

Figure 3. Example model prediction page
Transfer learning with pre-trained models
With high-quality pre-trained models and the PaddleHub Fine-tune API, users can build deep learning models for natural language processing and computer vision scenarios with only a small amount of code. Taking text classification as an example, there are four steps:
1. Select and load a pre-trained model
This example uses the ERNIE Tiny model to demonstrate fine-tuning with PaddleHub. ERNIE Tiny compresses the ERNIE 2.0 Base model mainly through structural compression and knowledge distillation. Compared with ERNIE 2.0, ERNIE Tiny predicts about 4.3x faster, making it better suited to industrial deployment.
!hub install ernie_tiny==2.0.1
Download https://bj.bcebos.com/paddlehub/paddlehub_dev/ernie_tiny_2.0.1.tar.gz
[##################################################] 100.00%
Decompress /home/aistudio/.paddlehub/tmp/tmp9cxbz2jk/ernie_tiny_2.0.1.tar.gz
[##################################################] 100.00%
[2021-01-14 18:05:14,353] [ INFO] - Successfully installed ernie_tiny-2.0.1
import paddlehub as hub
model = hub.Module(name='ernie_tiny', version='2.0.1', task='seq-cls', num_classes=2)
[2021-01-14 18:05:21,013] [ INFO] - Downloading https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/ernie_tiny.pdparams and saved to /home/aistudio/.paddlenlp/models/ernie-tiny
[2021-01-14 18:05:21,015] [ INFO] - Downloading ernie_tiny.pdparams from https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/ernie_tiny.pdparams
100%|██████████| 354158/354158 [00:08<00:00, 43591.73it/s]
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1245: UserWarning: Skip loading for classifier.weight. classifier.weight is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1245: UserWarning: Skip loading for classifier.bias. classifier.bias is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
The parameters are:
- name: the model name; options include ernie, ernie_tiny, bert-base-cased, bert-base-chinese, roberta-wwm-ext, roberta-wwm-ext-large, and others.
- version: the module version number.
- task: the fine-tuning task. Here it is seq-cls, i.e. text classification.
- num_classes: the number of classes in the current text classification task, determined by the dataset in use; defaults to 2.
PaddleHub also offers BERT and other models to choose from. The models currently supporting text classification are loaded as follows:
| Model | PaddleHub Module |
| --- | --- |
| ERNIE, Chinese | hub.Module(name='ernie') |
| ERNIE Tiny, Chinese | hub.Module(name='ernie_tiny') |
| ERNIE 2.0 Base, English | hub.Module(name='ernie_v2_eng_base') |
| ERNIE 2.0 Large, English | hub.Module(name='ernie_v2_eng_large') |
| BERT-Base, English Cased | hub.Module(name='bert-base-cased') |
| BERT-Base, English Uncased | hub.Module(name='bert-base-uncased') |
| BERT-Large, English Cased | hub.Module(name='bert-large-cased') |
| BERT-Large, English Uncased | hub.Module(name='bert-large-uncased') |
| BERT-Base, Multilingual Cased | hub.Module(name='bert-base-multilingual-cased') |
| BERT-Base, Multilingual Uncased | hub.Module(name='bert-base-multilingual-uncased') |
| BERT-Base, Chinese | hub.Module(name='bert-base-chinese') |
| BERT-wwm, Chinese | hub.Module(name='chinese-bert-wwm') |
| BERT-wwm-ext, Chinese | hub.Module(name='chinese-bert-wwm-ext') |
| RoBERTa-wwm-ext, Chinese | hub.Module(name='roberta-wwm-ext') |
| RoBERTa-wwm-ext-large, Chinese | hub.Module(name='roberta-wwm-ext-large') |
| RBT3, Chinese | hub.Module(name='rbt3') |
| RBTL3, Chinese | hub.Module(name='rbtl3') |
| ELECTRA-Small, English | hub.Module(name='electra-small') |
| ELECTRA-Base, English | hub.Module(name='electra-base') |
| ELECTRA-Large, English | hub.Module(name='electra-large') |
| ELECTRA-Base, Chinese | hub.Module(name='chinese-electra-base') |
| ELECTRA-Small, Chinese | hub.Module(name='chinese-electra-small') |
The single line of code above initializes model as a model suited to text classification: the ERNIE Tiny pre-trained model followed by a fully connected (FC) layer.

The image above is from: https://arxiv.org/pdf/1810.04805.pdf
2. Prepare the dataset and load the data
Users can fine-tune on either a custom dataset or one of the datasets provided by PaddleHub.
(1) The ChnSentiCorp dataset provided by PaddleHub
# Automatically downloads the dataset and extracts it to $HUB_HOME/.paddlehub/dataset under the user directory
train_dataset = hub.datasets.ChnSentiCorp(
    tokenizer=model.get_tokenizer(), max_seq_len=128, mode='train')
dev_dataset = hub.datasets.ChnSentiCorp(
    tokenizer=model.get_tokenizer(), max_seq_len=128, mode='dev')
[2021-01-14 18:06:07,529] [ INFO] - Downloading vocab.txt from https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/vocab.txt
100%|██████████| 459/459 [00:00<00:00, 6793.01it/s]
[2021-01-14 18:06:07,889] [ INFO] - Downloading spm_cased_simp_sampled.model from https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/spm_cased_simp_sampled.model
100%|██████████| 1083/1083 [00:00<00:00, 8108.15it/s]
[2021-01-14 18:06:08,252] [ INFO] - Downloading dict.wordseg.pickle from https://paddlenlp.bj.bcebos.com/models/transformers/ernie_tiny/dict.wordseg.pickle
100%|██████████| 161822/161822 [00:04<00:00, 39625.95it/s]
Download https://bj.bcebos.com/paddlehub-dataset/chnsenticorp.tar.gz
[##################################################] 100.00%
Decompress /home/aistudio/.paddlehub/tmp/tmp09k65v5a/chnsenticorp.tar.gz
[##################################################] 100.00%
[2021-01-14 18:06:23,215] [ INFO] - Found /home/aistudio/.paddlenlp/models/ernie-tiny/vocab.txt
[2021-01-14 18:06:23,222] [ INFO] - Found /home/aistudio/.paddlenlp/models/ernie-tiny/spm_cased_simp_sampled.model
[2021-01-14 18:06:23,225] [ INFO] - Found /home/aistudio/.paddlenlp/models/ernie-tiny/dict.wordseg.pickle
- tokenizer: the tokenizer required by the module; it segments the input text and converts it into the input format the model expects.
- mode: the data split to load; options are train, test, and val, defaulting to train.
- max_seq_len: the maximum sequence length used by the ERNIE/BERT model; lower this value if you run out of GPU memory.
The pre-trained ERNIE model processes Chinese text character by character; the tokenizer converts raw input text into the input form the model accepts. The pre-trained models in PaddleHub 2.0 ship with their corresponding tokenizers, which can be obtained via the model.get_tokenizer method.
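To build intuition for what the tokenizer does, here is a toy character-level sketch. It is not ERNIE Tiny's real tokenizer (which also uses a SentencePiece model and special tokens), but it shows the basic "text in, fixed-length id sequence out" contract, including truncation to max_seq_len:

```python
# Toy character-level tokenizer: an illustration only, not the real ERNIE tokenizer.
def toy_tokenize(text, vocab, max_seq_len=8, pad_id=0, unk_id=1):
    ids = [vocab.get(ch, unk_id) for ch in text]  # one id per character
    ids = ids[:max_seq_len]                       # truncate overlong input
    ids += [pad_id] * (max_seq_len - len(ids))    # pad short input to a fixed length
    return ids

vocab = {ch: i + 2 for i, ch in enumerate("今天是個好日子")}
print(toy_tokenize("今天是好日子!", vocab, max_seq_len=8))  # [2, 3, 4, 6, 7, 8, 1, 0]
```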


(2) Custom datasets
To use a custom dataset, preprocess it into a format the pre-trained model can read. For example, to run a PaddleHub text classification task on custom data, split the dataset into training, validation, and test sets.
a. Set up the dataset directory.
Arrange the dataset directory as follows.
├── data: dataset directory
    ├── train.txt: training data
    ├── dev.txt: validation data
    └── test.txt: test data
b. Set the file format and content.
UTF-8 encoding is recommended for the training, validation, and test files. The first column holds the class label and the second column the text content, separated by a Tab. It is recommended to put the column header "label" and "text_a", also Tab-separated, on the first line of each file, for example:
label text_a
房產 昌平京基鷺府10月29日推別墅1200萬套起享97折
教育 貴州2011高考錄取分數線發布理科一本448分
社會 眾多白領因集體戶口面臨結婚難題
...
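For reference, a file in this format can be produced with the Python standard library alone. The sketch below writes the header and rows to an in-memory buffer; the rows are the sample lines above, and the target path in the comment is a placeholder:

```python
import csv
import io

# Placeholder rows in the (label, text_a) format described above.
rows = [("房產", "昌平京基鷺府10月29日推別墅1200萬套起享97折"),
        ("教育", "貴州2011高考錄取分數線發布理科一本448分")]

buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(["label", "text_a"])  # header row expected when is_file_with_header=True
writer.writerows(rows)

# In practice, write buf.getvalue() to data/train.txt with UTF-8 encoding.
print(buf.getvalue())
```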
c. Load the custom dataset.
To load a custom text classification dataset, simply subclass the base class TextClassificationDataset and set the dataset path and label list, as in the following code:
from paddlehub.datasets.base_nlp_dataset import TextClassificationDataset

class SeqClsDataset(TextClassificationDataset):
    # Directory where the dataset files are stored
    base_path = '/path/to/dataset'
    # Label list of the dataset
    label_list = ['體育', '科技', '社會', '娛樂', '股票', '房產', '教育', '時政', '財經', '星座', '游戲', '家居', '彩票', '時尚']

    def __init__(self, tokenizer, max_seq_len: int = 128, mode: str = 'train'):
        if mode == 'train':
            data_file = 'train.txt'
        elif mode == 'test':
            data_file = 'test.txt'
        else:
            data_file = 'dev.txt'
        super().__init__(
            base_path=self.base_path,
            tokenizer=tokenizer,
            max_seq_len=max_seq_len,
            mode=mode,
            data_file=data_file,
            label_list=self.label_list,
            is_file_with_header=True)
# Select the desired model and obtain its tokenizer
import paddlehub as hub

model = hub.Module(name='ernie_tiny', task='seq-cls', num_classes=len(SeqClsDataset.label_list))
tokenizer = model.get_tokenizer()

# Instantiate the training set
train_dataset = SeqClsDataset(tokenizer)
You can now instantiate SeqClsDataset to obtain the dataset and use hub.Trainer to fine-tune the pre-trained model on the text classification task; see the PaddleHub text classification demo for details.
Note:
For how to set up custom datasets for CV pre-trained models, see "Fine-tuning PaddleHub on custom data".
3. Choose an optimization strategy and runtime configuration
Run the following code to fine-tune the text classification model:
import paddle
optimizer = paddle.optimizer.Adam(learning_rate=5e-5, parameters=model.parameters())
trainer = hub.Trainer(model, optimizer, checkpoint_dir='test_ernie_text_cls', use_gpu=True)
trainer.train(train_dataset, epochs=3, batch_size=32, eval_dataset=dev_dataset, save_interval=1)
[2021-01-14 18:06:45,223] [ WARNING] - PaddleHub model checkpoint not found, start from scratch...
[2021-01-14 18:06:46,358] [ TRAIN] - Epoch=1/3, Step=10/300 loss=0.6446 acc=0.6375 lr=0.000050 step/sec=8.96 | ETA 00:01:40
[2021-01-14 18:06:47,307] [ TRAIN] - Epoch=1/3, Step=20/300 loss=0.4035 acc=0.8688 lr=0.000050 step/sec=10.54 | ETA 00:01:32
[2021-01-14 18:06:48,258] [ TRAIN] - Epoch=1/3, Step=30/300 loss=0.2783 acc=0.8812 lr=0.000050 step/sec=10.51 | ETA 00:01:30
[2021-01-14 18:06:49,210] [ TRAIN] - Epoch=1/3, Step=40/300 loss=0.2588 acc=0.9000 lr=0.000050 step/sec=10.50 | ETA 00:01:29
[2021-01-14 18:06:50,158] [ TRAIN] - Epoch=1/3, Step=50/300 loss=0.2476 acc=0.9062 lr=0.000050 step/sec=10.55 | ETA 00:01:28
[2021-01-14 18:06:51,105] [ TRAIN] - Epoch=1/3, Step=60/300 loss=0.2832 acc=0.9062 lr=0.000050 step/sec=10.56 | ETA 00:01:27
[2021-01-14 18:06:52,051] [ TRAIN] - Epoch=1/3, Step=70/300 loss=0.2453 acc=0.9031 lr=0.000050 step/sec=10.58 | ETA 00:01:27
[2021-01-14 18:06:53,000] [ TRAIN] - Epoch=1/3, Step=80/300 loss=0.3446 acc=0.8781 lr=0.000050 step/sec=10.53 | ETA 00:01:27
[2021-01-14 18:06:53,946] [ TRAIN] - Epoch=1/3, Step=90/300 loss=0.2419 acc=0.9094 lr=0.000050 step/sec=10.56 | ETA 00:01:27
[2021-01-14 18:06:54,897] [ TRAIN] - Epoch=1/3, Step=100/300 loss=0.2760 acc=0.8938 lr=0.000050 step/sec=10.52 | ETA 00:01:26
[2021-01-14 18:06:55,846] [ TRAIN] - Epoch=1/3, Step=110/300 loss=0.2552 acc=0.9031 lr=0.000050 step/sec=10.54 | ETA 00:01:26
[2021-01-14 18:06:56,795] [ TRAIN] - Epoch=1/3, Step=120/300 loss=0.2802 acc=0.8844 lr=0.000050 step/sec=10.54 | ETA 00:01:26
[2021-01-14 18:06:57,746] [ TRAIN] - Epoch=1/3, Step=130/300 loss=0.2462 acc=0.9031 lr=0.000050 step/sec=10.51 | ETA 00:01:26
[2021-01-14 18:06:58,698] [ TRAIN] - Epoch=1/3, Step=140/300 loss=0.2153 acc=0.9094 lr=0.000050 step/sec=10.50 | ETA 00:01:26
[2021-01-14 18:06:59,651] [ TRAIN] - Epoch=1/3, Step=150/300 loss=0.2140 acc=0.9187 lr=0.000050 step/sec=10.49 | ETA 00:01:26
[2021-01-14 18:07:00,611] [ TRAIN] - Epoch=1/3, Step=160/300 loss=0.2318 acc=0.9250 lr=0.000050 step/sec=10.42 | ETA 00:01:26
[2021-01-14 18:07:01,563] [ TRAIN] - Epoch=1/3, Step=170/300 loss=0.2424 acc=0.8969 lr=0.000050 step/sec=10.51 | ETA 00:01:26
[2021-01-14 18:07:02,515] [ TRAIN] - Epoch=1/3, Step=180/300 loss=0.1933 acc=0.9250 lr=0.000050 step/sec=10.50 | ETA 00:01:26
[2021-01-14 18:07:03,468] [ TRAIN] - Epoch=1/3, Step=190/300 loss=0.2376 acc=0.9156 lr=0.000050 step/sec=10.50 | ETA 00:01:26
[2021-01-14 18:07:04,415] [ TRAIN] - Epoch=1/3, Step=200/300 loss=0.2600 acc=0.8938 lr=0.000050 step/sec=10.56 | ETA 00:01:26
[2021-01-14 18:07:05,372] [ TRAIN] - Epoch=1/3, Step=210/300 loss=0.1915 acc=0.9219 lr=0.000050 step/sec=10.45 | ETA 00:01:26
[2021-01-14 18:07:06,328] [ TRAIN] - Epoch=1/3, Step=220/300 loss=0.2076 acc=0.9313 lr=0.000050 step/sec=10.46 | ETA 00:01:26
[2021-01-14 18:07:07,276] [ TRAIN] - Epoch=1/3, Step=230/300 loss=0.1849 acc=0.9281 lr=0.000050 step/sec=10.55 | ETA 00:01:26
[2021-01-14 18:07:08,230] [ TRAIN] - Epoch=1/3, Step=240/300 loss=0.2051 acc=0.9219 lr=0.000050 step/sec=10.48 | ETA 00:01:26
[2021-01-14 18:07:09,178] [ TRAIN] - Epoch=1/3, Step=250/300 loss=0.2602 acc=0.9125 lr=0.000050 step/sec=10.55 | ETA 00:01:26
[2021-01-14 18:07:10,127] [ TRAIN] - Epoch=1/3, Step=260/300 loss=0.1979 acc=0.9281 lr=0.000050 step/sec=10.54 | ETA 00:01:26
[2021-01-14 18:07:11,087] [ TRAIN] - Epoch=1/3, Step=270/300 loss=0.1809 acc=0.9406 lr=0.000050 step/sec=10.41 | ETA 00:01:26
[2021-01-14 18:07:12,041] [ TRAIN] - Epoch=1/3, Step=280/300 loss=0.2120 acc=0.9125 lr=0.000050 step/sec=10.49 | ETA 00:01:26
[2021-01-14 18:07:12,997] [ TRAIN] - Epoch=1/3, Step=290/300 loss=0.1672 acc=0.9313 lr=0.000050 step/sec=10.45 | ETA 00:01:26
[2021-01-14 18:07:13,941] [ TRAIN] - Epoch=1/3, Step=300/300 loss=0.2095 acc=0.9187 lr=0.000050 step/sec=10.60 | ETA 00:01:26
[2021-01-14 18:07:15,169] [ EVAL] - Evaluation on validation dataset: [Evaluation result] avg_acc=0.9292
[2021-01-14 18:07:27,287] [ EVAL] - Saving best model to test_ernie_text_cls/best_model [best acc=0.9292]
[2021-01-14 18:07:27,289] [ INFO] - Saving model checkpoint to test_ernie_text_cls/epoch_1
[2021-01-14 18:07:40,309] [ TRAIN] - Epoch=2/3, Step=10/300 loss=0.1009 acc=0.9719 lr=0.000050 step/sec=0.38 | ETA 00:02:39
[2021-01-14 18:07:41,258] [ TRAIN] - Epoch=2/3, Step=20/300 loss=0.1035 acc=0.9656 lr=0.000050 step/sec=10.54 | ETA 00:02:37
[2021-01-14 18:07:42,203] [ TRAIN] - Epoch=2/3, Step=30/300 loss=0.0717 acc=0.9781 lr=0.000050 step/sec=10.58 | ETA 00:02:35
[2021-01-14 18:07:43,164] [ TRAIN] - Epoch=2/3, Step=40/300 loss=0.1062 acc=0.9625 lr=0.000050 step/sec=10.41 | ETA 00:02:33
[2021-01-14 18:07:44,123] [ TRAIN] - Epoch=2/3, Step=50/300 loss=0.0798 acc=0.9688 lr=0.000050 step/sec=10.43 | ETA 00:02:31
[2021-01-14 18:07:45,080] [ TRAIN] - Epoch=2/3, Step=60/300 loss=0.0684 acc=0.9750 lr=0.000050 step/sec=10.46 | ETA 00:02:29
[2021-01-14 18:07:46,030] [ TRAIN] - Epoch=2/3, Step=70/300 loss=0.1395 acc=0.9563 lr=0.000050 step/sec=10.52 | ETA 00:02:27
[2021-01-14 18:07:46,978] [ TRAIN] - Epoch=2/3, Step=80/300 loss=0.0953 acc=0.9750 lr=0.000050 step/sec=10.55 | ETA 00:02:26
[2021-01-14 18:07:47,928] [ TRAIN] - Epoch=2/3, Step=90/300 loss=0.1744 acc=0.9469 lr=0.000050 step/sec=10.53 | ETA 00:02:24
[2021-01-14 18:07:48,878] [ TRAIN] - Epoch=2/3, Step=100/300 loss=0.1134 acc=0.9563 lr=0.000050 step/sec=10.53 | ETA 00:02:23
[2021-01-14 18:07:49,824] [ TRAIN] - Epoch=2/3, Step=110/300 loss=0.1100 acc=0.9719 lr=0.000050 step/sec=10.57 | ETA 00:02:21
[2021-01-14 18:07:50,774] [ TRAIN] - Epoch=2/3, Step=120/300 loss=0.1317 acc=0.9594 lr=0.000050 step/sec=10.53 | ETA 00:02:20
[2021-01-14 18:07:51,728] [ TRAIN] - Epoch=2/3, Step=130/300 loss=0.1149 acc=0.9594 lr=0.000050 step/sec=10.48 | ETA 00:02:19
[2021-01-14 18:07:52,678] [ TRAIN] - Epoch=2/3, Step=140/300 loss=0.1106 acc=0.9594 lr=0.000050 step/sec=10.53 | ETA 00:02:17
[2021-01-14 18:07:53,629] [ TRAIN] - Epoch=2/3, Step=150/300 loss=0.1503 acc=0.9437 lr=0.000050 step/sec=10.51 | ETA 00:02:16
[2021-01-14 18:07:54,590] [ TRAIN] - Epoch=2/3, Step=160/300 loss=0.1165 acc=0.9688 lr=0.000050 step/sec=10.40 | ETA 00:02:15
[2021-01-14 18:07:55,547] [ TRAIN] - Epoch=2/3, Step=170/300 loss=0.1219 acc=0.9531 lr=0.000050 step/sec=10.46 | ETA 00:02:14
[2021-01-14 18:07:56,506] [ TRAIN] - Epoch=2/3, Step=180/300 loss=0.0948 acc=0.9688 lr=0.000050 step/sec=10.43 | ETA 00:02:13
[2021-01-14 18:07:57,468] [ TRAIN] - Epoch=2/3, Step=190/300 loss=0.1614 acc=0.9313 lr=0.000050 step/sec=10.40 | ETA 00:02:12
[2021-01-14 18:07:58,429] [ TRAIN] - Epoch=2/3, Step=200/300 loss=0.1075 acc=0.9594 lr=0.000050 step/sec=10.40 | ETA 00:02:11
[2021-01-14 18:07:59,395] [ TRAIN] - Epoch=2/3, Step=210/300 loss=0.0625 acc=0.9781 lr=0.000050 step/sec=10.35 | ETA 00:02:10
[2021-01-14 18:08:00,359] [ TRAIN] - Epoch=2/3, Step=220/300 loss=0.1832 acc=0.9375 lr=0.000050 step/sec=10.37 | ETA 00:02:10
[2021-01-14 18:08:01,325] [ TRAIN] - Epoch=2/3, Step=230/300 loss=0.0925 acc=0.9531 lr=0.000050 step/sec=10.35 | ETA 00:02:09
[2021-01-14 18:08:02,285] [ TRAIN] - Epoch=2/3, Step=240/300 loss=0.1071 acc=0.9594 lr=0.000050 step/sec=10.42 | ETA 00:02:08
[2021-01-14 18:08:03,244] [ TRAIN] - Epoch=2/3, Step=250/300 loss=0.1390 acc=0.9500 lr=0.000050 step/sec=10.42 | ETA 00:02:07
[2021-01-14 18:08:04,203] [ TRAIN] - Epoch=2/3, Step=260/300 loss=0.1107 acc=0.9688 lr=0.000050 step/sec=10.43 | ETA 00:02:06
[2021-01-14 18:08:05,169] [ TRAIN] - Epoch=2/3, Step=270/300 loss=0.1033 acc=0.9563 lr=0.000050 step/sec=10.36 | ETA 00:02:06
[2021-01-14 18:08:06,134] [ TRAIN] - Epoch=2/3, Step=280/300 loss=0.2035 acc=0.9406 lr=0.000050 step/sec=10.36 | ETA 00:02:05
[2021-01-14 18:08:07,093] [ TRAIN] - Epoch=2/3, Step=290/300 loss=0.1285 acc=0.9469 lr=0.000050 step/sec=10.43 | ETA 00:02:04
[2021-01-14 18:08:08,048] [ TRAIN] - Epoch=2/3, Step=300/300 loss=0.1037 acc=0.9688 lr=0.000050 step/sec=10.47 | ETA 00:02:04
[2021-01-14 18:08:09,299] [    EVAL] - Evaluation on validation dataset: [Evaluation result] avg_acc=0.9400
[2021-01-14 18:08:31,268] [ EVAL] - Saving best model to test_ernie_text_cls/best_model [best acc=0.9400]
[2021-01-14 18:08:31,271] [ INFO] - Saving model checkpoint to test_ernie_text_cls/epoch_2
[2021-01-14 18:08:44,266] [ TRAIN] - Epoch=3/3, Step=10/300 loss=0.0417 acc=0.9844 lr=0.000050 step/sec=0.28 | ETA 00:02:55
[2021-01-14 18:08:45,224] [ TRAIN] - Epoch=3/3, Step=20/300 loss=0.0459 acc=0.9844 lr=0.000050 step/sec=10.44 | ETA 00:02:54
[2021-01-14 18:08:46,190] [ TRAIN] - Epoch=3/3, Step=30/300 loss=0.0663 acc=0.9750 lr=0.000050 step/sec=10.35 | ETA 00:02:52
[2021-01-14 18:08:47,144] [ TRAIN] - Epoch=3/3, Step=40/300 loss=0.0633 acc=0.9750 lr=0.000050 step/sec=10.48 | ETA 00:02:51
[2021-01-14 18:08:48,095] [ TRAIN] - Epoch=3/3, Step=50/300 loss=0.0283 acc=0.9969 lr=0.000050 step/sec=10.52 | ETA 00:02:50
[2021-01-14 18:08:49,055] [ TRAIN] - Epoch=3/3, Step=60/300 loss=0.0390 acc=0.9781 lr=0.000050 step/sec=10.42 | ETA 00:02:48
[2021-01-14 18:08:50,009] [ TRAIN] - Epoch=3/3, Step=70/300 loss=0.0752 acc=0.9750 lr=0.000050 step/sec=10.48 | ETA 00:02:47
[2021-01-14 18:08:50,959] [ TRAIN] - Epoch=3/3, Step=80/300 loss=0.0303 acc=0.9844 lr=0.000050 step/sec=10.53 | ETA 00:02:46
[2021-01-14 18:08:51,912] [ TRAIN] - Epoch=3/3, Step=90/300 loss=0.0703 acc=0.9688 lr=0.000050 step/sec=10.49 | ETA 00:02:45
[2021-01-14 18:08:52,866] [ TRAIN] - Epoch=3/3, Step=100/300 loss=0.0521 acc=0.9906 lr=0.000050 step/sec=10.48 | ETA 00:02:44
[2021-01-14 18:08:53,818] [ TRAIN] - Epoch=3/3, Step=110/300 loss=0.0278 acc=0.9875 lr=0.000050 step/sec=10.50 | ETA 00:02:42
[2021-01-14 18:08:54,771] [ TRAIN] - Epoch=3/3, Step=120/300 loss=0.0539 acc=0.9875 lr=0.000050 step/sec=10.50 | ETA 00:02:41
[2021-01-14 18:08:55,735] [ TRAIN] - Epoch=3/3, Step=130/300 loss=0.0273 acc=0.9844 lr=0.000050 step/sec=10.37 | ETA 00:02:40
[2021-01-14 18:08:56,710] [ TRAIN] - Epoch=3/3, Step=140/300 loss=0.0463 acc=0.9812 lr=0.000050 step/sec=10.26 | ETA 00:02:39
[2021-01-14 18:08:57,673] [ TRAIN] - Epoch=3/3, Step=150/300 loss=0.0636 acc=0.9812 lr=0.000050 step/sec=10.38 | ETA 00:02:38
[2021-01-14 18:08:58,651] [ TRAIN] - Epoch=3/3, Step=160/300 loss=0.0455 acc=0.9812 lr=0.000050 step/sec=10.23 | ETA 00:02:37
[2021-01-14 18:08:59,619] [ TRAIN] - Epoch=3/3, Step=170/300 loss=0.0745 acc=0.9812 lr=0.000050 step/sec=10.33 | ETA 00:02:37
[2021-01-14 18:09:00,581] [ TRAIN] - Epoch=3/3, Step=180/300 loss=0.0619 acc=0.9906 lr=0.000050 step/sec=10.39 | ETA 00:02:36
[2021-01-14 18:09:01,541] [ TRAIN] - Epoch=3/3, Step=190/300 loss=0.0867 acc=0.9750 lr=0.000050 step/sec=10.42 | ETA 00:02:35
[2021-01-14 18:09:02,496] [ TRAIN] - Epoch=3/3, Step=200/300 loss=0.0570 acc=0.9781 lr=0.000050 step/sec=10.47 | ETA 00:02:34
[2021-01-14 18:09:03,454] [ TRAIN] - Epoch=3/3, Step=210/300 loss=0.0582 acc=0.9781 lr=0.000050 step/sec=10.44 | ETA 00:02:33
[2021-01-14 18:09:04,405] [ TRAIN] - Epoch=3/3, Step=220/300 loss=0.0804 acc=0.9719 lr=0.000050 step/sec=10.51 | ETA 00:02:32
[2021-01-14 18:09:05,361] [ TRAIN] - Epoch=3/3, Step=230/300 loss=0.0390 acc=0.9906 lr=0.000050 step/sec=10.46 | ETA 00:02:31
[2021-01-14 18:09:06,316] [ TRAIN] - Epoch=3/3, Step=240/300 loss=0.0314 acc=0.9875 lr=0.000050 step/sec=10.47 | ETA 00:02:31
[2021-01-14 18:09:07,272] [ TRAIN] - Epoch=3/3, Step=250/300 loss=0.0564 acc=0.9812 lr=0.000050 step/sec=10.46 | ETA 00:02:30
[2021-01-14 18:09:08,228] [ TRAIN] - Epoch=3/3, Step=260/300 loss=0.0294 acc=0.9938 lr=0.000050 step/sec=10.47 | ETA 00:02:29
[2021-01-14 18:09:09,187] [ TRAIN] - Epoch=3/3, Step=270/300 loss=0.0260 acc=0.9938 lr=0.000050 step/sec=10.42 | ETA 00:02:28
[2021-01-14 18:09:10,148] [ TRAIN] - Epoch=3/3, Step=280/300 loss=0.0523 acc=0.9812 lr=0.000050 step/sec=10.41 | ETA 00:02:28
[2021-01-14 18:09:11,112] [ TRAIN] - Epoch=3/3, Step=290/300 loss=0.1009 acc=0.9688 lr=0.000050 step/sec=10.37 | ETA 00:02:27
[2021-01-14 18:09:12,072] [ TRAIN] - Epoch=3/3, Step=300/300 loss=0.0494 acc=0.9844 lr=0.000050 step/sec=10.42 | ETA 00:02:26
[2021-01-14 18:09:13,319] [    EVAL] - Evaluation on validation dataset: [Evaluation result] avg_acc=0.9458
[2021-01-14 18:09:35,225] [ EVAL] - Saving best model to test_ernie_text_cls/best_model [best acc=0.9458]
[2021-01-14 18:09:35,229] [ INFO] - Saving model checkpoint to test_ernie_text_cls/epoch_3
Optimization Strategy
Paddle 2.0-rc provides a variety of optimizers to choose from, such as SGD, Adam, and Adamax; see the strategy documentation for details.
Taking Adam as an example, its parameters include:
- learning_rate: the global learning rate, defaulting to 1e-3;
- parameters: the model parameters to be optimized.
Run Configuration
Trainer mainly controls the Fine-tune training process and accepts the following configurable parameters:
- model: the model to be optimized;
- optimizer: the optimizer to use;
- use_gpu: whether to use the GPU;
- use_vdl: whether to use VisualDL to visualize the training process;
- checkpoint_dir: the directory in which model parameters are saved;
- compare_metrics: the metric used to select the best model.
trainer.train mainly controls the concrete training process and accepts the following configurable parameters:
- train_dataset: the dataset used for training;
- epochs: the number of training epochs;
- batch_size: the training batch size; when using a GPU, adjust batch_size according to the actual hardware;
- num_workers: the number of data-loading workers, defaulting to 0;
- eval_dataset: the validation dataset;
- log_interval: the logging interval, measured in training steps;
- save_interval: the checkpoint-saving interval, measured in training epochs.
4. Model Prediction
When Fine-tuning is complete, the model that performed best on the validation set during Fine-tuning is saved in the ${CHECKPOINT_DIR}/best_model directory, where ${CHECKPOINT_DIR} is the checkpoint directory chosen for Fine-tuning.
Using the following sentences as the data to be predicted, run the model on them:
這個賓館比較陳舊了,特價的房間也很一般。總體來說一般
懷着十分激動的心情放映,可是看着看着發現,在放映完畢后,出現一集米老鼠的動畫片
作為老的四星酒店,房間依然很整潔,相當不錯。機場接機服務很好,可以在車上辦理入住手續,節省時間。
import paddlehub as hub

data = [
    ['這個賓館比較陳舊了,特價的房間也很一般。總體來說一般'],
    ['懷着十分激動的心情放映,可是看着看着發現,在放映完畢后,出現一集米老鼠的動畫片'],
    ['作為老的四星酒店,房間依然很整潔,相當不錯。機場接機服務很好,可以在車上辦理入住手續,節省時間。'],
]
label_map = {0: 'negative', 1: 'positive'}

model = hub.Module(
    name='ernie_tiny',
    version='2.0.1',
    task='seq-cls',
    load_checkpoint='./test_ernie_text_cls/best_model/model.pdparams',
    label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
    print('Data: {} \t Label: {}'.format(text[0], results[idx]))
[2021-01-14 18:10:49,270] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-tiny/ernie_tiny.pdparams
[2021-01-14 18:10:54,747] [ INFO] - Loaded parameters from /home/aistudio/test_ernie_text_cls/best_model/model.pdparams
[2021-01-14 18:10:54,801] [ INFO] - Found /home/aistudio/.paddlenlp/models/ernie-tiny/vocab.txt
[2021-01-14 18:10:54,804] [ INFO] - Found /home/aistudio/.paddlenlp/models/ernie-tiny/spm_cased_simp_sampled.model
[2021-01-14 18:10:54,807] [ INFO] - Found /home/aistudio/.paddlenlp/models/ernie-tiny/dict.wordseg.pickle
Data: 這個賓館比較陳舊了,特價的房間也很一般。總體來說一般 	 Label: negative
Data: 懷着十分激動的心情放映,可是看着看着發現,在放映完畢后,出現一集米老鼠的動畫片 	 Label: negative
Data: 作為老的四星酒店,房間依然很整潔,相當不錯。機場接機服務很好,可以在車上辦理入住手續,節省時間。 	 Label: positive
For the transfer-learning methods of other PaddleHub models, please refer to:
In addition, PaddleHub provides online trial environments on AI Studio for popular models; users are welcome to try them:
| Pretrained model | Task type | Dataset | AI Studio link |
| --- | --- | --- | --- |
| resnet50_vd_imagenet_ssld | Image classification | Flowers dataset | |
| msgnet | Style transfer | MiniCOCO dataset | |
| user_guided_colorization | Image colorization | Canvas oil-painting dataset | |
| ernie_tiny | Text classification | ChnSentiCorp sentiment analysis dataset | - |
| ernie_tiny | Sequence labeling | MSRA_NER sequence labeling dataset | - |
PaddleHub Serving One-Command Service Deployment
PaddleHub makes model prediction quick, but developers often need to move a local prediction workflow online. Whether exposing a service port externally or setting up a prediction service on a local network, PaddleHub needs to be able to deploy model prediction services quickly. Against this background the one-command model service deployment tool, PaddleHub Serving, was created: with a single command, developers can launch an online model prediction service without having to choose or implement a web framework.
PaddleHub Serving is PaddleHub's one-command model service deployment tool. A simple Hub command-line call launches an online model prediction service: the front end handles network requests via Flask and Gunicorn, while the back end calls the PaddleHub prediction interface directly. Multi-process mode is also supported to exploit multiple cores for higher concurrency, ensuring the performance of the prediction service.
1. Supported Models
PaddleHub Serving currently supports service deployment for all PaddleHub models that can be used for direct prediction, including NLP models such as lac and senta_bilstm, as well as CV models such as yolov3_darknet53_coco2017 and vgg16_imagenet; for more, see the list of models supported by PaddleHub. Support for deploying models produced by developers with the PaddleHub Fine-tune API is planned for the future.
2. Deployment Method
Deploy a pretrained model with PaddleHub Serving as follows:
(1) Start the server
PaddleHub Serving can be started in two ways: from the command line, or from a configuration file.
a. Starting from the command line
Start command:
hub serving start --modules Module1==Version1 Module2==Version2 ... \
                  --port XXXX \
                  --use_gpu \
                  --use_multiprocess \
                  --workers \
                  --gpu
Parameters:
| Parameter | Purpose |
| --- | --- |
| --modules/-m | Models to pre-install for PaddleHub Serving, listed as multiple Module==Version key-value pairs |
| --port/-p | Service port, defaulting to 8866 |
| --use_gpu | Use the GPU for prediction; requires paddlepaddle-gpu to be installed |
| --use_multiprocess | Whether to enable concurrent (multi-process) mode; defaults to single-process; recommended on multi-core CPU machines |
| --workers | Number of concurrent workers in concurrent mode, defaulting to 2*cpu_count-1, where cpu_count is the number of CPU cores |
| --gpu | GPU card IDs to use, e.g. 1,2 means cards 1 and 2; only card 0 is used by default |
NOTE: --use_gpu cannot be combined with --use_multiprocess.
b. Starting from a configuration file
Start command:
hub serving start --config config.json
where config.json has the following format:
{
    "modules_info": {
        "yolov3_darknet53_coco2017": {
            "init_args": {
                "version": "1.0.0"
            },
            "predict_args": {
                "batch_size": 1,
                "use_gpu": false
            }
        },
        "lac": {
            "init_args": {
                "version": "1.1.0"
            },
            "predict_args": {
                "batch_size": 1,
                "use_gpu": false
            }
        }
    },
    "port": 8866,
    "use_multiprocess": false,
    "workers": 2,
    "gpu": "0,1,2"
}
Parameters:
| Parameter | Purpose |
| --- | --- |
| modules_info | Models to pre-install for PaddleHub Serving, given as a dictionary keyed by model name, with init_args and predict_args sub-fields as in the example above |
| port | Service port, defaulting to 8866 |
| use_gpu | Use the GPU for prediction; requires paddlepaddle-gpu to be installed |
| use_multiprocess | Whether to enable concurrent (multi-process) mode; defaults to single-process; recommended on multi-core CPU machines |
| workers | Number of concurrent workers, effective only in concurrent mode, defaulting to 2*cpu_count-1, where cpu_count is the number of CPU cores |
| gpu | GPU card IDs to use, e.g. 1,2 means cards 1 and 2; only card 0 is used by default |
NOTE: --use_gpu cannot be combined with --use_multiprocess.
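Since config.json is plain JSON, it can also be generated programmatically. Below is a minimal sketch using only the Python standard library; it reproduces the lac portion of the example above (the version string 1.1.0 is the one used in this document and may differ in your installation):

```python
import json

# Serving configuration mirroring the config.json example above.
config = {
    "modules_info": {
        "lac": {
            "init_args": {"version": "1.1.0"},
            "predict_args": {"batch_size": 1, "use_gpu": False},
        }
    },
    "port": 8866,
    "use_multiprocess": False,
    "workers": 2,
}

# Write it to disk so it can be passed to `hub serving start --config config.json`.
with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```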
(2) Accessing the server
After the model prediction service has been deployed on the server with PaddleHub Serving, the client can access the prediction interface to obtain results. The interface URL has the format:
http://127.0.0.1:8866/predict/<MODULE>
where <MODULE> is the model name.
Sending a POST request to this URL returns the prediction result. A concrete demo is shown below to illustrate the full deployment and usage flow of PaddleHub Serving.
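The URL convention above can be captured in a small helper function (a hypothetical convenience, not part of PaddleHub; the default host and port match this section):

```python
# Build the prediction endpoint URL for a module, following the
# http://<host>:<port>/predict/<MODULE> format described above.
def predict_url(module, host="127.0.0.1", port=8866):
    return "http://{}:{}/predict/{}".format(host, port, module)

print(predict_url("lac"))  # http://127.0.0.1:8866/predict/lac
```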
(3) Custom development with PaddleHub Serving
After a model service has been deployed with PaddleHub Serving, the resulting interface can be used for further development, such as providing an external web service or integrating it into an application, reducing client-side prediction load and improving performance. A web page demo is shown below:
(4) Stopping serving
Use the stop command to shut down a running serving instance:
$ hub serving stop --port XXXX
Parameters:
| Parameter | Purpose |
| --- | --- |
| --port/-p | The port of the service to stop, defaulting to 8866 |
Demo
Taking the lac word segmentation service and the ernie pretrained word-embedding service as examples, this section shows how to deploy an online service with PaddleHub Serving.
(1) Online lac word segmentation service
There are three main steps:
Step 1. Deploy the lac online service
We now deploy an lac online service, so that segmentation results for a text can be obtained through an interface.
First, start the service with either of the two methods:
$ hub serving start -m lac
or
$ hub serving start -c serving_config.json
where serving_config.json contains:
{
    "modules_info": {
        "lac": {
            "init_args": {
                "version": "1.1.0"
            },
            "predict_args": {
                "batch_size": 1,
                "use_gpu": false
            }
        }
    },
    "port": 8866,
    "use_multiprocess": false,
    "workers": 2
}
The console output on a successful start is shown in the figure:
The lac online word segmentation service is now deployed on port 8866. (The warning here is a Flask message and does not affect usage.)
Step 2. Access the lac prediction interface
Once the service is deployed, it can be tested; the test texts are 今天是個好日子 and 天氣預報說今天要下雨.
客戶端代碼如下:
# coding: utf8
import requests
import json
if __name__ == "__main__":
# 指定用於預測的文本並生成字典{"text": [text_1, text_2, ... ]}
text = ["今天是個好日子", "天氣預報說今天要下雨"]
# 以key的方式指定text傳入預測方法的時的參數,此例中為"data"
# 對應本地部署,則為lac.analysis_lexical(data=text, batch_size=1)
data = {"texts": text, "batch_size": 1}
# 指定預測方法為lac並發送post請求,content-type類型應指定json方式
url = "http://127.0.0.1:8866/predict/lac"
# 指定post請求的headers為application/json方式
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# 打印預測結果
print(json.dumps(r.json(), indent=4, ensure_ascii=False))
Running this yields:
{
    "msg": "",
    "results": [
        {
            "tag": [
                "TIME", "v", "q", "n"
            ],
            "word": [
                "今天", "是", "個", "好日子"
            ]
        },
        {
            "tag": [
                "n", "v", "TIME", "v", "v"
            ],
            "word": [
                "天氣預報", "說", "今天", "要", "下雨"
            ]
        }
    ],
    "status": "0"
}
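Each entry in results carries a word list and a tag list of equal length. A small helper (hypothetical, not part of PaddleHub) can zip them back into (word, tag) pairs; here it is applied to a hard-coded copy of the first result above:

```python
def pair_words_tags(result):
    # Align each segmented word with its corresponding tag.
    return list(zip(result["word"], result["tag"]))

first_result = {"tag": ["TIME", "v", "q", "n"],
                "word": ["今天", "是", "個", "好日子"]}
print(pair_words_tags(first_result))  # [('今天', 'TIME'), ('是', 'v'), ('個', 'q'), ('好日子', 'n')]
```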
Step 3. Stop the serving service
Since the service was started on the default port 8866, the corresponding stop command is:
$ hub serving stop --port 8866
or, leaving the port unspecified (it defaults to 8866):
$ hub serving stop
After serving has cleaned up the service, it prints:
$ PaddleHub Serving will stop.
and the serving service is stopped.
(2) Deploying the ernie pretrained word-embedding API
Step 1. Start PaddleHub Serving
Run the start command:
$ hub serving start -m ernie
This deploys an API service for obtaining pretrained word embeddings, on the default port 8866.
NOTE: To predict on the GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it need not be set.
Step 2. Send a prediction request
With the server configured, the few lines of code below send a prediction request and obtain the result:
import requests
import json

# Texts to predict, packed into a dictionary {"texts": [text_1, text_2, ...]}
text = [["今天是個好日子", "天氣預報說今天要下雨"], ["這個賓館比較陳舊了,特價的房間也很一般。總體來說一般"]]
# The dictionary key names the parameter passed to the prediction method; here it is "texts"
# The equivalent local call would be module.get_embedding(texts=text)
data = {"texts": text}
# Send a POST request; the content type must be JSON
url = "http://10.12.121.132:8866/predict/ernie"
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())