[Active Learning] 01 A Brief Introduction to Active Learning

Reposted article, 2019-04-23 22:24. Tags: active learning / machine learning

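What is active learning?
- Active learning vs. passive learning
Why do we need active learning?
How active learning relates to supervised, weakly supervised, semi-supervised, and unsupervised learning
Types of active learning
An example of active learning
ALiPy, an active learning toolkit
Blogs on active learning
References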

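This post briefly introduces what active learning (AL) is, why it is needed, and how it relates to supervised, weakly supervised, semi-supervised, and unsupervised learning. It closes with a short overview of the types of active learning. (Active learning here means the subfield of machine learning.)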

What is active learning?

Active learning, sometimes called "query learning" or "optimal experimental design" in the statistics literature, is a subfield of machine learning.

A key hypothesis behind active learning:

If the learning algorithm can choose the data from which it learns, it will perform better with less training. [1]

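Active learning gets its name because the algorithm actively chooses which unlabeled instances in the dataset should be labeled, rather than passively accepting whatever labeled data it is given. After each round of labeling, the model is retrained on the labeled data, either from scratch or incrementally, and then actively selects more unlabeled instances to label. Repeating this cycle is the active learning workflow.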
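
The train, select, label cycle described above can be sketched in a few lines. This is only a minimal illustration, not a method from the cited survey: the 1-D threshold model, the names `fit_threshold` and `active_learning_loop`, and the boundary value 0.3712 are all assumptions made for the example.

```python
def fit_threshold(labeled):
    """Toy 1-D model: the decision threshold is the midpoint between the
    largest instance labeled 0 and the smallest instance labeled 1."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    lo = max(negatives) if negatives else 0.0
    hi = min(positives) if positives else 1.0
    return (lo + hi) / 2.0


def active_learning_loop(pool, oracle, seed_points, n_queries):
    """Pool-based loop: train, pick the most uncertain instance, ask the
    oracle for its label, repeat."""
    labeled = [(x, oracle(x)) for x in seed_points]
    unlabeled = [x for x in pool if x not in seed_points]
    for _ in range(n_queries):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)  # retrain on all labels so far
        # Selection criterion: the instance closest to the current boundary.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # the oracle provides the label
    return fit_threshold(labeled), labeled


# Hypothetical setup: 1001 points in [0, 1]; the true boundary is 0.3712.
pool = [i / 1000 for i in range(1001)]
oracle = lambda x: int(x > 0.3712)
threshold, labeled = active_learning_loop(pool, oracle, [0.0, 1.0], n_queries=15)
print(f"estimated threshold {threshold:.4f} from {len(labeled)} labels")
```

Because each query lands near the current boundary, the loop behaves like a binary search: a handful of labels is enough to locate the threshold, whereas randomly chosen labels would mostly fall far from it.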

Active learning vs. passive learning

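Passive learning is usually taken to mean randomly selecting instances from the dataset for labeling.

Active learning, by contrast, uses some criteria to guide which instances get labeled. That is the difference between the two.

That said, the term "passive learning" is not used much. In practice, random selection simply serves as the baseline against which active learning criteria are compared.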
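
To make the contrast concrete, here is a small sketch of the two selection rules side by side. The criterion shown, picking the instance whose predicted probability is closest to 0.5, is one common uncertainty criterion chosen for illustration; the function names and the probabilities in `probs` are hypothetical.

```python
import random


def random_selection(probs, rng):
    """Passive baseline: pick an unlabeled instance uniformly at random."""
    return rng.choice(sorted(probs))


def least_confident(probs):
    """Active criterion: pick the instance whose predicted probability of
    the positive class is closest to 0.5."""
    return min(probs, key=lambda name: abs(probs[name] - 0.5))


# Hypothetical model outputs P(y = 1 | x) for four unlabeled instances.
probs = {"a": 0.97, "b": 0.52, "c": 0.10, "d": 0.80}
print("random selection picks:", random_selection(probs, random.Random(0)))
print("uncertainty criterion picks:", least_confident(probs))
```

The criterion always picks "b", the instance the model is least sure about, while random selection ignores the model entirely.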

Why do we need active learning?

Labeling data is expensive, which pushes us to build effective models from fewer labeled instances. This is the motivation for active learning.

How active learning relates to supervised, weakly supervised, semi-supervised, and unsupervised learning

We can distinguish these settings by the state of the labels in the training set (corrections welcome):

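In supervised learning, the labels in the dataset are complete and accurate.
In unsupervised learning, the dataset has no labels.
In weakly supervised learning, the labels fall into three cases (which may occur together):
- Some instances are labeled and the rest are not. Usually the labeled instances are a small minority and most of the data is unlabeled. (Incomplete supervision)
- Every instance is labeled, but the labels are too coarse. For example, in semantic image segmentation the fine-grained labels should be pixel-level, but only image-level labels are given. (Inexact supervision)
- Every instance is labeled, but many of the labels are wrong. (Inaccurate supervision)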
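
One way to get a feel for the three cases is to simulate each of them from a fully labeled toy dataset. The sketch below is purely illustrative: the function names, the use of `None` for a missing label, and the tiny 2x2 "pixel mask" are all assumptions made for this example.

```python
import random


def incomplete_supervision(labels, labeled_fraction, rng):
    """Keep each label with probability `labeled_fraction`; otherwise
    replace it with None (unlabeled)."""
    return [y if rng.random() < labeled_fraction else None for y in labels]


def inexact_supervision(pixel_mask):
    """Collapse a pixel-level mask into one image-level label:
    1 if any pixel is foreground, else 0."""
    return int(any(any(row) for row in pixel_mask))


def inaccurate_supervision(labels, flip_rate, rng):
    """Flip each binary label with probability `flip_rate`."""
    return [1 - y if rng.random() < flip_rate else y for y in labels]


rng = random.Random(0)
labels = [0, 1, 1, 0, 1, 0, 0, 1]
print("incomplete:", incomplete_supervision(labels, 0.25, rng))
print("inexact:   ", inexact_supervision([[0, 0], [0, 1]]))
print("inaccurate:", inaccurate_supervision(labels, 0.25, rng))
```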

Fig. 1 [2] Illustration of three typical types of weak supervision.

Active learning corresponds to the first case of weakly supervised learning: a small part of the data is labeled, but most of it is not.

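How do active learning and semi-supervised learning relate? Both can be seen as ways of handling the first case of weak supervision, but they differ: for example, active learning requires a human to label the selected instances, whereas semi-supervised learning does not.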
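
The difference in where a new label comes from can be shown with one step of each approach. This is a sketch under stated assumptions: self-training is used here as just one representative semi-supervised technique (it is not named in the original), and the probabilities, true labels, and function names are hypothetical.

```python
def active_learning_step(probs, oracle):
    """Pick the least confident instance and ask a human oracle for its
    true label."""
    query = min(probs, key=lambda name: abs(probs[name] - 0.5))
    return query, oracle(query)


def self_training_step(probs):
    """Pick the most confident instance and trust the model's own
    prediction as a pseudo-label (no human involved)."""
    pick = max(probs, key=lambda name: abs(probs[name] - 0.5))
    return pick, int(probs[pick] >= 0.5)


# Hypothetical model outputs P(y = 1 | x) and the hidden true labels.
probs = {"a": 0.97, "b": 0.52, "c": 0.10}
true_labels = {"a": 1, "b": 0, "c": 0}

print("active learning:", active_learning_step(probs, true_labels.get))
print("self-training:  ", self_training_step(probs))
```

Active learning spends human effort on "b", where the model is unsure, while self-training adds "a" with the model's own guess and never consults the oracle.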

Types of active learning

Fig. 2 [1] Diagram illustrating the three main active learning scenarios.

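By application scenario, active learning falls into three types:

- The first is "membership query synthesis". The name is hard to parse, but the idea is that the learner generates instances from the entire possible input space: the model synthesizes an instance from scratch and sends it to the oracle for labeling.
- The second is "stream-based selective sampling". Instances arrive one at a time from a data stream, and for each one the learner decides whether to send it to the oracle for labeling.
- The third is "pool-based sampling". We start with a large collection of unlabeled data and only need to choose which of those instances to send to the oracle. (This is the most common scenario.)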
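
The three scenarios differ mainly in where the candidate instance comes from. The sketch below shows one possible shape for each, using a hypothetical 1-D model whose `predict_proba` is a sigmoid centred on an assumed boundary of 0.4; the function names, the `min_uncertainty` cut-off, and the bisection used for synthesis are all illustrative choices.

```python
import math


def predict_proba(x, boundary=0.4, sharpness=20.0):
    """Hypothetical 1-D model: P(y = 1 | x) is a sigmoid centred on `boundary`."""
    return 1.0 / (1.0 + math.exp(-sharpness * (x - boundary)))


def uncertainty(x):
    """1.0 when the model is maximally unsure (p = 0.5), 0.0 when p is 0 or 1."""
    return 1.0 - 2.0 * abs(predict_proba(x) - 0.5)


def membership_query_synthesis(lo=0.0, hi=1.0, steps=50):
    """Construct a brand-new instance in [lo, hi]: bisect until the model's
    prediction is (almost) exactly 0.5."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if predict_proba(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def stream_based_selective_sampling(stream, min_uncertainty=0.5):
    """See instances one at a time and decide on the spot whether to query."""
    for x in stream:
        if uncertainty(x) >= min_uncertainty:
            yield x


def pool_based_sampling(pool, batch_size=1):
    """Rank the whole pool and return the most uncertain instances."""
    return sorted(pool, key=uncertainty, reverse=True)[:batch_size]


data = [0.1, 0.39, 0.9, 0.42, 0.7]
print("synthesized query:", round(membership_query_synthesis(), 4))
print("stream queries:   ", list(stream_based_selective_sampling(data)))
print("pool query:       ", pool_based_sampling(data))
```

Note that the stream-based learner must commit to each decision without seeing later instances, while the pool-based learner can compare every candidate before choosing.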

Fig. 3 [1] Pool-based active learning.

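A natural question: what is the oracle in active learning? The oracle can be a single expert whose labels are always correct. It can also be many people with different expertise whose labels are not always correct, as in crowdsourcing.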
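
Both kinds of oracle can be simulated in a few lines. In this sketch the crowd's answers are combined by majority vote, which is an assumption of the example rather than something the text prescribes; the function names and the per-annotator accuracies are also hypothetical.

```python
import random
from collections import Counter


def expert_oracle(true_label):
    """A single expert who always returns the correct label."""
    return true_label


def noisy_annotator(true_label, accuracy, rng):
    """A binary annotator who is correct with probability `accuracy`."""
    return true_label if rng.random() < accuracy else 1 - true_label


def crowd_oracle(true_label, accuracies, rng):
    """Ask one annotator per entry in `accuracies` and return the majority
    vote. Use an odd number of annotators to avoid ties."""
    votes = [noisy_annotator(true_label, a, rng) for a in accuracies]
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
crowd = [0.9, 0.7, 0.6]  # hypothetical per-annotator accuracies
labels = [crowd_oracle(1, crowd, rng) for _ in range(1000)]
print("fraction of correct crowd labels:", sum(labels) / len(labels))
```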

An example of active learning

Fig. 4 [1] An example of pool-based active learning.

Example from [1]:
- (a) A toy data set of 400 instances, evenly sampled from two class Gaussians.
- (b) A logistic regression model trained with 30 labeled instances randomly drawn from the problem domain (accuracy: 70%).
- (c) A logistic regression model trained with 30 actively queried instances using uncertainty sampling (accuracy: 90%).
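
A rough, standard-library-only sketch of this kind of experiment follows. It is not the code behind the figure and is not expected to reproduce the 70% and 90% numbers: the class means, the hand-written gradient-descent logistic regression, the two random seed labels, and all function names are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Clamp z so math.exp never overflows.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def prob(w, x1, x2):
    """P(y = 1 | x) under a logistic regression with weights w = [w1, w2, b]."""
    return sigmoid(w[0] * x1 + w[1] * x2 + w[2])


def train_logreg(rows, epochs=300, lr=0.5):
    """Fit [w1, w2, b] by batch gradient descent on (x1, x2, y) rows."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x1, x2, y in rows:
            err = prob(w, x1, x2) - y
            grad[0] += err * x1
            grad[1] += err * x2
            grad[2] += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def accuracy(w, rows):
    hits = sum((prob(w, x1, x2) >= 0.5) == (y == 1) for x1, x2, y in rows)
    return hits / len(rows)


def make_toy_data(rng, n=400):
    """n instances split evenly between two unit-variance Gaussians.
    The class means (-2, 0) and (2, 0) are assumed for this sketch."""
    rows = []
    for i in range(n):
        y = i % 2
        center = 2.0 if y == 1 else -2.0
        rows.append((rng.gauss(center, 1.0), rng.gauss(0.0, 1.0), y))
    return rows


def random_sampling(rows, rng, budget=30):
    """Baseline: label `budget` instances chosen uniformly at random."""
    return rng.sample(rows, budget)


def uncertainty_sampling(rows, rng, budget=30, seed_size=2):
    """Label `seed_size` random instances, then repeatedly label the
    instance whose predicted probability is closest to 0.5."""
    pool = list(rows)
    rng.shuffle(pool)
    labeled, pool = pool[:seed_size], pool[seed_size:]
    while len(labeled) < budget:
        w = train_logreg(labeled)
        pick = min(range(len(pool)),
                   key=lambda i: abs(prob(w, pool[i][0], pool[i][1]) - 0.5))
        labeled.append(pool.pop(pick))  # reading y here stands in for the oracle
    return labeled


rng = random.Random(0)
data = make_toy_data(rng)
for name, strategy in [("random", random_sampling),
                       ("uncertainty", uncertainty_sampling)]:
    chosen = strategy(data, rng)
    model = train_logreg(chosen)
    print(f"{name}: accuracy on all 400 instances = {accuracy(model, data):.3f}")
```

The gap between the two strategies depends heavily on the data and the random seed, so a single run like this should be read as a demonstration of the procedure, not as evidence for either strategy.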

ALiPy, an active learning toolkit

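ALiPy (Active Learning in Python) [3] is an open-source active learning toolkit developed by Sheng-Jun Huang of Nanjing University of Aeronautics and Astronautics. It makes building active learning programs straightforward and is highly recommended.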

ALiPy homepage: http://parnec.nuaa.edu.cn/huangsj/alipy/

ALiPy GitHub: https://github.com/NUAA-AL/ALiPy

Blogs on active learning

https://blog.csdn.net/Houchaoqun_XMU

References

[1] Settles, B. (2009). Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison.
[2] Zhou, Z.-H. (2018). A brief introduction to weakly supervised learning. National Science Review, 5(1), 44–53. https://doi.org/10.1093/nsr/nwx106
[3] Tang, Y.-P., Li, G.-X., & Huang, S.-J. (2019). ALiPy: Active Learning in Python, 1–5. Retrieved from http://arxiv.org/abs/1901.03802
