Original article: 小樣本學習與智能前沿 (WeChat public account)

Author list
YAQING WANG, Hong Kong University of Science and Technology and Baidu Research
QUANMING YAO∗, 4Paradigm Inc.
JAMES T. KWOK, Hong Kong University of Science and Technology
LIONEL M. NI, Hong Kong University of Science and Technology
Machine learning has been highly successful in data-intensive applications, but is often hampered when the data set is small. Recently, Few-Shot Learning (FSL) was proposed to tackle this problem. Using prior knowledge, FSL can rapidly generalize to new tasks containing only a few samples with supervised information.
In this paper, we conduct a thorough survey to fully understand FSL. Starting from a formal definition of FSL, we distinguish FSL from several relevant machine learning problems. We then point out that the core issue of FSL is that the empirical risk minimizer is unreliable. Based on how prior knowledge is used to handle this core issue, we categorize FSL methods from three perspectives:
- (i) data, which uses prior knowledge to augment the supervised experience;
- (ii) model, which uses prior knowledge to reduce the size of the hypothesis space; and
- (iii) algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space.
With this taxonomy, we review and discuss the pros and cons of each category. Promising directions, in terms of FSL problem setups, techniques, applications, and theories, are also proposed to provide insights for future research.
Additional Key Words and Phrases: Few-Shot Learning, One-Shot Learning, Low-Shot Learning, Small Sample Learning, Meta-Learning, Prior Knowledge
01 Introduction
Current AI techniques cannot rapidly generalize from a few examples; the successful AI applications mentioned above rely on learning from large-scale data.
In contrast, humans are capable of learning new tasks rapidly by utilizing what they learned in the past.
examples:
a child who learned how to add can rapidly transfer his knowledge to learn multiplication given a few examples (e.g., 2 × 3 = 2 + 2 + 2 and 1 × 3 = 1 + 1 + 1).
Another example is that given a few photos of a stranger, a child can easily identify the same person from a large number of photos.
Bridging this gap between AI and humans is an important direction.
In order to learn from a limited number of examples with supervised information, a new machine learning paradigm called Few-Shot Learning (FSL) [35, 36] is proposed.
FSL can also advance robotics [26], which develops machines that replicate human actions. Examples include one-shot imitation [147], multi-armed bandits [33], visual navigation [37], and continuous control [156].
FSL can also help relieve the burden of collecting large-scale supervised data.
Although ResNet [55] outperforms humans on ImageNet, each class needs sufficient labeled images, which can be laborious to collect.
Examples include
- image classification [138]
- image retrieval [130]
- object tracking [14]
- gesture recognition [102]
- image captioning
- visual question answering [31]
- video event detection [151]
- language modeling [138]
- neural architecture search [19]
Driven by the academic goal for AI to approach humans and the industrial demand for inexpensive learning, FSL has drawn much recent attention and is now a hot topic.
Many related machine learning approaches have been proposed:
- meta-learning [37, 106, 114],
- embedding learning [14, 126, 138], and
- generative modeling [34, 35, 113].
Contributions of this survey can be summarized as follows:
- We give a formal definition of FSL, which naturally connects to the classic machine learning definition in [92, 94]. The definition is general enough to include existing FSL works, yet specific enough to clarify what the goal of FSL is and how it can be solved. It helps set future research targets in the FSL area.
- We list the relevant learning problems of FSL with concrete examples, clarifying their relatedness to and differences from FSL. These discussions help discriminate FSL better and position it among the various learning problems.
- We point out that the core issue of the FSL supervised learning problem is the unreliable empirical risk minimizer, analyzed based on error decomposition [17] in machine learning. This offers insights to improve FSL methods in a more organized and systematic way.
- We perform an extensive literature review and organize it into a unified taxonomy from the perspectives of data, model, and algorithm. We also summarize insights and discuss the pros and cons of each category, which can help establish a better understanding of FSL methods.
- We propose promising future directions for FSL in terms of problem setups, techniques, applications, and theories. These insights are based on the weaknesses of current FSL developments, with possible improvements to be made in the future.
01.2 Notation and Terminology
Consider a learning task \(T\). FSL deals with a data set \(D = \left\{D_{train}, D_{test}\right\}\) consisting of a training set \(D_{train} = \left\{(x_i, y_i)\right\}_{i=1}^I\), where \(I\) is small, and a testing set \(D_{test} = \left\{x_{test}\right\}\). Let \(p(x,y)\) be the ground-truth joint probability distribution of input \(x\) and output \(y\), and \(\hat{h}\) be the optimal hypothesis from \(x\) to \(y\). FSL learns to discover \(\hat{h}\) by fitting \(D_{train}\) and testing on \(D_{test}\). To approximate \(\hat{h}\), the FSL model determines a hypothesis space \(\mathcal{H}\) of hypotheses \(h(\cdot;\theta)\), where \(\theta\) denotes all the parameters used by \(h\).
An FSL algorithm is an optimization strategy that searches \(\mathcal{H}\) in order to find the \(\theta\) that parameterizes the best \(h^* \in \mathcal{H}\). The FSL performance is measured by a loss function \(\ell(\hat{y}, y)\) defined over the prediction \(\hat{y} = h(x;\theta)\) and the observed output \(y\).
02 Overview
02.1 Problem definition
how machine learning is defined
A computer program is said to learn from experience E with respect to some classes of tasks T and performance measure P if its performance on T, as measured by P, can improve with E.
examples:
consider an image classification task (T): a machine learning program can improve its classification accuracy (P) through experience E obtained by training on a large number of labeled images (e.g., the ImageNet data set [73]).
FSL is a special case of machine learning, which targets obtaining good learning performance given the limited supervised information provided in the training set \(D_{train}\).
The definition of FSL
Few-Shot Learning (FSL) is a type of machine learning problem (specified by E, T, and P), where E contains only a limited number of examples with supervised information for the target T.
Few-shot classification learns classifiers given only a few labeled examples of each class.
example applications:
- image classification [138]
- sentiment classification from short text [157]
- object recognition [35].
N-way-K-shot classification [37, 138]
\(D_{train}\) contains \(I = KN\) examples from \(N\) classes, each with \(K\) examples.
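The N-way-K-shot setup can be sketched as an episode sampler. This is a minimal illustration, not code from the survey; `sample_episode` and its default values are assumptions.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, query_per_class=15, seed=None):
    """Sample one N-way-K-shot episode from a labeled dataset.

    `dataset` is a list of (x, y) pairs with hashable labels y. Returns a
    support set D_train of I = K*N examples (K "shots" for each of the N
    sampled classes) and a disjoint query set for evaluation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)

    classes = rng.sample(list(by_class), n_way)    # the N classes of this task
    support, query = [], []
    for c in classes:
        xs = rng.sample(by_class[c], k_shot + query_per_class)
        support += [(x, c) for x in xs[:k_shot]]   # K labeled examples per class
        query += [(x, c) for x in xs[k_shot:]]     # held out for testing
    return support, query
```

A classifier trained (or adapted) on the support set is then evaluated on the query set, which is how accuracy (P) is typically reported in few-shot classification benchmarks.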
Few-shot regression [37, 156]
estimates a regression function h given only a few input-output example pairs sampled from that function, where output \(y_i\) is the observed value of the dependent variable y, and \(x_i\) is the input which records the observed value of the independent variable x.
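A common instance of this setup in the meta-learning literature is sinusoid regression; the sketch below samples such a task. The function family and parameter ranges are illustrative assumptions, not taken from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_regression_task(k_shot=5):
    """One few-shot regression task in the style of the common sinusoid
    benchmark: the unknown function is y = A*sin(x + phi), and only K
    input-output pairs sampled from it are observed."""
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, np.pi)
    f = lambda x: amplitude * np.sin(x + phase)
    x = rng.uniform(-5.0, 5.0, size=k_shot)  # observed independent variable x_i
    return x, f(x), f                        # (x_i, y_i) pairs + ground truth

x, y, f = sample_regression_task(k_shot=5)   # h must be estimated from 5 pairs
```

The learner sees only the five `(x, y)` pairs; `f` is kept here solely so that the estimated \(h\) can be scored against the true function.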
Few-shot reinforcement learning [3, 33]
targets finding a policy given only a few trajectories consisting of state-action pairs.
three typical scenarios of FSL:
- Acting as a test bed for learning like humans. To move toward human intelligence, it is vital that computer programs can solve the FSL problem. A popular task (T) is to generate samples of a new character given only a few examples [76]. Inspired by how humans learn, the computer program learns from experience E consisting of both the given examples with supervised information and pre-trained concepts such as parts and relations as prior knowledge. The generated characters are evaluated through the pass rate of a visual Turing test (P), which discriminates whether an image was generated by a human or a machine. With this prior knowledge, computer programs can learn to classify, parse, and generate new handwritten characters from a few examples, like humans.
- Learning for rare cases. When it is hard or impossible to acquire enough examples with supervised information, FSL can learn models for rare cases. For example, consider a drug discovery task (T) that tries to predict whether a new molecule has toxic effects [4]. With E obtained from the limited assays of the new molecule together with many assays of similar molecules as prior knowledge, the percentage of molecules correctly assigned as toxic or non-toxic (P) improves.
- Reducing the data gathering effort and computational cost. FSL can help relieve the burden of collecting a large number of examples with supervised information. Consider a few-shot image classification task (T) [35]. The image classification accuracy (P) improves with the E obtained from a few labeled images per class of the target T, together with prior knowledge extracted from other classes (e.g., raw images to co-train on). Methods that succeed in this task usually have high generality; hence they can easily be applied to tasks with many samples.
One typical type of FSL methods is Bayesian learning [35, 76]. It combines the provided training set \(D_{train}\) with some prior probability distribution which is available before \(D_{train}\) is given [15].
When there is only one example with supervised information in E, FSL is called one-shot learning [14, 35, 138]. When E does not contain any example with supervised information for the target T, FSL becomes a zero-shot learning (ZSL) problem [78].
ZSL requires E to contain information from other modalities (such as attributes, WordNet, and word embeddings used in rare object recognition tasks), so as to transfer some supervised information and make learning possible.
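A minimal sketch of this idea, under invented names and a toy attribute table: classes unseen during training are recognized by matching predicted attribute scores against per-class attribute vectors (the prior knowledge from another modality).

```python
import numpy as np

# Hypothetical class-attribute table (prior knowledge from another modality):
# each class is described by attributes [has_stripes, has_hooves, is_aquatic].
class_attributes = {
    "zebra": np.array([1.0, 1.0, 0.0]),
    "whale": np.array([0.0, 0.0, 1.0]),
    "horse": np.array([0.0, 1.0, 0.0]),
}

def zero_shot_predict(attribute_scores, candidates=class_attributes):
    """Assign the class whose attribute vector is closest (plain Euclidean
    distance here) to the predicted attribute scores. No labeled examples
    of the candidate classes are ever used."""
    return min(candidates,
               key=lambda c: np.linalg.norm(candidates[c] - attribute_scores))

# An attribute predictor trained only on *seen* classes would output
# per-attribute scores for a new image, e.g.:
scores = np.array([0.9, 0.8, 0.1])   # looks striped and hoofed
print(zero_shot_predict(scores))     # -> "zebra"
```

The supervised information is transferred through the attribute space: the attribute predictor is trained on seen classes, while the attribute table describes the unseen ones.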
02.2 Relevant Learning Problems
Weakly supervised learning [163]
learns from experience E containing only weak supervision; only a small number of samples have supervised information.
This can be further classified into the following:
- Semi-supervised learning [165], which learns from a small number of labeled samples and (usually a large number of) unlabeled samples in E. Positive-unlabeled learning [81] is a special case of semi-supervised learning, in which only positive and unlabeled samples are given.
- Active learning [117], which selects informative unlabeled data to query an oracle for output y. This is usually used for applications where annotation labels are costly, such as pedestrian detection.
Weakly supervised learning with incomplete supervision mainly uses unlabeled data as additional information in E, while FSL leverages various kinds of prior knowledge, such as pre-trained models and supervised data from other domains or modalities, and is not restricted to using unlabeled data. Therefore, FSL becomes a weakly supervised learning problem only when the prior knowledge is unlabeled data and the task is classification or regression.
Imbalanced learning [54]
learns from experience E with a skewed distribution of y. This happens when some values of y are rarely taken, as in fraud detection and catastrophe anticipation applications. It trains and tests so as to choose among all possible y's. In contrast, FSL trains and tests for y with only a few examples, while possibly taking the other y's as prior knowledge for learning.
Transfer learning [101]
It transfers knowledge from a source domain/task, where training data is abundant, to a target domain/task, where training data is scarce. It can be used in applications such as cross-domain recommendation, and WiFi localization across time periods, spaces, and mobile devices.
Domain adaptation [11] is a type of transfer learning in which the source/target tasks are the same but the source/target domains are different.
example:
in sentiment analysis, the source domain data contains customer comments on movies, while the target domain data contains customer comments on daily goods.
Meta-learning [59]
It improves P of the new task T using the provided data set and the meta-knowledge extracted across tasks by a meta-learner. Specifically, the meta-learner gradually learns generic information (meta-knowledge) across tasks, while the learner generalizes the meta-learner for a new task T using task-specific information.
In FSL, the meta-learner is taken as prior knowledge to guide each specific FSL task.
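The meta-learner/learner interplay can be sketched on a toy regression family. This is a first-order simplification in the spirit of Reptile (itself a simplification of MAML), not an algorithm from the survey; the task family, step sizes, and iteration counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task():
    """A hypothetical task family: each task is y = a*x with its own slope
    a ~ U(1, 3); only a few (x, y) pairs from the task are observed."""
    a = rng.uniform(1.0, 3.0)
    x = rng.uniform(-1.0, 1.0, size=10)
    return x, a * x

def loss_grad(theta, x, y):
    """Gradient of the mean squared error of the model y_hat = theta * x."""
    return np.mean(2.0 * (theta * x - y) * x)

def adapt(theta, x, y, lr=0.1, steps=5):
    """The learner: a few gradient steps from the shared, meta-learned
    initialization, using only the task-specific examples."""
    for _ in range(steps):
        theta -= lr * loss_grad(theta, x, y)
    return theta

# The meta-learner: move the shared initialization toward each task's
# adapted solution, accumulating generic (meta-)knowledge across tasks.
theta0 = 0.0
for _ in range(2000):
    x, y = make_task()
    theta0 += 0.01 * (adapt(theta0, x, y) - theta0)

print(theta0)  # drifts toward the task family's typical slope (about 2)
```

After meta-training, `adapt` starting from `theta0` needs far fewer steps on a new task than starting from scratch, which is exactly the role of the meta-learner as prior knowledge.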
02.3 Core Issue
We illustrate the core issue of FSL based on error decomposition in supervised machine learning [17, 18].
02.3.1 Empirical Risk Minimization.
Given a hypothesis \(h\), we want to minimize its expected risk \(R\), which is the loss measured with respect to \(p(x,y)\). Specifically,
\[ R(h) = \int \ell(h(x), y)\, dp(x,y) = \mathbb{E}\left[\ell(h(x), y)\right]. \]
Because \(p(x,y)\) is unknown, the empirical risk (the average loss over the \(I\) samples of the training set \(D_{train}\)),
\[ R_I(h) = \frac{1}{I} \sum_{i=1}^{I} \ell(h(x_i), y_i), \]
is usually used as a proxy for \(R(h)\), leading to empirical risk minimization (possibly with some regularizers).
For illustration, let
- \(\hat{h} = \arg\min_h R(h)\), the function that minimizes the expected risk;
- \(h^* = \arg\min_{h \in \mathcal{H}} R(h)\), the function in \(\mathcal{H}\) that minimizes the expected risk;
- \(h_I = \arg\min_{h \in \mathcal{H}} R_I(h)\), the function in \(\mathcal{H}\) that minimizes the empirical risk.
The total error can then be decomposed as [17, 18]:
\[ \mathbb{E}\left[R(h_I) - R(\hat{h})\right] = \underbrace{\mathbb{E}\left[R(h^*) - R(\hat{h})\right]}_{\varepsilon_{app}(\mathcal{H})} + \underbrace{\mathbb{E}\left[R(h_I) - R(h^*)\right]}_{\varepsilon_{est}(\mathcal{H}, I)}. \]
The first term on the right is the approximation error \(\varepsilon_{app}(\mathcal{H})\), which measures how closely functions in \(\mathcal{H}\) can approximate the optimal \(\hat{h}\); the second is the estimation error \(\varepsilon_{est}(\mathcal{H}, I)\), which measures the effect of minimizing the empirical risk instead of the expected risk within \(\mathcal{H}\).
Overall, the total error is affected by \(\mathcal{H}\) (the hypothesis space) and \(I\) (the number of examples in the training set). In other words, the total error can be reduced from three perspectives:
- data, which provides \(D_{train}\);
- model, which determines \(\mathcal{H}\); and
- algorithm, which searches for the optimal \(h_I \in \mathcal{H}\) that fits \(D_{train}\).
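The decomposition can be made concrete with a small, hypothetical numpy simulation (not from the survey): the true function is quadratic, the hypothesis space \(\mathcal{H}\) is linear functions, so the approximation error is fixed and nonzero, while the estimation error shrinks as \(I\) grows. Function names and distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_risk(w, b, n=100_000):
    """Monte-Carlo estimate of the expected risk R(h) of the linear
    hypothesis h(x) = w*x + b, under the true distribution x ~ U(-1, 1),
    y = x**2, with squared loss."""
    x = rng.uniform(-1, 1, n)
    return np.mean((w * x + b - x**2) ** 2)

def fit(i):
    """Empirical risk minimizer h_I over the linear hypothesis space H,
    obtained from I training samples by ordinary least squares."""
    x = rng.uniform(-1, 1, i)
    A = np.stack([x, np.ones(i)], axis=1)
    (w, b), *_ = np.linalg.lstsq(A, x**2, rcond=None)
    return w, b

# Best hypothesis in H: w = 0, b = 1/3 minimizes R over all linear h, so
# R(h*) is the approximation error of H; no amount of data removes it.
r_star = expected_risk(0.0, 1.0 / 3.0)

# The estimation error R(h_I) - R(h*) shrinks as I grows:
est_err = {}
for i in (5, 50, 5000):
    risks = [expected_risk(*fit(i)) for _ in range(20)]
    est_err[i] = float(np.mean(risks)) - r_star
    print(i, est_err[i])
```

With \(I = 5\) the fitted line varies wildly between training sets, while with \(I = 5000\) it is essentially pinned at \(h^*\); this is the estimation-error term that few-shot regimes cannot drive down by data volume alone.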
02.3.2 Unreliable Empirical Risk Minimizer.
The estimation error \(\varepsilon_{est}(\mathcal{H}, I)\) can be reduced by increasing the number of examples [17, 18, 41]. Thus, when there is sufficient training data with supervised information (i.e., \(I\) is large), the estimation error is small. In FSL, however, the number of available examples \(I\) is small, and the empirical risk \(R_I(h)\) may then be far from the expected risk \(R(h)\). This is the core issue of FSL supervised learning:
the empirical risk minimizer \(h_I\) is no longer reliable.
02.4 Taxonomy
To alleviate the problem of having an unreliable empirical risk minimizer \(h_I\) in FSL supervised learning, prior knowledge must be used. Based on which aspect is enhanced using prior knowledge, existing FSL works can be categorized into the following perspectives (Figure 2).
- Data. These methods use prior knowledge to augment \(D_{train}\), enlarging the number of examples \(I\). Standard machine learning models and algorithms can then be used on the enlarged data.
- Model. These methods use prior knowledge to constrain the complexity of \(\mathcal{H}\), shrinking the hypothesis space. The training set is then sufficient to learn a reliable \(h_I\).
- Algorithm. These methods use prior knowledge to search for the \(\theta\) that parameterizes the best \(h^*\) in \(\mathcal{H}\). The prior knowledge alters the search strategy by providing a good initialization (gray triangle in Figure 2(c)) or by guiding the search steps (gray dashed lines in Figure 2(b)). For the latter, the resulting search steps are affected by both the prior knowledge and the empirical risk minimizer.