Inductive learning is what we do with the neural networks we normally train: the test set plays no part in the training stage, and once the model is trained we use it to predict on the test set.
In transductive learning, the test set also takes part in training; knowing this difference is the key point.
Induction and Transduction… You may have come across these two words many times when reading books and articles on machine learning. In this article, let's try to understand the difference between these two learning approaches and how to choose between them depending on the use case.
Understanding the Definitions
Transduction is reasoning from observed, specific (training) cases to specific (test) cases. In contrast, induction is reasoning from observed training cases to general rules, which are then applied to the test cases.
Let's break down and understand these two definitions.
Induction
Induction is reasoning from observed training cases to general rules, which are then applied to the test cases.
Inductive learning is what we commonly know as traditional supervised learning. We build and train a machine learning model based on a labelled training dataset we already have. Then we use this trained model to predict the labels of a testing dataset we have never encountered before.
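As a concrete illustration, here is a minimal sketch of this workflow using scikit-learn; the dataset, model choice and split are arbitrary and purely illustrative. The model is fit on the labelled training data alone and then applied to a held-out test set it has never seen.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy labelled dataset (arbitrary choice, for illustration only).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Training sees only the training split; the test split is held out entirely.
model = LogisticRegression()
model.fit(X_train, y_train)        # learn a general rule from the training cases
y_pred = model.predict(X_test)     # apply that rule to previously unseen test cases
print(y_pred[:10])
```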
Transduction
Transduction is reasoning from observed, specific (training) cases to specific (test) cases.
In contrast to inductive learning, transductive learning techniques observe all the data beforehand, both the training and the testing datasets. We learn from the already observed training dataset and then predict the labels of the testing dataset. Even though we do not know the labels of the testing dataset, we can make use of the patterns and additional information present in this data during the learning process.
Example transductive learning approaches include transductive SVM (TSVM) and graph-based label propagation algorithms (LPA).
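As a rough sketch of what this looks like in practice, scikit-learn's semi-supervised module includes a graph-based label propagation implementation; the toy dataset and parameters below are illustrative assumptions, not a prescription. Unlabelled points are marked with -1 and handed to the algorithm together with the labelled ones, so the "test" points are observed during learning.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

# Toy dataset: two interleaving half-moons; only a handful of points keep labels.
X, y_true = make_moons(n_samples=100, shuffle=False, noise=0.05, random_state=0)
y = np.full_like(y_true, -1)   # -1 marks a point as unlabelled
y[:3] = y_true[:3]             # a few labelled points from one class
y[-3:] = y_true[-3:]           # and a few from the other

# Both the labelled and the unlabelled points are observed during fitting.
lp = LabelPropagation(kernel="knn", n_neighbors=7)
lp.fit(X, y)
print(lp.transduction_[:10])   # labels inferred for every point, including the unlabelled ones
```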
What are the Differences?
Now that you have a clear idea about the definitions of inductive and transductive learning, let's see what the differences are. The definitions pretty much spell out the differences, but let's go through them so that they become clearer.
The main difference is that during transductive learning, you have already encountered both the training and the testing datasets when training the model. In contrast, inductive learning encounters only the training data when training the model and applies the learned model to a dataset it has never seen before.
Transduction does not build a predictive model. If a new data point is added to the testing dataset, then we will have to re-run the algorithm from the beginning, train the model and then use it to predict the labels. On the other hand, inductive learning builds a predictive model. When you encounter new data points, there is no need to re-run the algorithm from the beginning.
In simpler terms, inductive learning tries to build a generic model that can predict any new data point, based on an observed set of training data points. Here you can predict any point in the space, not just the unlabelled points at hand. On the contrary, transductive learning builds a model that fits the specific training and testing data points it has already observed. This approach predicts the labels of the unlabelled points using the knowledge of the labelled points together with the additional information in the data.
Transductive learning can become costly when new data points are introduced by an input stream: each time a new data point arrives, you have to re-run everything. Inductive learning, on the other hand, builds a predictive model up front, and new data points can be labelled within a very short time with less computation.
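The sketch below, using made-up points and scikit-learn estimators purely as stand-ins, contrasts the two behaviours: the inductive model predicts a newly arriving point directly, while the transductive procedure has to be re-run on the enlarged dataset.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.semi_supervised import LabelPropagation

X_labelled = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.2, 2.9]])
y_labelled = np.array([0, 0, 1, 1])
X_unlabelled = np.array([[0.1, 0.3], [2.9, 3.1]])

# Inductive: the fitted model is a reusable rule; a new point is simply predicted.
knn = KNeighborsClassifier(n_neighbors=1).fit(X_labelled, y_labelled)
print(knn.predict([[2.7, 2.8]]))          # cheap: no retraining needed

# Transductive: a new point changes the dataset the algorithm reasons over,
# so the whole procedure is re-run on the enlarged set of points.
X_all = np.vstack([X_labelled, X_unlabelled, [[2.7, 2.8]]])   # new point arrives
y_all = np.concatenate([y_labelled, [-1, -1, -1]])            # still unlabelled
lp = LabelPropagation(kernel="knn", n_neighbors=3).fit(X_all, y_all)
print(lp.transduction_)                   # labels for every point, recomputed from scratch
```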

Example Walkthrough

Firstly, I will take the example shown in Figure 1: a set of points with four labelled points A, B, C and D. Our goal is to label (colour) the remaining unlabelled (uncoloured) points, numbered 1 to 14. If we use inductive learning for this task, we have to build a supervised learning model from these 4 labelled points alone.

At a glance, we can see that there are two separate clusters. However, in inductive learning, since we have very few training samples, it is quite hard to build a predictive model that captures the complete structure of the data. For example, if a nearest-neighbour approach is used, points closer to the border such as 12 and 14 may be coloured red instead of green, because they are closer to the red points A and B than to the green points C and D (as shown in Figure 2).

If we have some additional information about the data points such as connectivity information between the points based on features like similarity (as shown in Figure 3), we can use this additional information while training the model and labelling the unlabelled points.
(Note: Figure 4 could not be found in the original article, so it is not shown here.)
For example, we can use a transductive learning approach such as a semi-supervised graph-based label propagation algorithm to label the unlabelled points as shown in Figure 4, using the structural information of all the labelled and unlabelled points. Points along the border such as 12 and 14 are connected to more green points than red points, and hence they get labelled as green, rather than red.
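The following sketch reconstructs the spirit of this walkthrough with made-up coordinates, since the original figures are not reproduced here. A 1-NN classifier trained only on the four labelled points colours the chain points nearest to the red labels as red, whereas graph-based label propagation observes all of the points at once, so the green labels can travel along the chain of unlabelled points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.semi_supervised import LabelPropagation

# Labelled points: A, B are red (0); C, D are green (1).
X_labelled = np.array([[0.0, 0.0], [0.4, 0.2],       # A, B
                       [6.0, 0.0], [6.4, 0.2]])       # C, D
y_labelled = np.array([0, 0, 1, 1])

# Unlabelled points: a compact blob near A/B and a chain stretching towards C/D.
X_unlabelled = np.array([
    [0.2, -0.2], [0.6, 0.1], [0.1, 0.4], [0.5, -0.3], [0.3, 0.3], [0.7, -0.1],
    [2.0, 0.0], [2.5, 0.1], [3.0, -0.1], [3.5, 0.0],
    [4.0, 0.1], [4.5, 0.0], [5.0, -0.1], [5.5, 0.1],
])

# Inductive view: a 1-NN model trained on the four labelled points alone.
# A border point such as (2.0, 0.0) is nearer to B than to C or D, so it comes out red.
knn = KNeighborsClassifier(n_neighbors=1).fit(X_labelled, y_labelled)
print("1-NN (inductive):  ", knn.predict(X_unlabelled))

# Transductive view: label propagation sees all 18 points at once. With an RBF
# affinity at this scale, each chain point is weighted almost entirely towards its
# neighbours along the chain, so the green labels can propagate down to the border.
X_all = np.vstack([X_labelled, X_unlabelled])
y_all = np.concatenate([y_labelled, np.full(len(X_unlabelled), -1)])
lp = LabelPropagation(kernel="rbf", gamma=20).fit(X_all, y_all)
print("Label propagation: ", lp.transduction_[len(X_labelled):])
```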