I recently returned to the lab and have been slowly picking back up the things I forgot during almost half a year away. As a first step, I translated the introduction to cross-entropy that I previously wrote here in English.
Background
In multi-class classification (single object, single label), the most common setup is one input whose output is a vector with as many entries as there are classes. When the input is fed through the trained network, the predicted class is the class corresponding to the largest entry in the output layer.
Cross-entropy loss (交叉熵) is the most common loss function used to train such networks; its definition is given below. Take a simple example: a 3-class CNN. For an input \(x\), the output \(y\) of the last fully-connected layer is a \((3 \times 1)\) tensor, and the ground-truth label \(y^{'}\) of \(x\) is a vector of the same dimension.
Cross-Entropy
Say the 3 classes are 0, 1 and 2, and the input belongs to class 0. The ground-truth vector \(y^{'}\) for this input is then \((1,0,0)\), and for a reasonably trained classifier the network output \(y\) is something like \((3.8,-0.2,0.45)\). So for an input of class 0, these are the two vectors that go into the loss.
The formal definition of cross-entropy loss is given below; in the example above, \(i\) ranges from 0 to 2.
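Written with \(y\) as the raw network output and \(y^{'}\) as the one-hot ground-truth vector (reconstructed here from the surrounding definitions), the loss is

\[ L = -\sum_{i} y^{'}_{i}\,\log\big(\mathrm{softmax}(y)_{i}\big) \]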
Softmax
The computation of softmax can be represented by the softmax layer shown in the diagram below. Note that the input to the softmax layer in the diagram, \(z = (3, 1, -3)\), is our vector \(y\) from above, i.e. the output of the network's last fully-connected layer.
Softmax is widely used in multi-class classification: it normalizes the outputs of several neurons into the interval \((0, 1)\), so the softmax output can be interpreted as a probability distribution over the classes, which is what makes the multi-class prediction possible.
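For reference, the softmax of a vector \(z\) is

\[ \mathrm{softmax}(z)_{i} = \frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}} \]

A minimal sketch (not part of the original post) checking the diagram's example input \((3, 1, -3)\) with torch.softmax confirms that the outputs lie in \((0, 1)\) and sum to 1:

import torch

z = torch.tensor([3.0, 1.0, -3.0])   # input to the softmax layer in the diagram
probs = torch.softmax(z, dim=0)      # roughly (0.88, 0.12, 0.00)
print(probs, probs.sum())            # each entry lies in (0, 1), the total equals 1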
nn.CrossEntropyLoss() in PyTorch
Essentially, only one term of the cross-entropy sum survives: the one for the ground-truth class. If the correct label is entry \(j\) of the softmax output, the only piece left in the loss equation is \(\log(\mathrm{softmax}(y)_{j})\), because \(y^{'}_{i} = 1\) only when \(i = j\) and \(y^{'}_{i} = 0\) otherwise.
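Spelled out (a reconstruction consistent with the definition above), the sum collapses to a single term:

\[ L = -\sum_{i} y^{'}_{i}\,\log\big(\mathrm{softmax}(y)_{i}\big) = -\log\big(\mathrm{softmax}(y)_{j}\big) = -y_{j} + \log\sum_{i} e^{y_{i}} \]

For the class-0 example above, \(\mathrm{softmax}(3.8,-0.2,0.45)_{0} \approx 0.949\), so the loss is \(-\log(0.949) \approx 0.052\). The last form, \(-y_{j} + \log\sum_{i} e^{y_{i}}\), is exactly what the hand computation in the script below does.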
In the script below, the result of torch.nn.CrossEntropyLoss() is compared with a hand-calculated cross-entropy loss. This verifies that torch.nn.CrossEntropyLoss() takes the raw output of the network's last layer as its input, which means the softmax computation is included inside the function. So when we construct a network in PyTorch, there is no need to append an extra softmax layer after the final fully-connected layer.
import torch
import torch.nn as nn
import math

output = torch.randn(1, 5, requires_grad=True)        # pretend this is the last layer of a 5-class network
label = torch.empty(1, dtype=torch.long).random_(5)   # pick an arbitrary class in 0 - 4
print('Network Output is: ', output)
print('Ground Truth Label is: ', label)

score = output[0, label.item()].item()  # logit (score) of the ground-truth class
print('Score for the ground truth class = ', score)

# hand computation: loss = -y_j + log(sum_i exp(y_i))
first = -score
second = 0
for i in range(5):
    second += math.exp(output[0, i].item())
second = math.log(second)
loss = first + second
print('-' * 20)
print('my loss = ', loss)

criterion = nn.CrossEntropyLoss()
print('pytorch loss = ', criterion(output, label))
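As an extra check (not part of the original script), the same value can be reproduced by composing nn.LogSoftmax with nn.NLLLoss, which is the decomposition nn.CrossEntropyLoss applies internally:

log_softmax = nn.LogSoftmax(dim=1)   # log(softmax(.)) along the class dimension
nll_loss = nn.NLLLoss()              # negative log-likelihood, expects log-probabilities
print('logsoftmax + nllloss = ', nll_loss(log_softmax(output), label))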