The loss function is a crucial part of a machine learning model: it is the criterion by which we judge how good the model is, and the goal of optimization is to adjust the parameters so that the loss becomes as small as possible. If the loss function is defined incorrectly, or does not reflect what we actually care about, then training the model is just a waste of time.
So let's go through a few commonly used loss functions: hinge loss, softmax loss, and cross-entropy loss.
1: hinge loss
Also known as the Multiclass SVM loss. The name "hinge" presumably comes from the shape of the function's graph.
Let $s=Wx$ denote the output of the last layer, a vector of C class scores. The loss $L_i$ for example i adds up one term per incorrect class:
$L_i=\sum_{j\neq y_i}\begin{cases}0 & \text{if } s_{y_i}\geq s_j+1\\ s_j-s_{y_i}+1 & \text{otherwise}\end{cases}=\sum_{j\neq y_i}\max(0,\,s_j-s_{y_i}+1)$
The graph of the function looks like this:

Here's an example:

Suppose there are only 3 classes; the images above are the inputs, and the 3 numbers below each image are the last layer's scores for the three classes.
For the first image (the cat), $L_i$ = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9 + 0 = 2.9.
For the second image (the car), $L_i$ = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1) = 0 + 0 = 0.
So a misclassified example produces a positive loss, while a correctly classified example has a loss of 0.
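To double-check the arithmetic, here is a minimal NumPy sketch that reproduces the two numbers above (the class order and the correct-class indices are my assumption about how the scores were listed in the figure):

import numpy as np

def hinge_loss(scores, correct_idx, delta=1.0):
    # sum of max(0, s_j - s_{y_i} + delta) over all incorrect classes j
    margins = np.maximum(0, scores - scores[correct_idx] + delta)
    margins[correct_idx] = 0
    return np.sum(margins)

cat_scores = np.array([3.2, 5.1, -1.7])   # correct class: cat (index 0)
car_scores = np.array([1.3, 4.9, 2.0])    # correct class: car (index 1)

print(hinge_loss(cat_scores, 0))  # 2.9
print(hinge_loss(car_scores, 1))  # 0.0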
In fact this loss asks for more than just the correct class having the highest score: it has to be higher by a certain margin. That means even a correctly classified example can incur a loss, because the correct class's score might not exceed the wrong classes' scores by the chosen threshold (set to 1 here). The exact value of the threshold doesn't matter much, though; changing it just corresponds to rescaling W. The simple way to think about it is that the correct class must not only score highest, it has to win by a clear margin.
With randomly initialized (small) weights, the output scores should be roughly uniform, so the loss should come out to about C-1. You can use this to check whether your loss function and forward pass are implemented correctly.
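The check itself takes only a few lines. A small sketch with made-up near-zero scores (the 0.001 scale and C = 10 are my own choices, not from the text):

import numpy as np

C = 10
np.random.seed(0)
scores = 0.001 * np.random.randn(C)      # tiny random scores, roughly uniform
y = 3                                    # pick an arbitrary "correct" class

margins = np.maximum(0, scores - scores[y] + 1)
margins[y] = 0
print(np.sum(margins))                   # close to C - 1 = 9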
2: softmax
The point of the softmax operation is to turn scores into probabilities. Here's how the conversion works.
For an input $x_i$ we end up with a vector s of C class scores, and the probability of each class is $P(Y=k|X=x_i)=\frac{e^{s_k}}{\sum_j e^{s_j}}$.
First exponentiate the scores, which makes them all positive, then normalize so the ratios can be read as probabilities.
With this conversion in place we define the loss as the negative log-likelihood, $L_i=-\log P(Y=y_i|X=x_i)$; in other words, the larger the probability assigned to the correct class, the better, which is easy to understand.
Again, let's look at an example:

As before, for randomly initialized weights and data, $L_i$ should be about $-\log(1/C)$.
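To make the conversion concrete, here is a small sketch that reuses the cat scores from the hinge-loss example (that these are the scores used in the figure is my assumption), followed by the sanity check just mentioned:

import numpy as np

scores = np.array([3.2, 5.1, -1.7])               # cat is class 0
probs = np.exp(scores) / np.sum(np.exp(scores))   # exponentiate, then normalize
print(probs)                                      # roughly [0.13, 0.87, 0.00]
print(-np.log(probs[0]))                          # L_i for the correct class, about 2.04

# Sanity check: with uniform probabilities the loss is -log(1/C)
C = 10
print(-np.log(1.0 / C))                           # about 2.30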
3: cross_entropy loss
The essence of entropy is the expected value of the Shannon information content.
Suppose we have two probability distributions p and q over a sample set, where p is the true distribution and q is not. The expected code length needed to identify a sample when we encode according to the true distribution p (i.e. the average code length) is $H(p)=-\sum_i p(i)\log p(i)$. If instead we use the wrong distribution q to encode samples that actually come from p, the average code length becomes $H(p,q)=-\sum_i p(i)\log q(i)$. Because the samples encoded with q are drawn from the distribution p, the expectation in H(p,q) is taken with the probabilities p(i). H(p,q) is what we call the "cross entropy".
From the formula you can see that when there is exactly one correct class, the true distribution p is one-hot: every p(i) other than the correct class is 0, so all other terms vanish and the cross entropy reduces to $H(p,q)=-\log q(y_i)$, where $q(y_i)$ is the predicted probability of the correct class. In this situation the cross-entropy loss and the softmax loss are really the same thing.
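A tiny sketch of that reduction, with made-up predicted probabilities: when p is one-hot, the full cross-entropy sum gives exactly the same number as minus the log of the correct class's predicted probability.

import numpy as np

q = np.array([0.13, 0.86, 0.01])   # predicted probabilities (illustrative values)
p = np.array([1.0, 0.0, 0.0])      # one-hot true distribution: class 0 is correct

print(-np.sum(p * np.log(q)))      # full cross-entropy H(p, q)
print(-np.log(q[0]))               # same value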
Python implementation:
# First, the linear classifier class: linear_classifier.py

import numpy as np
from linear_svm import *
from softmax import *


class LinearClassifier(object):
    # Base class for linear classifiers
    def __init__(self):
        self.W = None

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        """
        Optimize the weight matrix with SGD.

        Inputs:
        - X (N, D)
        - y (N,)
        - learning_rate: learning rate.
        - reg: regularization strength.
        - num_iters: (int) number of training iterations
        - batch_size: (int) number of samples used in each iteration.
        - verbose: (boolean) whether to print training progress

        Outputs:
        A list containing the loss of each iteration.
        """
        num_train, dim = X.shape
        num_classes = np.max(y) + 1  # assume y takes values in [0, K-1] and the largest label appears in the training data

        # Initialize the weight matrix
        if self.W is None:
            self.W = 0.001 * np.random.randn(dim, num_classes)

        loss_history = []
        for it in range(num_iters):
            # In each iteration, pick batch_size examples at random
            mask = np.random.choice(num_train, batch_size, replace=True)
            X_batch = X[mask]
            y_batch = y[mask]

            # Compute the loss and the gradient
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            # Update the parameters
            self.W -= grad * learning_rate

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))

        return loss_history

    def predict(self, X):
        """
        Use the trained weights to predict labels for the input.

        Inputs:
        - X (N, D)

        Returns:
        - y_pred (N,): index of the predicted class for each input
        """
        y_pred = np.dot(X, self.W)
        y_pred = np.argmax(y_pred, axis=1)

        return y_pred

    def loss(self, X_batch, y_batch, reg):
        """
        This is only the base class for linear classifiers.
        Different linear classifiers compute the loss differently,
        so this method must be overridden in subclasses.
        """
        pass


class LinearSVM(LinearClassifier):
    """ Uses the SVM loss """

    def loss(self, X_batch, y_batch, reg):
        return svm_loss_vectorized(self.W, X_batch, y_batch, reg)


class Softmax(LinearClassifier):
    """ Uses the cross-entropy loss """

    def loss(self, X_batch, y_batch, reg):
        return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)
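As a quick smoke test of the classes above, here is a sketch that trains LinearSVM on made-up Gaussian data instead of CIFAR-10 (the module layout is assumed to match the file comments, and the accuracy on random labels is meaningless; the point is only that training runs):

import numpy as np
from linear_classifier import LinearSVM   # assumes linear_classifier.py and its imports are on the path

np.random.seed(0)
N, D, C = 500, 20, 3
X_toy = np.random.randn(N, D)
y_toy = np.random.randint(C, size=N)

svm = LinearSVM()
loss_hist = svm.train(X_toy, y_toy, learning_rate=1e-3, reg=1e-5,
                      num_iters=500, batch_size=100, verbose=False)
print('first / last loss:', loss_hist[0], loss_hist[-1])
print('train accuracy:', np.mean(svm.predict(X_toy) == y_toy))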
# Implementation of the softmax loss: softmax.py

import numpy as np
from random import shuffle

def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, implemented with explicit loops.
    D, C, N are the data dimensionality, the number of classes and the batch size.
    Inputs:
    - W (D, C): weights.
    - X (N, D): data.
    - y (N,): labels
    - reg: (float) regularization strength

    Returns:
    - loss
    - gradient
    """
    loss = 0.0
    dW = np.zeros_like(W)

    num_classes = W.shape[1]
    num_train = X.shape[0]

    for i in range(num_train):
        scores = np.dot(X[i], W)
        shift_scores = scores - max(scores)  # shift the scores for numerical stability
        dom = np.log(np.sum(np.exp(shift_scores)))
        loss_i = -shift_scores[y[i]] + dom
        loss += loss_i
        for j in range(num_classes):
            softmax_output = np.exp(shift_scores[j]) / sum(np.exp(shift_scores))
            if j == y[i]:
                dW[:, j] += (-1 + softmax_output) * X[i].T
            else:
                dW[:, j] += softmax_output * X[i].T
    loss /= num_train
    loss += reg * np.sum(W * W)
    dW = dW / num_train + 2 * reg * W

    return loss, dW


def softmax_loss_vectorized(W, X, y, reg):
    """
    Implementation without loops.
    """
    loss = 0.0
    dW = np.zeros_like(W)
    num_classes = W.shape[1]
    num_train = X.shape[0]

    scores = np.dot(X, W)
    shift_scores = scores - np.max(scores, axis=1).reshape(-1, 1)
    softmax_output = np.exp(shift_scores) / np.sum(np.exp(shift_scores), axis=1).reshape(-1, 1)
    loss = np.sum(-np.log(softmax_output[range(num_train), y]))
    loss = loss / num_train + reg * np.sum(W * W)

    dW = softmax_output.copy()
    dW[range(num_train), y] -= 1
    dW = np.dot(X.T, dW)
    dW = dW / num_train + 2 * reg * W

    return loss, dW
# Implementation of the SVM loss: linear_svm.py

import numpy as np
from random import shuffle

def svm_loss_naive(W, X, y, reg):
    """
    SVM loss computation implemented with explicit loops.
    The loss used here is the margin (hinge) loss.

    Inputs:
    - W (D, C): weight matrix.
    - X (N, D): batch of inputs
    - y (N,): labels
    - reg: regularization strength

    Returns:
    - loss (float)
    - gradient with respect to W
    """
    dW = np.zeros(W.shape)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0

    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1
            if margin > 0:
                loss += margin
                dW[:, j] += X[i].T
                dW[:, y[i]] -= X[i].T

    loss /= num_train
    dW /= num_train
    loss += reg * np.sum(W * W)
    dW += 2 * reg * W

    return loss, dW


def svm_loss_vectorized(W, X, y, reg):
    """
    Compute the loss and gradient without loops, using numpy's vectorized operations.
    """
    loss = 0.0
    dW = np.zeros(W.shape)

    # Compute the loss
    num_classes = W.shape[1]
    num_train = X.shape[0]
    scores = np.dot(X, W)  # score matrix of shape (N, C)
    correct_score = scores[range(num_train), list(y)].reshape(-1, 1)  # score of the correct class for each input
    margins = np.maximum(0, scores - correct_score + 1)
    margins[range(num_train), list(y)] = 0
    loss = np.sum(margins) / num_train + reg * np.sum(W * W)

    # Compute the gradient
    mask = np.zeros((num_train, num_classes))
    mask[margins > 0] = 1
    mask[range(num_train), list(y)] -= np.sum(mask, axis=1)
    dW = np.dot(X.T, mask)
    dW /= num_train
    dW += 2 * reg * W

    return loss, dW
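Besides comparing the naive and vectorized versions against each other (as the test script below does), the analytic gradient can also be checked against a numerical one. A minimal centered-difference sketch; the helper function and the toy shapes are my own, not part of the assignment code:

import numpy as np
from linear_svm import svm_loss_vectorized

def numerical_gradient(f, W, h=1e-5):
    # centered finite differences, one weight at a time (slow but simple)
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h
        fp = f(W)
        W[idx] = old - h
        fm = f(W)
        W[idx] = old
        grad[idx] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

np.random.seed(1)
W = 0.001 * np.random.randn(10, 3)     # toy problem: D = 10, C = 3
X = np.random.randn(5, 10)
y = np.random.randint(3, size=5)

_, grad_analytic = svm_loss_vectorized(W, X, y, 0.1)
grad_numeric = numerical_gradient(lambda w: svm_loss_vectorized(w, X, y, 0.1)[0], W)
print(np.max(np.abs(grad_analytic - grad_numeric)))   # should be very small, e.g. below 1e-6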
# Finally, a test script to see how the linear classifiers do on CIFAR10

# coding: utf-8

# Exercise hinge_loss and softmax_loss

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
from cs231n.classifiers.linear_svm import svm_loss_naive, svm_loss_vectorized
from cs231n.classifiers.softmax import softmax_loss_naive, softmax_loss_vectorized
import time
from cs231n.classifiers import LinearSVM, Softmax
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'


#######################################################################################
# Part 1: load and preprocess the data
#######################################################################################

# Load the CIFAR10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

# Show a few images from each class to get a feel for the data
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()

# Split the data into training, validation and test sets.
# Also keep a small dev subset so experiments run faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# The dataset doesn't come with a validation set, so carve one out of the training set
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)

# Preprocessing: center the pixel data around zero
# Step 1: compute the mean pixel value over the training images
mean_image = np.mean(X_train, axis=0)
print(mean_image.shape)
plt.figure(figsize=(4, 4))
plt.imshow(mean_image.reshape((32, 32, 3)).astype('uint8'))  # visualize the mean image
plt.show()

# Step 2: subtract this mean from every split
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

# Step 3: append an extra dimension set to 1 to every image, so we only need W and no separate bias b;
# this effectively folds the training of b into W
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)


#######################################################################################
# Part 2: helper functions
#######################################################################################

def cmp_naiveANDvectorized(naive, vectorized):
    '''
    Each loss function is implemented in two ways: with loops and without
    (i.e. using numpy's vectorized operations). Compare the two.
    '''
    W = np.random.randn(3073, 10) * 0.0001

    # Compare the runtime of the two implementations
    tic = time.time()
    loss_naive, grad_naive = naive(W, X_dev, y_dev, 0.000005)
    toc = time.time()
    print('Naive computed in %fs' % (toc - tic))

    tic = time.time()
    loss_vectorized, grad_vectorized = vectorized(W, X_dev, y_dev, 0.000005)
    toc = time.time()
    print('Vectorized computed in %fs' % (toc - tic))

    # Sanity-check the loss: for randomly initialized weights and data,
    # softmax_loss should be about -log(0.1) and svm_loss should be about 9
    print('loss %f %f' % (loss_naive, loss_vectorized))

    # Check that the two implementations agree
    print('difference loss %f ' % (loss_naive - loss_vectorized))
    difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
    print('difference gradient: %f' % difference)

def cross_choose(Linear_classifier, learning_rates, regularization_strengths):
    '''
    Hyperparameter search.
    '''
    results = {}        # training/validation accuracy for every pair of hyperparameters
    best_val = -1       # best validation accuracy so far
    best_model = None   # classifier object that achieved the best validation accuracy
    best_loss_hist = None
    for rs in regularization_strengths:
        for lr in learning_rates:
            classifier = Linear_classifier
            loss_hist = classifier.train(X_train, y_train, lr, rs, num_iters=3000)
            y_train_pred = classifier.predict(X_train)
            train_accuracy = np.mean(y_train == y_train_pred)
            y_val_pred = classifier.predict(X_val)
            val_accuracy = np.mean(y_val == y_val_pred)
            if val_accuracy > best_val:
                best_val = val_accuracy
                best_model = classifier
                best_loss_hist = loss_hist
            results[(lr, rs)] = train_accuracy, val_accuracy

    for lr, reg in sorted(results):
        train_accuracy, val_accuracy = results[(lr, reg)]
        print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
            lr, reg, train_accuracy, val_accuracy))

    print('best validation accuracy achieved during cross-validation: %f' % best_val)
    # Plot the loss curve of the best model
    plt.plot(best_loss_hist)
    plt.xlabel('Iteration number')
    plt.ylabel('Loss value')
    plt.show()
    return results, best_val, best_model

def show_weight(best_model):
    # Evaluate the best model on the test set and visualize the learned weights

    y_test_pred = best_model.predict(X_test)
    test_accuracy = np.mean(y_test == y_test_pred)
    print('final test set accuracy: %f' % test_accuracy)

    w = best_model.W[:-1, :]  # strip the bias parameters
    w = w.reshape(32, 32, 3, 10)
    w_min, w_max = np.min(w), np.max(w)
    classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    for i in range(10):
        plt.subplot(2, 5, i + 1)

        # Rescale the weights into the range 0-255
        wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
        plt.imshow(wimg.astype('uint8'))
        plt.axis('off')
        plt.title(classes[i])
    plt.show()

#######################################################################################
# Part 3: apply and compare svm_loss and softmax_loss
#######################################################################################

cmp_naiveANDvectorized(svm_loss_naive, svm_loss_vectorized)
learning_rates = [(1 + i * 0.1) * 1e-7 for i in range(-3, 5)]
regularization_strengths = [(5 + i * 0.1) * 1e3 for i in range(-3, 3)]
# Choose the regularization strength based on the magnitudes of the data loss and W*W;
# initially the data loss is about 9 and W*W is about 1e-6.
# Watch the final loss and keep adjusting the regularization strength so that the data loss
# and the regularization loss stay in a sensible ratio.
results, best_val, best_model = cross_choose(LinearSVM(), learning_rates, regularization_strengths)
show_weight(best_model)

print("--------------------------------------------------------")

cmp_naiveANDvectorized(softmax_loss_naive, softmax_loss_vectorized)
learning_rates = [(2 + i * 0.1) * 1e-7 for i in range(-2, 2)]
regularization_strengths = [(7 + i * 0.1) * 1e3 for i in range(-3, 3)]
results, best_val, best_model = cross_choose(Softmax(), learning_rates, regularization_strengths)
show_weight(best_model)
# The data-loading code: cs231n/data_utils.py

from six.moves import cPickle as pickle
import numpy as np
import os
from scipy.misc import imread
import platform

def load_pickle(f):
    version = platform.python_version_tuple()
    if version[0] == '2':
        return pickle.load(f)
    elif version[0] == '3':
        return pickle.load(f, encoding='latin1')
    raise ValueError("invalid python version: {}".format(version))

def load_CIFAR_batch(filename):
    """ The CIFAR data is stored in batches; this function loads a single batch """
    with open(filename, 'rb') as f:  # open the file in binary mode
        datadict = load_pickle(f)
        X = datadict['data']
        Y = datadict['labels']
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype("float")
        Y = np.array(Y)
        return X, Y

def load_CIFAR10(ROOT):
    """ Load all of the batches """
    xs = []
    ys = []
    for b in range(1, 6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Ytr, Xte, Yte


def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000,
                     subtract_mean=True):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for classifiers. These are the same steps as we used for the SVM, but
    condensed to a single function.
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    if subtract_mean:
        mean_image = np.mean(X_train, axis=0)
        X_train -= mean_image
        X_val -= mean_image
        X_test -= mean_image

    # Transpose so that channels come first
    X_train = X_train.transpose(0, 3, 1, 2).copy()
    X_val = X_val.transpose(0, 3, 1, 2).copy()
    X_test = X_test.transpose(0, 3, 1, 2).copy()

    # Package data into a dictionary
    return {
        'X_train': X_train, 'y_train': y_train,
        'X_val': X_val, 'y_val': y_val,
        'X_test': X_test, 'y_test': y_test,
    }
