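Andrew Ng Deep Learning, Course 2 Week 1 Programming Assignment: Regularization

Reposted 2020-07-29 16:11. Category: Andrew Ng Deep Learning, post-lecture coding assignments. Tags: Andrew Ng, deep learning, post-lecture coding assignments

Regularization

Note

This assignment was worked through step by step in a Jupyter notebook. It includes material looked up along the way (sources are credited) and a translation of the notebook text. Corrections are welcome!

Reference: https://blog.csdn.net/u013733326/article/details/79847918

Also based on Kulbear's [Initialization], [Regularization], and [Gradient Checking], on 念師's [10. Initialization, Regularization, and Gradient Checking in Practice], and on 何寬.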

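Welcome to the second assignment of this week. Deep learning models have so much flexibility and capacity that overfitting can become a serious problem if the training dataset is not big enough. Such a model does well on the training set, but the learned network does not generalize to new examples it has never seen. (In other words, training goes fine, but the model falls apart on a real test.)

  Goals of this second assignment:

  2. Regularizing the model:

    2.1: Use the L2 norm to regularize a binary classifier and try to avoid overfitting.

    2.2: Use random node removal (dropout) to thin the model, again to try to avoid overfitting.

You will learn to: use regularization in your deep learning models.

Let's first import the packages you are going to use.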

# import packages
import numpy as np
import matplotlib.pyplot as plt
from reg_utils import sigmoid, relu, plot_decision_boundary, initialize_parameters, load_2D_dataset, predict_dec
from reg_utils import compute_cost, predict, forward_propagation, backward_propagation, update_parameters
import sklearn
import sklearn.datasets
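import scipy.io  # scipy is built on top of numpy and provides many functions for working with numpy arrays; scipy.io handles input and output of files in various formats
from testCases import *  # "from XXX import *" brings every name defined in XXX into the current namespace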

%matplotlib inline
plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

C:\Users\1\Downloads\代碼作業\第二課第一周編程作業\assignment1\reg_utils.py:85: SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert(parameters['W' + str(l)].shape == layer_dims[l], layer_dims[l-1])
C:\Users\1\Downloads\代碼作業\第二課第一周編程作業\assignment1\reg_utils.py:86: SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert(parameters['W' + str(l)].shape == layer_dims[l], 1)
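
These warnings come from reg_utils.py, not from the notebook cells. A parenthesized `assert(condition, message)` asserts a two-element tuple, and a non-empty tuple is always truthy, so the check can never fail. The sketch below illustrates the difference; it assumes the intent was to compare the weight shape against `(layer_dims[l], layer_dims[l-1])`, which is a guess since the helper file is not shown here, and the wrong-shaped `W` is made up for the demonstration.

```python
import numpy as np

layer_dims = [2, 20, 3, 1]          # same layer sizes as the model used in this post
l = 1
W = np.zeros((5, 5))                # deliberately the wrong shape (made up for this demo)

# What the warned-about line effectively tests: a non-empty tuple, which is always truthy.
always_true = bool((W.shape == layer_dims[l], layer_dims[l - 1]))

# Presumed intent (an assumption): compare the shape against the expected (rows, cols) tuple.
expected_shape = (layer_dims[l], layer_dims[l - 1])
shape_ok = (W.shape == expected_shape)

print(always_true, shape_ok)
```

Written as `assert W.shape == expected_shape`, without the outer parentheses around a condition/message pair, the check would actually fire on a bad shape.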

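scipy basics (io), by 伏都哥哥, link: https://blog.csdn.net/engerla/article/details/94332213

Problem statement: you have just been hired as an AI expert by the French Football Corporation. They would like you to recommend positions where France's goalkeeper should kick the ball so that the French team's players can then hit it with their head.

**Figure 1** : **Football field**

The goalkeeper kicks the ball in the air, and the players of each team fight to hit the ball with their head.

They give you the following 2D dataset from France's past 10 games.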

train_X, train_Y, test_X, test_Y = load_2D_dataset()

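Each dot corresponds to a position on the football field where a player hit the ball with his or her head after the French goalkeeper kicked the ball from the left side of the field.
•If the dot is blue, a French player managed to hit the ball with his or her head
•If the dot is red, a player from the other team hit the ball with his or her head

Your goal: use a deep learning model to find the positions on the field where the goalkeeper should kick the ball.

Analysis of the dataset: this dataset is a little noisy, but it looks like a diagonal line separating the upper-left half (blue) from the lower-right half (red) would work well.
You will first try a non-regularized model. Then you will learn how to regularize it and decide which model to choose to solve the French Football Corporation's problem.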

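1 - Non-regularized model

You will use the following neural network (already implemented for you below). This model can be used:
•in regularization mode, by setting the lambd input to a non-zero value. We use "lambd" instead of "lambda" because "lambda" is a reserved keyword in Python.
•in dropout mode (random node removal), by setting keep_prob to a value less than 1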
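
The naming point can be checked directly with the standard library. This is only a small illustration; the function `f` in the compiled string is a throwaway name, not something from the assignment.

```python
import keyword

# "lambda" is reserved; "lambd" is an ordinary identifier.
is_lambda_reserved = keyword.iskeyword("lambda")
is_lambd_reserved = keyword.iskeyword("lambd")

# Trying to use the reserved word as a parameter name fails at compile time.
try:
    compile("def f(lambda=0): pass", "<example>", "exec")
    accepted = True
except SyntaxError:
    accepted = False

print(is_lambda_reserved, is_lambd_reserved, accepted)
```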

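You will first try the model without any regularization. Then you will implement:
•L2 regularization: the functions "compute_cost_with_regularization()" and "backward_propagation_with_regularization()"
•Dropout: the functions "forward_propagation_with_dropout()" and "backward_propagation_with_dropout()"

In each part, you will run this model with the correct inputs so that it calls the functions you have implemented. Take a look at the code below to familiarize yourself with the model.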

def model(X, Y, learning_rate = 0.3, num_iterations = 30000, print_cost = True, lambd = 0, keep_prob = 1):
    """
    Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (input size, number of examples); the input size is 2 here
    Y -- true "label" vector (1 for blue dot / 0 for red dot), of shape (output size, number of examples)
    learning_rate -- learning rate of the optimization
    num_iterations -- number of iterations of the optimization loop
    print_cost -- If True, print the cost every 10000 iterations
    lambd -- regularization hyperparameter, scalar
    keep_prob - probability of keeping a neuron active during drop-out, scalar.

    Returns:
    parameters -- parameters learned by the model. They can then be used to predict.
    """

    grads = {}
    costs = []                            # to keep track of the cost
    m = X.shape[1]                        # number of examples
    layers_dims = [X.shape[0], 20, 3, 1]

    # Initialize parameters dictionary.
    parameters = initialize_parameters(layers_dims)

    # Loop (gradient descent)

    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
        if keep_prob == 1:
            a3, cache = forward_propagation(X, parameters)
        elif keep_prob < 1:  # dropout is in use
            a3, cache = forward_propagation_with_dropout(X, parameters, keep_prob)

        # Cost function
        if lambd == 0:
            cost = compute_cost(a3, Y)
        else:  # L2 regularization is in use
            cost = compute_cost_with_regularization(a3, Y, parameters, lambd)

        # Backward propagation.
        assert(lambd==0 or keep_prob==1)    # it is possible to use both L2 regularization and dropout,
                                            # but this assignment will only explore one at a time
        if lambd == 0 and keep_prob == 1:  # neither L2 regularization nor dropout: plain backward propagation
            grads = backward_propagation(X, Y, cache)
        elif lambd != 0:  # L2 regularization only
            grads = backward_propagation_with_regularization(X, Y, cache, lambd)
        elif keep_prob < 1:  # dropout only
            grads = backward_propagation_with_dropout(X, Y, cache, keep_prob)

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print the loss every 10000 iterations
        if print_cost and i % 10000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
        # Record the cost every 1000 iterations for the plot
        if print_cost and i % 1000 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (x1,000)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters

Let's train the model without any regularization and observe the accuracy on the train/test sets.

parameters = model(train_X, train_Y)
print ("On the training set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

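The train accuracy is 94.8% while the test accuracy is 91.5%. This is the baseline model (you will observe the impact of regularization on this model). Run the following code to plot the decision boundary of your model.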

plt.title("Model without regularization")
axes = plt.gca()
axes.set_xlim([-0.75,0.40])
axes.set_ylim([-0.75,0.65])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)

Note that this call raises an error:

TypeError                                 Traceback (most recent call last)
D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\colors.py in to_rgba(c, alpha)
    165     try:
--> 166         rgba = _colors_full_map.cache[c, alpha]
    167     except (KeyError, TypeError):  # Not in cache, or unhashable.

TypeError: unhashable type: 'numpy.ndarray'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   4287                 # must be acceptable as PathCollection facecolors
-> 4288                 colors = mcolors.to_rgba_array(c)
   4289             except ValueError:

D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\colors.py in to_rgba_array(c, alpha)
    266     for i, cc in enumerate(c):
--> 267         result[i] = to_rgba(cc, alpha)
    268     return result

D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\colors.py in to_rgba(c, alpha)
    167     except (KeyError, TypeError):  # Not in cache, or unhashable.
--> 168         rgba = _to_rgba_no_colorcycle(c, alpha)
    169         try:

D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\colors.py in _to_rgba_no_colorcycle(c, alpha)
    222     if len(c) not in [3, 4]:
--> 223         raise ValueError("RGBA sequence should have length 3 or 4")
    224     if len(c) == 3 and alpha is None:

ValueError: RGBA sequence should have length 3 or 4

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-8-1c83d5b7143d> in <module>()
      3 axes.set_xlim([-0.75,0.40])
      4 axes.set_ylim([-0.75,0.65])
----> 5 plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)

~\Downloads\代碼作業\第二課第一周編程作業\assignment1\reg_utils.py in plot_decision_boundary(model, X, y)
    322     plt.ylabel('x2')
    323     plt.xlabel('x1')
--> 324     plt.scatter(X[0, :], X[1, :], c=y, cmap=plt.cm.Spectral)
    325     plt.show()
    326 

D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, hold, data, **kwargs)
   3473                          vmin=vmin, vmax=vmax, alpha=alpha,
   3474                          linewidths=linewidths, verts=verts,
-> 3475                          edgecolors=edgecolors, data=data, **kwargs)
   3476     finally:
   3477         ax._hold = washold

D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
   1865                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1866                         RuntimeWarning, stacklevel=2)
-> 1867             return func(ax, *args, **kwargs)
   1868 
   1869         inner.__doc__ = _add_data_doc(inner.__doc__,

D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   4291                 raise ValueError("c of shape {} not acceptable as a color "
   4292                                  "sequence for x with size {}, y with size {}"
-> 4293                                  .format(c.shape, x.size, y.size))
   4294         else:
   4295             colors = None  # use cmap, norm after collection is created

ValueError: c of shape (1, 211) not acceptable as a color sequence for x with size 211, y with size 211

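The fix is to change the shape of the c argument inside the function: it arrives with shape (1, 211), while scatter expects one color value per point. See this week's first assignment.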
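
A minimal sketch of that shape fix, assuming the labels are stored as a (1, m) row vector as the error message indicates. The arrays below are synthetic stand-ins, not the football dataset.

```python
import numpy as np

m = 211
rng = np.random.default_rng(0)
X = rng.standard_normal((2, m))              # stand-in for train_X, shape (2, m)
y = rng.integers(0, 2, size=(1, m))          # stand-in for train_Y, shape (1, m)

colors = np.squeeze(y)                       # drop the length-1 axis: one color value per point
print(y.shape, colors.shape)
```

Under this assumption, the scatter call inside plot_decision_boundary in reg_utils.py would read `plt.scatter(X[0, :], X[1, :], c=np.squeeze(y), cmap=plt.cm.Spectral)`. `y.ravel()` or `y.reshape(-1)` would serve equally well.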

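Result after re-running:

The non-regularized model is clearly overfitting the training set: it is fitting the noisy points! Let's now look at two techniques to reduce overfitting.

2 - L2 Regularization

The standard way to avoid overfitting is called L2 regularization. It consists of appropriately modifying your cost function, from the original cost function (1) to the regularized one (2):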
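
Written out to match the code in this post, the two costs are presumably the following. This is a reconstruction: it assumes compute_cost() is the standard binary cross-entropy, with a^[3](i) the network output for example i and W^[l] the weight matrix of layer l.

```latex
J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[3](i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{[3](i)}\right) \right) \tag{1}

J_{regularized} = \underbrace{-\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[3](i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{[3](i)}\right) \right)}_{\text{cross-entropy cost}}
+ \underbrace{\frac{1}{m} \frac{\lambda}{2} \sum_{l} \sum_{k} \sum_{j} \left( W^{[l]}_{k,j} \right)^{2}}_{\text{L2 regularization cost}} \tag{2}
```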

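Let's modify your cost and observe the consequences.

Exercise: implement compute_cost_with_regularization(), which computes the cost given by formula (2). To compute the sum of the squared entries of a weight matrix Wl, use np.sum(np.square(Wl)).

Note that you have to do this for W[1], W[2] and W[3], then sum the three terms and multiply by (1/m) × (lambd/2).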
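
As a quick hand-checkable illustration of that recipe, here is the L2 term for three tiny weight matrices. The values of the weights, `m` and `lambd` are made up for the example and are not the assignment's parameters.

```python
import numpy as np

# Tiny made-up weights, chosen so the sums are easy to verify by hand.
parameters = {
    "W1": np.array([[1.0, -2.0], [0.0, 3.0]]),   # squared entries sum to 14
    "W2": np.array([[2.0, 1.0]]),                # squared entries sum to 5
    "W3": np.array([[-1.0]]),                    # squared entries sum to 1
}
m, lambd = 4, 0.1

# Sum of squared entries over the three weight matrices.
sum_of_squares = sum(np.sum(np.square(parameters[name])) for name in ("W1", "W2", "W3"))

# (1/m) * (lambd/2) * sum of squares = (1/4) * (0.1/2) * 20
l2_term = (1 / m) * (lambd / 2) * sum_of_squares

print(sum_of_squares, l2_term)
```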

# GRADED FUNCTION: compute_cost_with_regularization

def compute_cost_with_regularization(A3, Y, parameters, lambd):
    """
    Implement the cost function with L2 regularization. See formula (2) above.

    Arguments:
    A3 -- post-activation, output of forward propagation, of shape (output size, number of examples)
    Y -- "true" labels vector, of shape (output size, number of examples)
    parameters -- python dictionary containing parameters of the model
    lambd -- regularization hyperparameter, scalar

    Returns:
    cost - value of the regularized loss function (formula (2))
    """
    m = Y.shape[1]
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    W3 = parameters["W3"]

    cross_entropy_cost = compute_cost(A3, Y) # This gives you the cross-entropy part of the cost

    ### START CODE HERE ### (approx. 1 line)
    L2_regularization_cost = (1 / m * lambd / 2) * (np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
    ### END CODE HERE ###

    cost = cross_entropy_cost + L2_regularization_cost

    return cost

# Test: compute_cost_with_regularization

A3, Y_assess, parameters = compute_cost_with_regularization_test_case()

print("cost = " + str(compute_cost_with_regularization(A3, Y_assess, parameters, lambd = 0.1)))

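Result:

Of course, because you changed the cost, you have to change backward propagation as well! All the gradients have to be computed with respect to this new cost.

Exercise: implement the changes needed in backward propagation to take regularization into account. The changes only concern dW1, dW2 and dW3. For each, you have to add the gradient of the regularization term, (lambd/m) × W.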
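
The extra gradient term can be sanity-checked numerically. The sketch below compares (lambd/m) × W against a centered finite-difference gradient of the L2 penalty for a single weight matrix. The helper names `l2_cost` and `numerical_grad` and the values of `W`, `lambd` and `m` are made up for this check and are not part of the assignment.

```python
import numpy as np

def l2_cost(W, lambd, m):
    """L2 penalty for a single weight matrix: (1/m) * (lambd/2) * sum(W**2)."""
    return (1.0 / m) * (lambd / 2.0) * np.sum(np.square(W))

def numerical_grad(f, W, eps=1e-6):
    """Centered finite-difference gradient of a scalar function f evaluated at W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (f(W_plus) - f(W_minus)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
lambd, m = 0.7, 5

analytic = (lambd / m) * W                                    # the term added to each dW
numeric = numerical_grad(lambda M: l2_cost(M, lambd, m), W)   # independent numerical estimate
max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

The two gradients should agree up to floating-point rounding, which is why the code simply adds `(lambd * W) / m` to each dW.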

# GRADED FUNCTION: backward_propagation_with_regularization

def backward_propagation_with_regularization(X, Y, cache, lambd):
    """
    Implements the backward propagation of our baseline model to which we added an L2 regularization.

    Arguments:
    X -- input dataset, of shape (input size, number of examples)
    Y -- "true" labels vector, of shape (output size, number of examples)
    cache -- cache output from forward_propagation()
    lambd -- regularization hyperparameter, scalar

    Returns:
    gradients -- A dictionary with the gradients with respect to each parameter, activation and pre-activation variables
    """

    m = X.shape[1]
    (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache

    dZ3 = A3 - Y

    ### START CODE HERE ### (approx. 1 line)
    dW3 = 1./m * np.dot(dZ3, A2.T) + ((lambd * W3) / m)
    ### END CODE HERE ###
    db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True)

    dA2 = np.dot(W3.T, dZ3)
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    ### START CODE HERE ### (approx. 1 line)
    dW2 = 1./m * np.dot(dZ2, A1.T) + ((lambd * W2) / m)
    ### END CODE HERE ###
    db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True)

    dA1 = np.dot(W2.T, dZ2)
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    ### START CODE HERE ### (approx. 1 line)
    dW1 = 1./m * np.dot(dZ1, X.T) + ((lambd * W1) / m)
    ### END CODE HERE ###
    db1 = 1./m * np.sum(dZ1, axis=1, keepdims = True)

    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,"dA2": dA2,
                 "dZ2": dZ2, "dW2": dW2, "db2": db2, "dA1": dA1,
                 "dZ1": dZ1, "dW1": dW1, "db1": db1}

    return gradients

# Test: backward_propagation_with_regularization

X_assess, Y_assess, cache = backward_propagation_with_regularization_test_case()

grads = backward_propagation_with_regularization(X_assess, Y_assess, cache, lambd = 0.7)
print ("dW1 = "+ str(grads["dW1"]))
print ("dW2 = "+ str(grads["dW2"]))
print ("dW3 = "+ str(grads["dW3"]))

Result:

Let's now run the model with L2 regularization. The model() function will call:

•compute_cost_with_regularization instead of compute_cost
•backward_propagation_with_regularization instead of backward_propagation

parameters = model(train_X, train_Y, lambd = 0.7)
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

On the train set:

Accuracy: 0.9383886255924171

On the test set:

Accuracy: 0.93

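Congrats, the test set accuracy increased to 93%. You have saved the French football team!

You are not overfitting the training data anymore. Let's plot the decision boundary.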

plt.title("Model with L2-regularization")
axes = plt.gca() #Get Current Axes
axes.set_xlim([-0.75,0.40])
axes.set_ylim([-0.75,0.65])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)

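The matplotlib.pyplot.gca() function returns the current Axes object (gca stands for Get Current Axes). Reference: https://blog.csdn.net/Dontla/article/details/98327176

Result:

Observations:
•The value of λ is a hyperparameter that you can tune using a dev set.
•L2 regularization makes your decision boundary smoother. If λ is too large, it is also possible to "oversmooth", resulting in a model with high bias.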

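What is L2 regularization actually doing?

L2 regularization relies on the assumption that a model with small weights is simpler than a model with large weights. Thus, by penalizing the squared values of the weights in the cost function, you drive all the weights to smaller values. It becomes too costly for the cost to have large weights! This leads to a smoother model in which the output changes more slowly as the input changes.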

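**What you should remember**: the implications of L2 regularization on:

- The cost computation: a regularization term is added to the cost.

- The backpropagation function: there are extra terms in the gradients with respect to the weight matrices.

- Weights end up smaller ("weight decay"): weights are pushed to smaller values.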
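
The "weight decay" name can be made concrete with one gradient-descent step. The sketch below assumes the plain update W := W - learning_rate × dW. It uses the learning rate (0.3), lambd (0.7) and training-set size (211) that appear elsewhere in this post, and a random stand-in for the cross-entropy gradient.

```python
import numpy as np

learning_rate, lambd, m = 0.3, 0.7, 211     # values that appear elsewhere in this post
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 20))
dW_data = rng.standard_normal((3, 20))      # stand-in for the cross-entropy part of the gradient

# Update as the regularized backward pass would produce it.
W_a = W - learning_rate * (dW_data + (lambd / m) * W)

# The same update rewritten: shrink W by a constant factor, then take the usual step.
decay = 1.0 - learning_rate * lambd / m
W_b = decay * W - learning_rate * dW_data

print(decay, np.allclose(W_a, W_b))
```

Because the factor `decay` is slightly below 1, every step multiplies the weights by a number just under one before applying the data gradient, hence the name.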

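3 - Dropout

Finally, dropout is a widely used regularization technique that is specific to deep learning. It randomly shuts down some neurons in each iteration. Watch these two videos to see what this means!

Figure 2 : Drop-out on the second hidden layer.

At each iteration, you shut down (set to zero) each neuron of a layer with probability 1 - keep_prob, or keep it with probability keep_prob (50% here). The dropped neurons contribute to the training in neither the forward nor the backward propagation of that iteration.

Figure 3 : Drop-out on the first and third hidden layers.

First layer: we shut down on average 40% of the neurons. Third layer: we shut down on average 20% of the neurons.

When you shut some neurons down, you actually modify your model. The idea behind drop-out is that at each iteration, you train a different model that uses only a subset of your neurons. With dropout, your neurons thus become less sensitive to the activation of one other specific neuron, because that other neuron might be shut down at any time.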

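3.1 - Forward propagation with dropout

Exercise: implement the forward propagation with dropout. You are using a 3-layer neural network and will add dropout to the first and second hidden layers. We will not apply dropout to the input layer or the output layer.

Instructions: you would like to shut down some neurons in the first and second layers. To do that, you are going to carry out 4 steps:

1. In lecture, we discussed creating a variable d^[1] with the same shape as a^[1] using np.random.rand() to randomly get numbers between 0 and 1. Here, you will use a vectorized implementation, so create a random matrix D^[1] = [d^[1](1) d^[1](2) ... d^[1](m)] of the same dimension as A^[1].

2. Set each entry of D^[1] to 0 with probability (1-keep_prob) or to 1 with probability (keep_prob), by thresholding the values in D^[1] appropriately. Hint: to set all the entries of a matrix X to 1 (if the entry is less than 0.5) or 0 (otherwise), you would do: X = (X < 0.5). Note that 0 and 1 are respectively equivalent to False and True.

3. Set A^[1] to A^[1] * D^[1]. (You are shutting down some neurons.) You can think of D^[1] as a mask: when it is multiplied elementwise with another matrix, it shuts down some of the values.

4. Divide A^[1] by keep_prob. By doing this you are assuring that the result of the cost will still have the same expected value as without dropout. (This technique is also called inverted dropout.)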
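
The four steps can be tried on a toy matrix before touching the network. The sketch below uses all-ones stand-in activations so the effect is easy to read. It draws random numbers from a local `np.random.default_rng` generator, whereas the assignment code uses `np.random.rand` with a fixed seed, so the masks will not match the graded output.

```python
import numpy as np

rng = np.random.default_rng(0)               # local generator (the assignment uses np.random.seed instead)
keep_prob = 0.8
A1 = np.ones((4, 5))                         # stand-in activations, all ones

D1 = rng.random(A1.shape)                    # Step 1: uniform random numbers in [0, 1)
D1 = D1 < keep_prob                          # Step 2: True (keep) with probability keep_prob
A1_dropped = A1 * D1                         # Step 3: zero out the dropped units
A1_dropped = A1_dropped / keep_prob          # Step 4: rescale the surviving units by 1/keep_prob

print(D1.astype(int))
print(A1_dropped)
```

Every kept entry ends up as 1/keep_prob and every dropped entry as 0, so the average over many draws stays close to the original activation value.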

# GRADED FUNCTION: forward_propagation_with_dropout

def forward_propagation_with_dropout(X, parameters, keep_prob = 0.5):
    """
    Implements the forward propagation: LINEAR -> RELU + DROPOUT -> LINEAR -> RELU + DROPOUT -> LINEAR -> SIGMOID.

    Arguments:
    X -- input dataset, of shape (2, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3":
                    W1 -- weight matrix of shape (20, 2)
                    b1 -- bias vector of shape (20, 1)
                    W2 -- weight matrix of shape (3, 20)
                    b2 -- bias vector of shape (3, 1)
                    W3 -- weight matrix of shape (1, 3)
                    b3 -- bias vector of shape (1, 1)
    keep_prob - probability of keeping a neuron active during drop-out, scalar

    Returns:
    A3 -- last activation value, output of the forward propagation, of shape (1, number of examples)
    cache -- tuple, information stored for computing the backward propagation
    """

    np.random.seed(1)

    # retrieve parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]

    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    ### START CODE HERE ### (approx. 4 lines)         # Steps 1-4 below correspond to the Steps 1-4 described above.
    D1 = np.random.rand(A1.shape[0], A1.shape[1])     # Step 1: initialize matrix D1 = np.random.rand(..., ...)
    D1 = D1 < keep_prob                               # Step 2: convert entries of D1 to 0 or 1 (using keep_prob as the threshold)
    A1 = A1 * D1                                      # Step 3: shut down some neurons of A1 (their values become 0)
    A1 = A1 / keep_prob                               # Step 4: scale the value of neurons that haven't been shut down
    ### END CODE HERE ###
    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)
    ### START CODE HERE ### (approx. 4 lines)
    D2 = np.random.rand(A2.shape[0], A2.shape[1])     # Step 1: initialize matrix D2 = np.random.rand(..., ...)
    D2 = D2 < keep_prob                               # Step 2: convert entries of D2 to 0 or 1 (using keep_prob as the threshold)
    A2 = A2 * D2                                      # Step 3: shut down some neurons of A2 (their values become 0)
    A2 = A2 / keep_prob                               # Step 4: scale the value of neurons that haven't been shut down
    ### END CODE HERE ###
    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)

    cache = (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3)

    return A3, cache

# Test: forward_propagation_with_dropout

X_assess, parameters = forward_propagation_with_dropout_test_case()

A3, cache = forward_propagation_with_dropout(X_assess, parameters, keep_prob = 0.7)
print ("A3 = " + str(A3))

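3.2 - Backward propagation with dropout

Exercise: implement the backward propagation with dropout. As before, you are training a 3-layer network. Add dropout to the first and second hidden layers, using the masks D^[1] and D^[2] stored in the cache.

Instructions: backpropagation with dropout is actually quite easy. You will have to carry out 2 steps:

1. You had previously shut down some neurons during forward propagation by applying a mask D^[1] to A1. In backpropagation, you will have to shut down the same neurons by reapplying the same mask D^[1] to dA1.

2. During forward propagation, you had divided A1 by keep_prob. In backpropagation, you will therefore have to divide dA1 by keep_prob again (the calculus interpretation is that if A^[1] is scaled by keep_prob, then its derivative dA^[1] is also scaled by the same keep_prob).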
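
Both steps can be checked numerically on a single layer. The sketch below defines a toy scalar loss (a weighted sum of the dropped-out activations, with made-up weights `C` standing in for the upstream gradient) and compares the mask-then-divide rule against centered finite differences. The names `toy_loss` and `C` are hypothetical and do not appear in the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8
A = rng.standard_normal((3, 4))              # activations before dropout
D = rng.random(A.shape) < keep_prob          # mask saved during the forward pass
C = rng.standard_normal(A.shape)             # made-up upstream gradient defining a toy loss

def toy_loss(A_in):
    """Toy loss: apply inverted dropout with the fixed mask D, then take a weighted sum."""
    return np.sum(C * (A_in * D / keep_prob))

# Backward rule from the two steps above: re-apply the mask, then divide by keep_prob.
dA = C * D
dA = dA / keep_prob

# Centered finite differences as an independent check.
eps = 1e-6
dA_numeric = np.zeros_like(A)
for idx in np.ndindex(A.shape):
    A_plus, A_minus = A.copy(), A.copy()
    A_plus[idx] += eps
    A_minus[idx] -= eps
    dA_numeric[idx] = (toy_loss(A_plus) - toy_loss(A_minus)) / (2 * eps)

max_diff = np.max(np.abs(dA - dA_numeric))
print(max_diff)
```

Entries whose mask value is False get a gradient of exactly zero, which is what "shutting down the same neurons" means in the backward pass.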

# GRADED FUNCTION: backward_propagation_with_dropout

def backward_propagation_with_dropout(X, Y, cache, keep_prob):
    """
    Implements the backward propagation of our baseline model to which we added dropout.
    
    Arguments:
    X -- input dataset, of shape (2, number of examples)
    Y -- "true" labels vector, of shape (output size, number of examples)
    cache -- cache output from forward_propagation_with_dropout()
    keep_prob - probability of keeping a neuron active during drop-out, scalar

    Returns:
    gradients -- A dictionary with the gradients with respect to each parameter, activation and pre-activation variables
    """
    
    m = X.shape[1]
    (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3) = cache
    
    dZ3 = A3 - Y
    dW3 = 1./m * np.dot(dZ3, A2.T)
    db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True)
    dA2 = np.dot(W3.T, dZ3)
    ### START CODE HERE ### (≈ 2 lines of code)
    dA2 = dA2 * D2              # Step 1: Apply mask D2 to shut down the same neurons as during the forward propagation (anything multiplied by 0/False becomes 0)
    dA2 = dA2 / keep_prob       # Step 2: Scale the value of neurons that haven't been shut down
    ### END CODE HERE ###
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    dW2 = 1./m * np.dot(dZ2, A1.T)
    db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True)
    
    dA1 = np.dot(W2.T, dZ2)
    ### START CODE HERE ### (≈ 2 lines of code)
    dA1 = dA1 * D1              # Step 1: Apply mask D1 to shut down the same neurons as during the forward propagation (anything multiplied by 0/False becomes 0)
    dA1 = dA1 / keep_prob       # Step 2: Scale the value of neurons that haven't been shut down
    ### END CODE HERE ###
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    dW1 = 1./m * np.dot(dZ1, X.T)
    db1 = 1./m * np.sum(dZ1, axis=1, keepdims = True)
    
    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,"dA2": dA2,
                 "dZ2": dZ2, "dW2": dW2, "db2": db2, "dA1": dA1, 
                 "dZ1": dZ1, "dW1": dW1, "db1": db1}
    
    return gradients

X_assess, Y_assess, cache = backward_propagation_with_dropout_test_case()

gradients = backward_propagation_with_dropout(X_assess, Y_assess, cache, keep_prob = 0.8)

print ("dA1 = " + str(gradients["dA1"]))
print ("dA2 = " + str(gradients["dA2"]))

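Result:

Let's now run the model with dropout (keep_prob = 0.86). It means that at every iteration you shut down each neuron of layers 1 and 2 with 14% probability. The function model() will now call:
•forward_propagation_with_dropout instead of forward_propagation.
•backward_propagation_with_dropout instead of backward_propagation.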

parameters = model(train_X, train_Y, keep_prob = 0.86, learning_rate = 0.3)

print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

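Result:

Dropout works great! The test accuracy has increased again (to 95%)! Your model is not overfitting the training set and does a great job on the test set. The French football team will be forever grateful to you!

Run the code below to plot the decision boundary.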

plt.title("Model with dropout")
axes = plt.gca()
axes.set_xlim([-0.75,0.40])
axes.set_ylim([-0.75,0.65])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)

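Note:
•A common mistake when using dropout is to use it both in training and testing. You should use dropout (randomly eliminate nodes) only in training.
•Deep learning frameworks such as tensorflow, PaddlePaddle, keras or caffe come with a dropout layer implementation. Don't stress: you will soon learn some of these frameworks.

**What you should remember about dropout:**

- Dropout is a regularization technique.

- You only use dropout during training. Don't use dropout (randomly eliminate nodes) during test time.

- Apply dropout during both forward and backward propagation.

- During training time, divide each dropout layer by keep_prob to keep the same expected value for the activations. For example, if keep_prob is 0.5, then we will on average shut down half the nodes, so the output will be scaled by 0.5 since only the remaining half are contributing to the solution. Dividing by 0.5 is equivalent to multiplying by 2. Hence, the output now has the same expected value. You can check that this works even when keep_prob takes values other than 0.5.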
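
The last two points can be checked empirically. The sketch below wraps inverted dropout in a hypothetical helper, `dropout_layer`, with a `training` flag (neither name comes from the assignment) and measures the mean activation for several values of keep_prob on constant stand-in activations.

```python
import numpy as np

def dropout_layer(A, keep_prob, training, rng):
    """Inverted dropout on activations A; returns A unchanged when training is False.

    Hypothetical helper for illustration, not part of the assignment code.
    """
    if not training or keep_prob >= 1.0:
        return A
    mask = rng.random(A.shape) < keep_prob
    return A * mask / keep_prob

rng = np.random.default_rng(0)
A = np.full((200, 5000), 3.0)                # constant activations, so the true mean is 3.0

means = {}
for keep_prob in (0.5, 0.7, 0.86):
    means[keep_prob] = dropout_layer(A, keep_prob, training=True, rng=rng).mean()
    print(keep_prob, means[keep_prob])

# At test time the layer is a no-op: nothing is dropped and nothing is rescaled.
test_out = dropout_layer(A, 0.5, training=False, rng=rng)
```

Each printed mean should land close to 3.0 whatever the keep_prob, which is the "same expected value" property. With `training=False` the activations pass through untouched.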

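4 - Conclusions

Here are the results of our three models:

Note that regularization hurts training set performance! This is because it limits the ability of the network to overfit to the training set. But since it ultimately gives better test accuracy, it is helping your system.

Congratulations on finishing this assignment! And also on revolutionizing French football. :-)

**What we want you to remember from this notebook**:

- Regularization will help you reduce overfitting.

- Regularization will drive your weights to lower values.

- L2 regularization and dropout are two very effective regularization techniques.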

