15. Optimization Algorithms: Mini-batch Gradient Descent

Reposted article · 2021-08-24 15:45 · Deep Learning

Before studying mini-batch gradient descent, we first need to understand gradient descent itself.

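I. Gradient Descent

The idea: take the negative gradient at the current position as the search direction, since that is the direction in which the function decreases fastest locally. For this reason the method is also called "steepest descent". With a fixed learning rate, the steps get smaller as the iterate approaches a minimum, because the gradient itself shrinks, so progress slows down.

An intuitive picture: suppose we are standing somewhere on a large mountain and do not know the way down, so we decide to take it one step at a time. At each position we compute the gradient there and take one step along the negative gradient, that is, down the steepest slope at that point. Then we recompute the gradient at the new position and step again. We continue until we believe we have reached the foot of the mountain. Walking this way, we may never reach the foot and may instead end up in a local hollow partway down.

As this picture suggests, gradient descent is not guaranteed to find the global optimum; it may stop at a local one. If the loss function is convex, however, every local minimum is also a global minimum, so the solution gradient descent converges to (given a suitable learning rate) is globally optimal.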
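As a minimal sketch of the idea above, the snippet below runs gradient descent on the convex function f(w) = (w - 3)^2, whose only minimum is at w = 3. The function, the starting point, the learning rate, and the helper name `gradient_descent_1d` are illustrative assumptions, not part of the original post.

```python
def gradient_descent_1d(grad, w0, eta=0.1, n_steps=100):
    """Plain gradient descent on a one-dimensional function (illustrative helper)."""
    w = w0
    history = [w]
    for _ in range(n_steps):
        w = w - eta * grad(w)  # step against the gradient
        history.append(w)
    return w, history


# Hypothetical convex example: f(w) = (w - 3)^2, so f'(w) = 2 * (w - 3)
def grad_f(w):
    return 2.0 * (w - 3.0)


w_final, history = gradient_descent_1d(grad_f, w0=-5.0)
step_sizes = [abs(b - a) for a, b in zip(history, history[1:])]

print(w_final)                         # close to the minimizer w = 3
print(step_sizes[0], step_sizes[-1])   # the steps shrink as the gradient shrinks
```

Because the learning rate is fixed, the shrinking step size comes entirely from the shrinking gradient, which is the behaviour described in the text.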

Next, we look at the variants of gradient descent.

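II. Batch Gradient Descent (BGD)

On every parameter update, BGD computes the gradient from all m training samples, i.e. the whole training set is treated as a single batch.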
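For the linear-regression example used later in this post, the cost is the mean squared error, and under that assumption the batch update can be written out explicitly. The symbols here are chosen to match the code rather than taken from the original text: X is the design matrix (`x_b`), theta the parameter vector, and eta the learning rate.

```latex
J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta^{T}x^{(i)} - y^{(i)}\right)^{2}
          = \frac{1}{m}\lVert X\theta - y\rVert^{2},
\qquad
\nabla_{\theta}J(\theta) = \frac{2}{m}X^{T}\left(X\theta - y\right),
\qquad
\theta \leftarrow \theta - \eta\,\nabla_{\theta}J(\theta)
```

The gradient expression corresponds to the line `gradients = 2/m * x_b.T.dot(x_b.dot(theta) - y)` in the code.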

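III. Stochastic Gradient Descent (SGD)

Stochastic gradient descent works on the same principle as batch gradient descent. The difference is that the gradient is not computed from all m samples but from a single sample j. For the squared-error loss used in the code in Section V, with learning rate \eta, the update is:

$$\theta \leftarrow \theta - 2\eta\,x^{(j)}\left(\theta^{T}x^{(j)} - y^{(j)}\right)$$

Stochastic and batch gradient descent are two extremes: one uses all the data for each step, the other a single sample. Their strengths and weaknesses are correspondingly pronounced. Training speed: SGD uses only one sample per iteration, so each update is very cheap, whereas BGD becomes unsatisfactorily slow when the number of samples is large. Accuracy: because a single sample determines the gradient direction, the solution SGD ends at may well not be optimal. Convergence: since each iteration uses one sample, the update direction varies a lot from step to step, so SGD does not settle quickly into a local optimum.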
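The trade-off described above can be checked numerically. The sketch below, on synthetic data assumed purely for illustration, compares the full-batch gradient with the gradients of individual samples: their average equals the batch gradient, but each one on its own scatters around it.

```python
import numpy as np

rng = np.random.default_rng(0)            # synthetic data, assumed for illustration
m = 200
X = np.c_[np.ones((m, 1)), 2 * rng.random((m, 1))]
y = 4 + 3 * X[:, 1:2] + rng.standard_normal((m, 1))
theta = np.zeros((2, 1))

# Full-batch gradient of the MSE cost
batch_grad = 2 / m * X.T @ (X @ theta - y)

# One gradient per sample: row i is 2 * x_i * (x_i . theta - y_i)
residuals = X @ theta - y                 # shape (m, 1)
sample_grads = 2 * X * residuals          # shape (m, 2)

mean_of_sample_grads = sample_grads.mean(axis=0).reshape(2, 1)
spread = sample_grads.std(axis=0)

print(np.allclose(mean_of_sample_grads, batch_grad))  # the sample gradients average to the batch gradient
print(spread)                                         # per-component spread of the single-sample gradients
```

The nonzero spread is the source of the erratic SGD path: each step follows one noisy estimate of the batch direction.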

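IV. Mini-batch Gradient Descent (the main topic)

Mini-batch gradient descent is a compromise between batch and stochastic gradient descent: out of m samples, each iteration uses x of them, with 1 < x < m. A typical choice is x = 10, and the value can be adjusted to suit the data. For the squared-error loss used in the code in Section V, with learning rate \eta, and with X_B and y_B holding the features and targets of the x samples in the current mini-batch, the update is:

$$\theta \leftarrow \theta - \frac{2\eta}{x}\,X_{B}^{T}\left(X_{B}\theta - y_{B}\right)$$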
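One common way to form the mini-batches is to shuffle the sample indices once per pass over the data and slice them into chunks of the chosen size. The helper name `iterate_minibatches` and its parameters are hypothetical, chosen for this sketch.

```python
import numpy as np


def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once (hypothetical helper)."""
    indices = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]


rng = np.random.default_rng(0)
X = np.arange(25, dtype=float).reshape(25, 1)   # 25 toy samples
y = 2 * X

batch_sizes = [len(xb) for xb, _ in iterate_minibatches(X, y, batch_size=10, rng=rng)]
print(batch_sizes)  # [10, 10, 5]: the last batch is shorter when m is not a multiple of the batch size
```

Since the last batch can be short, dividing the gradient by `len(X_batch)` rather than by a fixed batch size keeps it a true average over the samples actually used.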

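V. Code Demonstration of the Three Methods

(1) Preparation

1. Import the required packages

import numpy as np
import os
# Plotting (%matplotlib inline is a Jupyter magic; remove it when running as a plain script)
%matplotlib inline
import matplotlib.pyplot as plt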

2. Set where images are saved

# Where to save images
PROJECT_ROOT_DIR = "."
MODEL_ID = "linear_models"

3. Set the random seed

np.random.seed(42)

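4. Define a function for saving figures

# Save the current figure to <PROJECT_ROOT_DIR>/images/<MODEL_ID>/<fig_id>.png
def save_fig(fig_id, tight_layout=True):
    path = os.path.join(PROJECT_ROOT_DIR, "images", MODEL_ID, fig_id + ".png")  # output path
    os.makedirs(os.path.dirname(path), exist_ok=True)  # create the folder if it is missing
    print("Saving figure", fig_id)  # progress message
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format="png", dpi=300)  # path, file format, resolution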

5. Filter out a noisy warning

# Suppress the "internal gelsd" warning
import warnings
warnings.filterwarnings(action="ignore",message="internal gelsd")

6. Define the variables

# Generate the training data
x = 2 * np.random.rand(100,1) # features
y = 4 + 3 * x + np.random.randn(100,1) # labels: y = 4 + 3x plus Gaussian noise

7. Plot the data

# Plot the generated data
plt.plot(x,y,"b.")
plt.xlabel("$x_1$",fontsize=18)
plt.ylabel("$y$",fontsize=18,rotation=0)
plt.axis([0,2,0,15])
save_fig("generated_data_plot") # save the figure
plt.show()

8. Add the bias feature

# Prepend a column of ones (x0 = 1) so that theta[0] acts as the intercept
x_b = np.c_[np.ones((100,1)),x]

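9. Create test data and fit a reference model

# Create the test data
x_new = np.array([[0],[2]])
x_new_b = np.c_[np.ones((2,1)),x_new]
 
# Import the linear regression model from sklearn
from sklearn.linear_model import LinearRegression
line_reg = LinearRegression() # create the linear regression object
line_reg.fit(x,y) # fit the training data
line_reg.intercept_,line_reg.coef_  # intercept and slope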

(array([4.21509616]), array([[2.77011339]]))

10. Predict on the test data

# Predict on the test data
line_reg.predict(x_new)
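The fitted intercept and slope are the least-squares solution, which is the target all three gradient-descent variants below should approach. As a cross-check that needs only NumPy, the sketch below solves the same kind of problem with `np.linalg.lstsq` on the bias-augmented matrix. It regenerates data with the same recipe but its own seed, so its numbers are not those shown above.

```python
import numpy as np

rng = np.random.default_rng(42)                 # own seed, so numbers differ from the post
x = 2 * rng.random((100, 1))
y = 4 + 3 * x + rng.standard_normal((100, 1))
x_b = np.c_[np.ones((100, 1)), x]               # bias column x0 = 1

# Least-squares solution of x_b @ theta = y
theta_best, *_ = np.linalg.lstsq(x_b, y, rcond=None)

x_new_b = np.array([[1.0, 0.0], [1.0, 2.0]])    # bias column plus x = 0 and x = 2
y_predict = x_new_b @ theta_best

print(theta_best.ravel())   # intercept and slope, near the true values 4 and 3
print(y_predict.ravel())    # predictions at x = 0 and x = 2
```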

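(2) Solving linear regression with batch gradient descent

# Solve linear regression with batch gradient descent
eta = 0.1   # learning rate
n_iterations = 100   # number of iterations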
m =100
theta = np.random.randn(2,1)
 
for iteration in range(n_iterations):
    # h theta (x(i)) = x_b.dot(theta)
    
    gradients = 2/m * x_b.T.dot(x_b.dot(theta) - y )
    theta = theta - eta * gradients
m = len(x_b)
theta_path_bgd = []
def plot_gradient_descent(theta,eta,theta_path = None):
    m = len(x_b)
    plt.plot(x,y,"b.")
    n_iterations = 1000
    for iteration in range(n_iterations):
        if iteration <10:
            y_predict = x_new_b.dot(theta)
            style = "b-"
            plt.plot(x_new,y_predict,style)
        gradients = 2/m * x_b.T.dot(x_b.dot(theta) - y)
        theta = theta - eta *gradients
        if theta_path is not None:
            theta_path.append(theta)
    plt.xlabel("$x_1$",fontsize=18)
    plt.axis([0,2,0,15]) # axis limits: x from 0 to 2, y from 0 to 15
    plt.title(r"$\eta = {}$".format(eta),fontsize=16)
np.random.seed(42)
theta = np.random.randn(2,1) #random initialization
 
plt.figure(figsize=(10,4))
plt.subplot(131);plot_gradient_descent(theta,eta=0.02)
plt.ylabel("$y$",rotation=0,fontsize=18)
plt.subplot(132);plot_gradient_descent(theta,eta=0.1,theta_path=theta_path_bgd)
plt.subplot(133);plot_gradient_descent(theta,eta=0.5)
 
save_fig("gradient_descent_plot")
plt.show()
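The three panels differ only in the learning rate. As a rough numerical companion to the plot, the sketch below measures the mean squared error after a fixed number of batch updates for the same three values of eta. It regenerates data with the same recipe but its own seed, and the helper `mse_after` is hypothetical, so the exact numbers are not those of the post.

```python
import numpy as np

rng = np.random.default_rng(42)           # own seed: numbers differ from the post's run
x = 2 * rng.random((100, 1))
y = 4 + 3 * x + rng.standard_normal((100, 1))
x_b = np.c_[np.ones((100, 1)), x]
m = len(x_b)


def mse_after(eta, n_iterations=50):
    """Run batch gradient descent from theta = 0 and return the final MSE (hypothetical helper)."""
    theta = np.zeros((2, 1))
    for _ in range(n_iterations):
        gradients = 2 / m * x_b.T @ (x_b @ theta - y)
        theta = theta - eta * gradients
    return float(np.mean((x_b @ theta - y) ** 2))


for eta in (0.02, 0.1, 0.5):
    print(eta, mse_after(eta))
```

A small eta makes slow but steady progress, a moderate one gets closer in the same number of steps, and a step that is too large can overshoot so that the error grows instead of shrinking.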

(3) Solving linear regression with stochastic gradient descent

# Solve linear regression with stochastic gradient descent
theta_path_sgd = []
m = len(x_b)
np.random.seed(42)
n_epochs = 50
 
theta = np.random.randn(2,1) # random initialization
 
for epoch in range(n_epochs):
    for i in range(m):
        if epoch == 0 and i < 20:
            y_predict = x_new_b.dot(theta)
            style = "b-"
            plt.plot(x_new,y_predict,style)
        random_index = np.random.randint(m)
        xi = x_b[random_index:random_index+1]
        yi = y[random_index:random_index+1]
        gradients = 2 * xi.T.dot(xi.dot(theta)-yi)
        eta = 0.1
        theta = theta - eta * gradients
        theta_path_sgd.append(theta)
    
plt.plot(x,y,"b.")
plt.xlabel("$x_1$",fontsize=18)
plt.ylabel("$y$",fontsize=18,rotation=0)
plt.axis([0,2,0,15])
save_fig("sgd_plot") # save the figure
plt.show()

from sklearn.linear_model import SGDRegressor
# np.infty was removed in NumPy 2.0; np.inf is the same value
sgd_reg = SGDRegressor(max_iter=50,tol=np.inf,penalty=None,eta0=0.1,random_state=42)
sgd_reg.fit(x,y.ravel())

SGDRegressor(eta0=0.1, max_iter=50, penalty=None, random_state=42, tol=inf)

# Inspect the intercept and slope
sgd_reg.intercept_,sgd_reg.coef_

Output: (array([4.25857953]), array([2.95762926]))

(4) Solving linear regression with mini-batch gradient descent

# Solve linear regression with mini-batch gradient descent
theta_path_mgd = []
 
n_iterations = 50
minibatch_size = 20
 
np.random.seed(42)
theta = np.random.randn(2,1) # random initialization
 
for epoch in range(n_iterations):
    shuffled_indices = np.random.permutation(m)
    x_b_shuffled = x_b[shuffled_indices]
    y_shuffled = y[shuffled_indices]
    for i in range(0,m,minibatch_size):
        xi = x_b_shuffled[i:i+minibatch_size]
        yi = y_shuffled[i:i+minibatch_size]
        gradients = 2/len(xi) * xi.T.dot(xi.dot(theta) - yi)  # len(xi) also handles a short last batch
        eta = 0.1
        theta = theta - eta * gradients
        theta_path_mgd.append(theta)
theta_path_bgd = np.array(theta_path_bgd)
theta_path_sgd = np.array(theta_path_sgd)
theta_path_mgd = np.array(theta_path_mgd)
 
plt.figure(figsize = (7,4))
plt.plot(theta_path_sgd[:,0],theta_path_sgd[:,1],"r-s",linewidth=1,label="Stochastic")
plt.plot(theta_path_mgd[:,0],theta_path_mgd[:,1],"g-+",linewidth=2,label="Mini-Batch")
plt.plot(theta_path_bgd[:,0],theta_path_bgd[:,1],"b-o",linewidth=3,label="Batch")
plt.legend(loc="upper left",fontsize=16)
plt.xlabel(r"$\theta_0$",fontsize=20)
plt.ylabel(r"$\theta_1$",fontsize=20,rotation = 0)
plt.axis([2.5,4.5,2.3,3.9])
save_fig("gradients_descent_paths_plot")
plt.show()

Reference: https://blog.csdn.net/weixin_36365168/article/details/112484422
