Gradient Descent

Reposted article · 2019-11-20 20:14 · Algorithms / Machine Learning

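Introduction

Gradient descent is an iterative method that can be used to solve least-squares problems, both linear and nonlinear. It is one of the most commonly used methods for fitting the parameters of a machine learning model: to find the minimum of a loss function, gradient descent approaches it step by step through repeated updates.

It is not itself a machine learning algorithm
It is a search-based optimization method
It minimizes a loss function
Its counterpart, gradient ascent, maximizes a utility function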

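Model

$J=\theta ^{2}+b$

　　Once a loss function is defined, each value of the parameter $\theta $ maps to a value of the loss $J$, which can be drawn as a curve of $J$ against $\theta $. The goal is to find the $\theta $ at which $J$ reaches its minimum.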

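First pick a random $\theta $ and compute the derivative of $J$ at that $\theta $; call it gradient
Save the current $\theta $ as last_theta
Subtract $\eta $*gradient from $\theta $ and store the result back in $\theta $
Evaluate $J$ at $\theta $ and at last_theta and take the difference of the two values. If its absolute value is smaller than a chosen tiny threshold, the minimizing $\theta $ has been found (to within that tolerance); otherwise compute the derivative at the new $\theta $ and repeat the steps until the difference falls below the threshold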
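The four steps can be sketched directly for the model $J=\theta ^{2}+b$. This is a minimal illustration rather than the article's own code: the value b = 1.0, the starting point 3.0, the names `descend` and `dJ`, and the `max_steps` safety cap are all assumptions made for the example.

```python
def J(theta, b=1.0):
    """Loss from the model above, J = theta^2 + b (b = 1.0 is an assumed value)."""
    return theta ** 2 + b


def dJ(theta):
    """Derivative of J with respect to theta."""
    return 2.0 * theta


def descend(theta, eta=0.1, epsilon=1e-8, max_steps=10_000):
    """Repeat the four steps until the loss changes by less than epsilon."""
    for _ in range(max_steps):  # max_steps is a safety cap added for this sketch
        gradient = dJ(theta)              # step 1: derivative at the current theta
        last_theta = theta                # step 2: remember the previous theta
        theta = theta - eta * gradient    # step 3: move against the gradient
        if abs(J(theta) - J(last_theta)) < epsilon:  # step 4: stopping test
            break
    return theta


theta_min = descend(theta=3.0)  # 3.0 is an arbitrary starting point
print(theta_min, J(theta_min))
```

For this loss the minimum sits at $\theta = 0$ with $J = b$, so the printed values should be close to 0 and 1.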

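The role of the hyperparameter $\eta $

$\eta $ is called the learning rate, or the step size
The value of $\eta $ affects how quickly the optimum is reached
With an unsuitable $\eta $, the optimum may not be reached at all
$\eta $ is a hyperparameter of gradient descent

If η is too small, convergence is slow

If η is too large, the iteration may fail to converge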
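To make the effect of η tangible, the sketch below counts how many updates are needed for a few learning rates on the loss $(\theta - 2.5)^{2} + 1$. The helper `count_steps` and the `max_steps` cap are hypothetical names introduced for this illustration; the cap is kept very low for η = 1.1 only so that the diverging run stays within floating-point range.

```python
def J(theta):
    return (theta - 2.5) ** 2 + 1.0


def dJ(theta):
    return 2.0 * (theta - 2.5)


def count_steps(eta, theta=0.0, epsilon=1e-8, max_steps=100_000):
    """Return (number of updates performed, final theta)."""
    for step in range(1, max_steps + 1):
        last_theta = theta
        theta = theta - eta * dJ(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            return step, theta  # converged
    return max_steps, theta     # hit the cap without converging


steps_small, theta_small = count_steps(eta=0.001)
steps_mid, theta_mid = count_steps(eta=0.1)
steps_big, theta_big = count_steps(eta=1.1, max_steps=20)

print("eta=0.001:", steps_small, theta_small)
print("eta=0.1:  ", steps_mid, theta_mid)
print("eta=1.1:  ", steps_big, theta_big)
```

With η = 0.001 the loop needs thousands of updates where η = 0.1 needs a few dozen, and with η = 1.1 every update lands farther from 2.5 than the previous one.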

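Local and global optima

Optimization problems

　　Optimization is the science of picking the "best" decision among the many possible decisions encountered in a complex setting. In machine learning, it means choosing the parameters that minimize the loss so that the model meets its classification or prediction requirements.

　　Global optimum: a decision that is the best among all decisions that solve the problem, whether the objective function is being maximized or minimized. When the loss function has only one optimum, that optimum is the global one.

　　Local optimum: a decision that is best only within its neighborhood; it is not required to be the best among all decisions.

Remedies

Run the algorithm several times from randomized starting points
The starting point of gradient descent is therefore also a hyperparameter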
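The restart idea can be sketched on a function that has two minima. The function $f(x) = x^{4} - 3x^{2} + x$ is not from the article; it is chosen here only because it has a local minimum near x ≈ 1.13 and a lower, global one near x ≈ -1.30. The seed, the number of restarts, and the sampling range are likewise assumptions.

```python
import random


def f(x):
    """Illustrative non-convex loss with one local and one global minimum."""
    return x ** 4 - 3 * x ** 2 + x


def df(x):
    return 4 * x ** 3 - 6 * x + 1


def descend(x, eta=0.01, epsilon=1e-12, max_steps=10_000):
    """Plain gradient descent from the starting point x."""
    for _ in range(max_steps):
        last_x = x
        x = x - eta * df(x)
        if abs(f(x) - f(last_x)) < epsilon:
            break
    return x


rng = random.Random(0)                        # fixed seed so the run is repeatable
starts = [rng.uniform(-2.0, 2.0) for _ in range(10)]
results = [descend(x0) for x0 in starts]      # each run ends in one of the two basins
best = min(results, key=f)                    # keep the run with the lowest loss

print(best, f(best))
```

A single run started at x = 2.0 settles in the local minimum, while a run started at x = -2.0 reaches the global one; keeping the best of several random starts makes it much more likely that the reported result is the global minimum, though it does not guarantee it.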

Objective

　　Make $\sum_{i=1}^{m}(y^{(i)}-\widehat{y}^{(i)})^{2}$ as small as possible
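The objective can be made concrete under one assumption that the text does not state explicitly: that the prediction is linear in the parameters, $\widehat{y}^{(i)}=X_b^{(i)}\theta $, where $X_b^{(i)}=(1, x_1^{(i)}, \dots , x_n^{(i)})$ is the $i$-th sample with a leading 1 for the intercept. Under that assumption, the gradient that gradient descent would follow works out as:

$$
J(\theta )=\sum_{i=1}^{m}\left(y^{(i)}-X_b^{(i)}\theta \right)^{2}
$$

$$
\frac{\partial J}{\partial \theta _j}=-2\sum_{i=1}^{m}\left(y^{(i)}-X_b^{(i)}\theta \right)X_{b,j}^{(i)},
\qquad
\nabla J(\theta )=2\,X_b^{T}\left(X_b\theta -y\right)
$$

Dividing $J$ by $m$ (the mean squared error) scales the gradient by $1/m$ without moving the minimizer, which keeps a given $\eta $ usable across datasets of different sizes.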

Code implementation

Python implementation

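import numpy as np
import matplotlib.pyplot as plt

# Plot the loss curve J(theta) = (theta - 2.5)^2 + 1
plot_x = np.linspace(-1.0, 6.0, 141)
plot_y = (plot_x - 2.5) ** 2 + 1
plt.plot(plot_x, plot_y, c='b')
plt.show()

epsilon = 1e-8  # convergence threshold
eta = 0.1       # learning rate

def J(theta):
    return (theta - 2.5) ** 2 + 1.0

def DJ(theta):
    # derivative of J with respect to theta
    return 2 * (theta - 2.5)

theta = 0.0
while True:
    g = DJ(theta)
    last_theta = theta
    theta = theta - eta * g  # step against the gradient, scaled by eta
    # Compare the absolute change in loss (the raw difference can be negative);
    # once it drops below epsilon, treat theta as the minimizer
    if abs(J(theta) - J(last_theta)) < epsilon:
        break
print(theta)
print(J(theta))  # loss at the final theta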

Visualizing the effect of the learning rate

theta = 0.0
theta_history = [theta]  # record every theta visited
while True:
    gradient = DJ(theta)
    last_theta = theta
    theta = theta - eta * gradient
    theta_history.append(theta)

    if abs(J(theta) - J(last_theta)) < epsilon:
        break

# Loss curve in the default color, visited points as red '+' markers
plt.plot(plot_x, J(plot_x))
plt.plot(np.array(theta_history), J(np.array(theta_history)), color="r", marker='+')
plt.show()

Wrapping it in a function

def gradient_descent(initial_theta, eta, epsilon=1e-8):
    # Appends every visited theta to the module-level list theta_history
    theta = initial_theta
    theta_history.append(theta)
    while True:
        g = DJ(theta)
        last_theta = theta
        theta = theta - eta * g
        theta_history.append(theta)
        # Stop once the absolute change in loss falls below epsilon
        if abs(J(theta) - J(last_theta)) < epsilon:
            break

def plot_theta_history():
    plt.plot(plot_x, J(plot_x), c='b')  # evaluate J on the x grid to get the y values
    plt.plot(np.array(theta_history), J(np.array(theta_history)), c='r', marker='+')
    plt.show()

theta_history = []  # reset so the plot shows only this run
gradient_descent(0.0, eta)
plot_theta_history()

Tuning the learning rate

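# A very small learning rate: converges, but needs many tiny steps
eta = 0.001
theta_history = []
gradient_descent(0, eta)
plot_theta_history()


# A large learning rate: each step jumps across the minimum, yet the run still converges
eta = 0.8
theta_history = []
gradient_descent(0, eta)
plot_theta_history()


# Too large: every step overshoots further, theta diverges, and this
# uncapped loop ends with an OverflowError once J overflows
eta = 1.1
theta_history = []
gradient_descent(0, eta)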

Limiting the number of iterations

def gradient_descent(initial_theta, eta, n_iters=1e4, epsilon=1e-8):
    theta = initial_theta
    i_iter = 0
    theta_history.append(initial_theta)

    # Stop after n_iters updates even if the convergence test never passes
    while i_iter < n_iters:
        gradient = DJ(theta)
        last_theta = theta
        theta = theta - eta * gradient
        theta_history.append(theta)

        if abs(J(theta) - J(last_theta)) < epsilon:
            break
        i_iter += 1
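A self-contained variant shows what the iteration cap buys when η is too large. Two things here are assumptions added for the sketch and are not part of the code above: the function returns its history instead of appending to a global list, and `J` returns infinity instead of raising `OverflowError`, so that a diverging run can continue until the cap is reached.

```python
def J(theta):
    """Example loss; returns inf instead of raising on overflow (an added guard)."""
    try:
        return (theta - 2.5) ** 2 + 1.0
    except OverflowError:
        return float("inf")


def dJ(theta):
    return 2.0 * (theta - 2.5)


def gradient_descent(initial_theta, eta, n_iters=10_000, epsilon=1e-8):
    """Return every theta visited; stops on convergence or after n_iters updates."""
    theta = initial_theta
    history = [theta]
    for _ in range(n_iters):
        last_theta = theta
        theta = theta - eta * dJ(theta)
        history.append(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
    return history


diverging = gradient_descent(0.0, eta=1.1, n_iters=10)   # capped after 10 updates
converging = gradient_descent(0.0, eta=0.1)              # stops by itself

print(len(diverging) - 1, diverging[-1])
print(len(converging) - 1, converging[-1])
```

The run with η = 1.1 uses all 10 allowed updates and ends farther from 2.5 than it started, while the run with η = 0.1 stops on its own well before the cap. Plotting the short diverging history is a convenient way to see the overshoot without the values leaving the chart.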

Gradient descent in multiple linear regression
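The same loop carries over to several parameters by replacing the scalar derivative with a gradient vector. The sketch below assumes a linear model with an intercept and uses the mean squared error as the loss; the synthetic data, the true coefficients (4, 3, -2), the seed, and the learning rate are all made up for illustration.

```python
import numpy as np


def J(theta, X_b, y):
    """Mean squared error of the linear model X_b @ theta."""
    return np.sum((y - X_b @ theta) ** 2) / len(y)


def dJ(theta, X_b, y):
    """Gradient of J with respect to theta: (2/m) * X_b^T (X_b theta - y)."""
    return X_b.T @ (X_b @ theta - y) * 2.0 / len(y)


def gradient_descent(X_b, y, initial_theta, eta=0.01, n_iters=10_000, epsilon=1e-10):
    theta = initial_theta
    for _ in range(n_iters):
        last_theta = theta
        theta = theta - eta * dJ(theta, X_b, y)
        if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
            break
    return theta


# Synthetic data (made up): y = 4 + 3*x1 - 2*x2 + small Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(200, 2))
y = 4.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=200)

X_b = np.hstack([np.ones((len(X), 1)), X])  # prepend a column of ones for the intercept
theta = gradient_descent(X_b, y, initial_theta=np.zeros(X_b.shape[1]), eta=0.1)

print(theta)  # intercept first, then one coefficient per feature
```

The recovered vector should be close to the coefficients used to generate the data. When features have very different scales, a single η tends to suit some directions and not others, so standardizing the features first is a common precaution.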
