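Andrew Ng Machine Learning

Programming Exercise 1: Linear Regression with One Variable

The implementation in this post largely follows the linear regression section of Cowry5's article https://blog.csdn.net/Cowry5/article/details/83302646, with minor changes.

Code for this post: https://github.com/asddongmen/Machine-Learning-Andrew-Ng--program_in_python

Programming language: Python 3.6

Environment: Jupyter Notebook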

 
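import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

path = 'ex1data1.txt'
# The file has no header row, so `names` supplies the column names
# (when `names` is given, pandas treats the first line as data, as with header=None)
data = pd.read_csv(path, names=['Population', 'Profit'])
data.head()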

	Population	Profit
0	6.1101	17.5920
1	5.5277	9.1302
2	8.5186	13.6620
3	7.0032	11.8540
4	5.8598	6.8233

data.describe()  # summary statistics for each column

	Population	Profit
count	97.000000	97.000000
mean	8.159800	5.839135
std	3.869884	5.510262
min	5.026900	-2.680700
25%	5.707700	1.986900
50%	6.589400	4.562300
75%	8.578100	7.046700
max	22.203000	24.147000

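Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, you can use a scatter plot, since it has only two attributes (profit and population). (Many other problems you will encounter in real life are multi-dimensional and cannot be plotted on a 2-D plot.)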

 
data.plot(kind='scatter', x='Population', y='Profit', figsize=(8, 5))
plt.show()

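Now let's implement linear regression with gradient descent to minimize the cost function. First, we create a cost function with $\theta$ as its parameter:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

where the hypothesis is:

$$h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

In this problem there is a single feature, so the hypothesis is:

$$h_\theta(x) = \theta_0 + \theta_1 x_1$$

We set $x_{0} = 1$ for every sample, so that $\theta_0$ is handled by the same vectorized product as the other parameters.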
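To make the formula concrete, here is a minimal sketch that evaluates $J(\theta)$ literally as the sum above, one sample at a time, and checks it against the matrix form $X\theta^T - y$ that the vectorized code relies on. The three data points and the helper names (`hypothesis`, `cost_loop`, `cost_vectorized`) are made up for illustration.

```python
import numpy as np

def hypothesis(theta, x):
    # h_theta(x) = theta_0 * x_0 + theta_1 * x_1, with x_0 = 1
    return theta[0] * 1.0 + theta[1] * x

def cost_loop(theta, xs, ys):
    # J(theta) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2, written as an explicit loop
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        total += (hypothesis(theta, x) - y) ** 2
    return total / (2 * m)

def cost_vectorized(theta_row, X, y):
    # Same cost in matrix form: X has a leading column of ones, theta_row has shape (1, 2)
    residual = X @ theta_row.T - y          # shape (m, 1)
    return float(np.sum(residual ** 2) / (2 * len(X)))

# Made-up toy data: three (population, profit) pairs, for illustration only
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 7.0])
X = np.column_stack([np.ones_like(xs), xs])   # shape (3, 2)
y = ys.reshape(-1, 1)                          # shape (3, 1)

theta = np.array([0.5, 2.0])
print(cost_loop(theta, xs, ys))                      # 0.125
print(cost_vectorized(theta.reshape(1, 2), X, y))    # 0.125
```

Both forms give the same number; the vectorized one avoids a Python-level loop, which matters as m grows.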

# Compute the cost function J(θ)
# X: (m, n+1) matrix whose rows are samples [x0, x1, ...] with x0 = 1
# y: (m, 1) column vector; theta: (1, n+1) row vector
def compute_cost(X, y, theta):
    inner = np.power(X.dot(theta.T) - y, 2)  # squared errors, shape (m, 1)
    return np.sum(inner) / (2 * len(X))

# Insert a column of ones (x0 = 1) so that θ0 is handled by the same matrix product
data.insert(0, 'Ones', 1)
# Set the training set X and the target variable y
cols = data.shape[1]  # number of columns
X = data.iloc[:, 0:cols-1]  # inputs X: all columns except the last
y = data.iloc[:, cols-1:cols]  # target y: the last column
X = X.values  # convert to NumPy arrays
y = y.values
theta = np.zeros((1, 2))  # initial parameters as a float row vector
print(theta.shape)
print(X.shape)
print(y.shape)

(1, 2)
(97, 2)
(97, 1)
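Why the column of ones works: with $x_0 = 1$ in the first column, each row of `X @ theta.T` equals $\theta_0 + \theta_1 x_1$, which is exactly this problem's hypothesis. A small sketch with made-up numbers, using `np.insert` on a plain array to mimic `data.insert(0, 'Ones', 1)`:

```python
import numpy as np

population = np.array([6.0, 8.0, 10.0])                   # hypothetical feature values
X = np.insert(population.reshape(-1, 1), 0, 1.0, axis=1)  # prepend x_0 = 1 -> shape (3, 2)
theta = np.array([[-1.0, 0.5]])                           # row vector, shape (1, 2)

matrix_form = X @ theta.T                                 # shape (3, 1)
explicit_form = theta[0, 0] + theta[0, 1] * population    # shape (3,)

print(X.shape, theta.shape, matrix_form.shape)            # (3, 2) (1, 2) (3, 1)
print(np.allclose(matrix_form.ravel(), explicit_form))    # True
```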

def gradientDescent(X, y, theta, alpha, epoch=1000):
    '''Run batch gradient descent; return the final theta and the cost after each iteration.'''
    cost = np.zeros(epoch)  # cost[i] holds J(θ) after iteration i
    m = X.shape[0]  # number of samples m

    for i in range(epoch):
        # Vectorized, simultaneous update of every θj; theta is a (1, n+1) row vector
        theta = theta - (alpha / m) * (X.dot(theta.T) - y).T.dot(X)
        cost[i] = compute_cost(X, y, theta)  # X and y are fixed; only theta changes
    return theta, cost  # after the loop, return theta and the cost history
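The single vectorized line in `gradientDescent` is the per-parameter rule $\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$ applied to every $j$ at once, with each $\theta_j$ computed from the old $\theta$ (a simultaneous update). A minimal sketch, on synthetic data with hypothetical helper names, that compares one loop-based step with one vectorized step:

```python
import numpy as np

def step_loop(X, y, theta, alpha):
    # One gradient-descent step written per parameter.
    # Every theta_j is computed from the OLD theta before any is overwritten.
    m = X.shape[0]
    errors = X @ theta.T - y                  # h_theta(x^(i)) - y^(i), shape (m, 1)
    new_theta = theta.copy()
    for j in range(theta.shape[1]):
        grad_j = np.sum(errors[:, 0] * X[:, j]) / m
        new_theta[0, j] = theta[0, j] - alpha * grad_j
    return new_theta

def step_vectorized(X, y, theta, alpha):
    # The same step as the update line inside gradientDescent
    m = X.shape[0]
    return theta - (alpha / m) * (X @ theta.T - y).T @ X

rng = np.random.default_rng(0)                # synthetic data, for illustration only
x = rng.uniform(0, 10, size=50)
X = np.column_stack([np.ones_like(x), x])
y = (1.5 * x - 3 + rng.normal(0, 1, size=50)).reshape(-1, 1)
theta = np.zeros((1, 2))

a = step_loop(X, y, theta, 0.01)
b = step_vectorized(X, y, theta, 0.01)
print(np.allclose(a, b))   # True
```

Updating `theta[0, j]` in place inside the loop, without the copy, would mix old and new values and no longer match the vectorized step.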

Initialize the learning rate and the number of iterations:

alpha = 0.01
epoch = 2000

Run gradient descent to obtain theta and the cost history:

final_theta, cost = gradientDescent(X, y, theta, alpha, epoch)

Next, we can use the resulting final_theta to compute the cost of the trained model (the training error):

 
final_cost = compute_cost(X, y, final_theta)
print(final_cost)
# about 4.478
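As an extra sanity check that is not part of the original exercise, the gradient-descent result can be compared with the closed-form least-squares solution returned by `np.linalg.lstsq`. This sketch uses synthetic data and a hypothetical `gradient_descent` helper; with enough iterations the two agree. On the real data with `alpha = 0.01` and `epoch = 2000`, gradient descent may stop short of the exact minimum, so expect only approximate agreement there.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, epoch):
    # Same batch update as gradientDescent, without the cost history
    m = X.shape[0]
    for _ in range(epoch):
        theta = theta - (alpha / m) * (X @ theta.T - y).T @ X
    return theta

rng = np.random.default_rng(1)                 # synthetic data, for illustration only
x = rng.uniform(0, 10, size=100)
X = np.column_stack([np.ones_like(x), x])
y = (1.5 * x - 3 + rng.normal(0, 1, size=100)).reshape(-1, 1)

gd_theta = gradient_descent(X, y, np.zeros((1, 2)), alpha=0.01, epoch=10000)
ls_theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # closed-form least squares, shape (2, 1)

print(gd_theta.ravel())
print(ls_theta.ravel())
print(np.allclose(gd_theta.ravel(), ls_theta.ravel(), atol=1e-4))   # True
```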

Then plug the learned theta into the hypothesis and plot the hypothesis together with the training data to inspect the fit visually:

population = np.linspace(data.Population.min(), data.Population.max(), 100)  # x-axis values
profit = final_theta[0, 0] + (final_theta[0, 1] * population)  # y-axis: predicted profit

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(population, profit, 'r', label='Prediction')
ax.scatter(data['Population'], data['Profit'], label='Training data')
ax.legend(loc=4)  # loc=4 places the legend in the lower right corner
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()

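Since every iteration of gradient descent produces a cost value, we can plot the cost against the iteration number. Plotting the cost this way is the usual method for checking that gradient descent is running correctly: if it is, the curve keeps going down.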

fig,ax = plt.subplots(figsize=(8, 6))
ax.plot(np.arange(epoch), cost, 'r')
ax.set_xlabel('Iteration')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()
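Besides reading the plot, the "keeps going down" criterion can be checked numerically. The sketch below, on synthetic data with hypothetical helper names, also shows what failure looks like: with a learning rate that is too large for this data, the cost grows instead of shrinking.

```python
import numpy as np

def cost_history(X, y, alpha, epoch):
    # Run batch gradient descent from theta = 0 and record J(theta) after every step
    m = X.shape[0]
    theta = np.zeros((1, X.shape[1]))
    history = np.zeros(epoch)
    for i in range(epoch):
        theta = theta - (alpha / m) * (X @ theta.T - y).T @ X
        history[i] = np.sum((X @ theta.T - y) ** 2) / (2 * m)
    return history

def always_decreasing(history):
    # True if the cost never goes up from one iteration to the next
    return bool(np.all(np.diff(history) <= 0))

rng = np.random.default_rng(2)                 # synthetic data, for illustration only
x = rng.uniform(0, 10, size=100)
X = np.column_stack([np.ones_like(x), x])
y = (1.5 * x - 3 + rng.normal(0, 1, size=100)).reshape(-1, 1)

good = cost_history(X, y, alpha=0.01, epoch=500)
bad = cost_history(X, y, alpha=0.1, epoch=50)  # learning rate chosen too large on purpose

print(always_decreasing(good))   # True
print(always_decreasing(bad))    # False: the cost blows up
```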

The plot confirms that gradient descent is running correctly: the cost decreases steadily and converges.
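Instead of running a fixed number of iterations, convergence can also be tested automatically, for example by stopping once one iteration lowers $J(\theta)$ by less than a small threshold. A minimal sketch; the function name and the `tol` and `max_epoch` defaults are assumptions, and a good threshold depends on the problem:

```python
import numpy as np

def gradient_descent_until_converged(X, y, alpha, tol=1e-6, max_epoch=100000):
    # Hypothetical helper: batch gradient descent that stops once one iteration
    # lowers J(theta) by less than tol. tol and max_epoch are illustrative defaults.
    m = X.shape[0]
    theta = np.zeros((1, X.shape[1]))
    prev_cost = np.sum(y ** 2) / (2 * m)          # J(theta) at theta = 0
    for i in range(1, max_epoch + 1):
        theta = theta - (alpha / m) * (X @ theta.T - y).T @ X
        cost = np.sum((X @ theta.T - y) ** 2) / (2 * m)
        if cost > prev_cost:
            # A rising cost means divergence, not convergence
            raise ValueError("cost increased; try a smaller alpha")
        if prev_cost - cost < tol:
            return theta, cost, i                  # converged after i iterations
        prev_cost = cost
    return theta, prev_cost, max_epoch            # stopped at the iteration cap

rng = np.random.default_rng(3)                    # synthetic data, for illustration only
x = rng.uniform(0, 10, size=100)
X = np.column_stack([np.ones_like(x), x])
y = (1.5 * x - 3 + rng.normal(0, 1, size=100)).reshape(-1, 1)

theta, final_cost, n_iter = gradient_descent_until_converged(X, y, alpha=0.01)
print(theta.ravel(), final_cost, n_iter)
```

A small per-iteration decrease does not guarantee that theta is exactly at the minimum, only that further progress has become slow.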
This concludes the first programming exercise.

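This exercise took me a long time to finish, mostly because I was not familiar with matrix operations; understanding and deriving the matrix computations alone took me an hour.
After finishing it, I understand linear regression and gradient descent better, but I still have many questions, such as why this algorithm works and how gradient descent is derived.
I plan to first finish Andrew Ng's course and its exercises to get an overall picture of machine learning, and then carefully study Hsuan-Tien Lin's Machine Learning Foundations and Machine Learning Techniques to understand machine learning from the theoretical side.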

