This article describes how to implement multivariate linear regression in Python. It is based on Andrew Ng's course videos and Dr. Huang Haiguang's notes.
Now extend the house-price model with more features, such as the number of rooms and the number of floors, giving a model with multiple variables. The features of the model are (x1, x2, ..., xn).

The hypothesis is written as:

h(x) = theta0 + theta1 * x1 + theta2 * x2 + ... + thetan * xn

Introducing x0 = 1, the formula becomes:

h(x) = theta0 * x0 + theta1 * x1 + ... + thetan * xn = theta^T * x
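As a quick illustration (a minimal sketch with hypothetical numbers, not the article's data or parameters), prepending x0 = 1 lets the hypothesis be computed as a single dot product:

import numpy as np

# hypothetical feature vector: x1 = living area, x2 = number of bedrooms
x = np.array([2104.0, 3.0])
x = np.hstack([1.0, x])                 # prepend x0 = 1 for the intercept term
theta = np.array([340.0, 110.0, -6.5])  # hypothetical parameters (theta0, theta1, theta2)
h = theta.dot(x)                        # h(x) = theta^T x
print(h)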
1. Loading the training data
The data format is:
X1,X2,Y
2104,3,399900
1600,3,329900
2400,3,369000
1416,2,232000
Read the data line by line, split each line on commas, and put the result into an np.array.
import numpy as np

# Load the data
def load_exdata(filename):
    data = []
    with open(filename, 'r') as f:
        for line in f.readlines():
            line = line.split(',')
            current = [int(item) for item in line]
            data.append(current)
    return data

data = load_exdata('ex1data2.txt')
data = np.array(data, np.int64)

x = data[:, (0, 1)].reshape((-1, 2))
y = data[:, 2].reshape((-1, 1))
m = y.shape[0]

# Print out some data points
print('First 10 examples from the dataset: \n')
print(' x = ', x[range(10), :], '\ny=', y[range(10), :])
First 10 examples from the dataset:
x = [[2104 3]
[1600 3]
[2400 3]
[1416 2]
[3000 4]
[1985 4]
[1534 3]
[1427 3]
[1380 3]
[1494 3]]
y= [[399900]
[329900]
[369000]
[232000]
[539900]
[299900]
[314900]
[198999]
[212000]
[242500]]
2. Solving for theta with gradient descent
(1) Feature scaling. With multiple features, make sure the features have similar scales; this helps gradient descent converge faster.

The fix is to rescale all the features to roughly the range -1 to 1. The simplest way is (X - mu) / sigma, where mu is the mean and sigma is the standard deviation.
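A minimal sketch of this normalization on a small hypothetical matrix (the featureNormalize function actually used in this article appears in the complete code below):

import numpy as np

# hypothetical raw feature matrix: each row is (living area, number of bedrooms)
X = np.array([[2104, 3],
              [1600, 2],
              [2400, 4]], dtype=np.float64)

mu = np.mean(X, axis=0)      # per-column mean
sigma = np.std(X, axis=0)    # per-column standard deviation
X_norm = (X - mu) / sigma    # each column now has zero mean and unit variance
print(X_norm)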


(2) Gradient descent. As in the single-variable case, the goal is to find the set of parameters that minimizes the cost function. The batch gradient descent update for multivariate linear regression is:

theta_j := theta_j - alpha * d/d(theta_j) J(theta)

Taking the derivative gives:

theta_j := theta_j - (alpha/m) * sum_{i=1..m} (h(x^(i)) - y^(i)) * x_j^(i)
(3) Vectorized computation
Vectorizing the computation makes it much faster. How do we turn the update into a vectorized form?
In the multivariate case, the cost function can be written as:

J(theta) = (1/(2*m)) * (X.dot(theta) - y).T.dot(X.dot(theta) - y)

Differentiating with respect to theta gives:

(1/m) * (X.T.dot(X.dot(theta) - y))

Therefore, the update rule for theta is:

theta = theta - (alpha/m) * (X.T.dot(X.dot(theta) - y))
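As a quick sanity check (a minimal sketch with hypothetical random data, separate from the article's code), the vectorized gradient can be compared against the element-wise summation form derived above; the two should agree up to floating-point error:

import numpy as np

np.random.seed(0)
m, n = 5, 3                      # hypothetical small problem size
X = np.random.rand(m, n)         # design matrix (bias column assumed included)
y = np.random.rand(m, 1)
theta = np.random.rand(n, 1)

# vectorized gradient: (1/m) * X^T (X theta - y)
grad_vec = (1 / m) * X.T.dot(X.dot(theta) - y)

# element-wise form: (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
grad_sum = np.zeros((n, 1))
for j in range(n):
    for i in range(m):
        grad_sum[j] += (X[i, :].dot(theta) - y[i]) * X[i, j]
grad_sum /= m

print(np.allclose(grad_vec, grad_sum))  # True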
(4) The complete code is as follows:

# Feature scaling
def featureNormalize(X):
    mu = np.zeros((1, X.shape[1]))
    sigma = np.zeros((1, X.shape[1]))
    for i in range(X.shape[1]):
        mu[0, i] = np.mean(X[:, i])    # mean of each column
        sigma[0, i] = np.std(X[:, i])  # standard deviation of each column
    X_norm = (X - mu) / sigma
    return X_norm, mu, sigma

# Compute the cost
def computeCost(X, y, theta):
    m = y.shape[0]
    # equivalent form: J = (np.sum((X.dot(theta) - y)**2)) / (2*m)
    C = X.dot(theta) - y
    J2 = (C.T.dot(C)) / (2 * m)
    return J2

# Gradient descent
def gradientDescent(X, y, theta, alpha, num_iters):
    m = y.shape[0]
    # store the cost at every iteration
    J_history = np.zeros((num_iters, 1))
    for iter in range(num_iters):
        # vectorized update: X.T (3, m) dot (X.dot(theta) - y) (m, 1) -> (3, 1)
        theta = theta - (alpha / m) * (X.T.dot(X.dot(theta) - y))
        J_history[iter] = computeCost(X, y, theta)
    return J_history, theta

iterations = 10000  # number of iterations
alpha = 0.01        # learning rate

x = data[:, (0, 1)].reshape((-1, 2))
y = data[:, 2].reshape((-1, 1))
m = y.shape[0]
x, mu, sigma = featureNormalize(x)
X = np.hstack([x, np.ones((x.shape[0], 1))])  # append the bias column of ones
theta = np.zeros((3, 1))

j = computeCost(X, y, theta)  # initial cost with theta = 0
J_history, theta = gradientDescent(X, y, theta, alpha, iterations)

print('Theta found by gradient descent', theta)
Theta found by gradient descent [[ 109447.79646964]
[ -6578.35485416]
[ 340412.65957447]]
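As an optional sanity check (not part of the original article), the result can be compared with the closed-form least-squares solution from np.linalg.lstsq on the same normalized design matrix; after 10000 iterations the two should be very close:

# closed-form least-squares solution on the same X and y
theta_ls, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print('Theta found by np.linalg.lstsq', theta_ls)
print('Max difference vs gradient descent:', np.max(np.abs(theta_ls - theta)))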
Plot the convergence curve:

import matplotlib.pyplot as plt

plt.plot(J_history)
plt.ylabel('cost')
plt.xlabel('iter count')
plt.title('convergence graph')
plt.show()

Use the model to predict a price:

def predict(data):
    testx = np.array(data)
    testx = (testx - mu) / sigma  # apply the same scaling that was used for training
    testx = np.hstack([testx, np.ones((testx.shape[0], 1))])  # append the bias column
    price = testx.dot(theta)
    print('price is %d ' % (price))

predict([1650, 3])
price is 293081
The complete code is available for download.
