Original article. Please credit the source and the original author when reposting or quoting this content!
Given a data set $D=\{(x_1,y_1),(x_2,y_2),\cdots,(x_m,y_m)\}$, where $x_i=\left[x_i^{(1)},x_i^{(2)},\cdots,x_i^{(d)}\right]^T$ means each sample has $d$ attributes, our goal is to find a $d$-dimensional column vector $w$ and a constant $b$ such that the predictions of the model $f(x_i)=w^Tx_i+b$ are as close as possible to the true values $y_i$.
Let $y={(y_1,y_2,\cdots,y_m)}^T$, absorb the bias into the weights as $\widehat w=(w;b)$, and let $X$ be the $m\times(d+1)$ matrix whose $i$-th row is $(x_i^T,1)$. We then want to minimize $\left\|y-X\widehat w\right\|^2$.
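As a concrete illustration of this matrix form, here is a minimal sketch (with made-up numbers; the names `X_raw` and `y` are mine, not from the post) of how a column of ones is appended so that $b$ becomes the last entry of $\widehat w$:

```python
import numpy as np

# Made-up toy data: 3 samples, 2 features, generated as y = x1 + 2*x2 + 3,
# so the "true" parameters are w = (1, 2) and b = 3.
X_raw = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 5.0]])   # shape (m, d)
y = np.array([8.0, 7.0, 16.0])   # shape (m,)

# Append a column of ones; the bias b becomes the last component of w_hat
X = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])  # shape (m, d+1)
print(X)
# [[1. 2. 1.]
#  [2. 1. 1.]
#  [3. 5. 1.]]
```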
Here the objective to be minimized is viewed as a function of the single "variable" $\widehat w$. Let $h(\widehat w)={(y-X\widehat w)}^T(y-X\widehat w)$; since $h$ is convex, its minimum is attained where the derivative with respect to $\widehat w$ is zero, so the value of $\widehat w$ at which the derivative vanishes is the solution. The derivation is as follows:
\begin{align*}
\frac{\partial h(\boldsymbol{\hat w})}{\partial \boldsymbol{\hat w}}
&= \frac{\partial [(\boldsymbol y-\boldsymbol X\boldsymbol{\hat {w}})^T(\boldsymbol y-\boldsymbol X\boldsymbol{\hat {w}})]}{\partial \boldsymbol{\hat w}}\\
&= 2\frac{\partial (\boldsymbol y-\boldsymbol X\boldsymbol{\hat {w}})^T}{\partial \boldsymbol{\hat w}}(\boldsymbol y-\boldsymbol X\boldsymbol{\hat {w}})\\
&= 2\frac{\partial \boldsymbol y^T}{\partial \boldsymbol{\hat w}}(\boldsymbol y-\boldsymbol X\boldsymbol{\hat {w}})-2\frac{\partial (\boldsymbol X\boldsymbol{\hat {w}})^T}{\partial \boldsymbol{\hat w}}(\boldsymbol y-\boldsymbol X\boldsymbol{\hat {w}}) \\
&= 0-2\boldsymbol X^T(\boldsymbol y-\boldsymbol X\boldsymbol{\hat {w}})\\
&= 2\boldsymbol X^T(\boldsymbol X\boldsymbol{\hat {w}}-\boldsymbol y)\tag{1}
\end{align*}
Setting the expression above equal to zero, we have:
\begin{eqnarray*}
&&\because 2\boldsymbol X^T(\boldsymbol X\boldsymbol{\hat {w}}-\boldsymbol y)=2\boldsymbol X^T\boldsymbol X\boldsymbol{\hat {w}}-2\boldsymbol X^T\boldsymbol y=0\\
&&\therefore \boldsymbol X^T\boldsymbol X\boldsymbol{\hat {w}}=\boldsymbol X^T\boldsymbol y\\
&&\therefore \boldsymbol{\hat {w}}=(\boldsymbol X^T\boldsymbol X)^{-1}\boldsymbol X^T\boldsymbol y\\
\end{eqnarray*}
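A minimal NumPy sketch of this closed-form (normal-equation) solution, continuing the toy `X` and `y` from the sketch above (my own illustration, not part of the original post); `np.linalg.pinv` is used so the computation still succeeds even when $X^TX$ is singular:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least-squares solution: w_hat = (X^T X)^{-1} X^T y.

    The Moore-Penrose pseudo-inverse (pinv) is used instead of inv so that
    a singular X^T X still yields a (minimum-norm) solution.
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# For the toy data above this recovers roughly w = (1, 2) and b = 3;
# the last entry of w_hat is the bias b.
# w_hat = normal_equation(X, y)
```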
Solving directly with this closed-form formula is the (ordinary) least-squares method. Because $\boldsymbol X^T\boldsymbol X$ is not necessarily invertible, the problem is usually solved with gradient descent instead; the gradient-descent update is:
\begin{eqnarray*}
\hat{\boldsymbol w}^{(k)}=\hat{\boldsymbol w}^{(k-1)}-\gamma\left.\frac{\partial h(\hat{\boldsymbol w})}{\partial\hat{\boldsymbol w}}\right|_{\hat{\boldsymbol w}=\hat{\boldsymbol w}^{(k-1)}}
\end{eqnarray*}
In the update above, $\gamma$ is the learning rate, and the gradient $\frac{\partial h(\hat{\boldsymbol w})}{\partial\hat{\boldsymbol w}}$ is given by formula (1).
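As an illustration of this update rule, a full-batch version using gradient (1) could look like the sketch below (my own illustration, not the author's code; the full program later in the post updates the parameters one sample at a time instead):

```python
import numpy as np

def batch_gradient_descent(X, y, gamma=0.01, n_iter=1000):
    """Minimize h(w) = ||y - X w||^2 with full-batch gradient descent.

    The gradient is formula (1): 2 X^T (X w - y). It is divided by the
    sample count m, which amounts to minimizing the MSE instead of the
    sum of squares and keeps the step size easier to tune.
    """
    m, d = X.shape
    w_hat = np.zeros(d)
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ w_hat - y)   # formula (1)
        w_hat -= gamma * grad / m
    return w_hat

# e.g. w_hat = batch_gradient_descent(X, y, gamma=0.05, n_iter=2000)
```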
The complete Python 3 code is as follows:
```python
#-*- coding: gbk -*-
# @Author : zhangchao
# @Data : 2020-06-26
# @Cnblogs :https://www.cnblogs.com/zhangchaolts

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Min-max (linear) normalization
def min_max_normalization(X):
    Xmin = np.zeros((1, X.shape[2]))
    Xmax = np.zeros((1, X.shape[2]))
    for i in range(X.shape[2]):
        Xmin[0,i] = np.min(X[:,0,i])
        Xmax[0,i] = np.max(X[:,0,i])
    return (X - Xmin) / (Xmax - Xmin)

# Zero-mean (z-score) standardization
def zero_mean_normalization(X):
    mu = np.zeros((1, X.shape[2]))
    sigma = np.zeros((1, X.shape[2]))
    for i in range(X.shape[2]):
        mu[0,i] = np.mean(X[:,0,i])
        sigma[0,i] = np.std(X[:,0,i])
    return (X - mu) / sigma

# Randomly generate 50 samples with 2 features each; coef holds the true regression coefficients.
# X.shape: [50, 2], Y.shape: [50], coef.shape: [2]
X, Y, coef = make_regression(n_samples=50, n_features=2, noise=10, bias=10., coef=True)
print('coef:', coef)

# Use sklearn's LinearRegression to obtain the coefficients and intercept for comparison
model = LinearRegression()
model.fit(X, Y)
print('model.coef_', model.coef_)            # regression coefficients
print('model.intercept_', model.intercept_)  # intercept

# Below is the hand-written multivariate linear regression model
# Add dimensions to X and Y to make the matrix computations convenient
X = np.expand_dims(X, 1)  # [50, 1, 2]
Y = np.expand_dims(Y, 1)  # [50, 1]
Y = np.expand_dims(Y, 2)  # [50, 1, 1]

#X = min_max_normalization(X)
#X = zero_mean_normalization(X)

# The loss function is the mean squared error (MSE)
loss_function = lambda w, intercept: np.mean([(y[0][0] - (x.dot(w)[0][0] + intercept)) ** 2 for x, y in zip(X, Y)])

# Solve for the regression coefficients w and the intercept by gradient descent.
# Note: the parameters are updated once per sample (stochastic-style) rather than
# with the full-batch gradient of formula (1).
def gradient_descent(w, intercept, learning_rate):
    for x, y in zip(X, Y):
        #y_predict = w[0][0] * x[0][0] + w[1][0] * x[0][1] + intercept
        y_predict = x.dot(w)[0][0] + intercept
        diff = y_predict - y
        #w[0][0] -= learning_rate * diff * x[0][0]
        #w[1][0] -= learning_rate * diff * x[0][1]
        w -= learning_rate * 2 * x.transpose() * diff
        intercept -= learning_rate * diff
    return w, intercept

# Initialize the regression coefficients w and the intercept
w = np.ones((X.shape[-1], 1))  # [2, 1]
intercept = 0

# Record how the loss changes
loss_history = []
for i in range(50):
    loss = loss_function(w, intercept)
    loss_history.append(loss)
    print('step%d loss:' % (i+1), loss)
    w, intercept = gradient_descent(w, intercept, 1.0/(100+i))  # dynamic learning rate, decreasing over time

print('w:', w)
print('intercept:', intercept)

# Plot the loss curve
plt.plot(loss_history)
plt.ylabel('loss'); plt.xlabel('iter count')
plt.title('convergence graph')
plt.savefig('linear_regression.png')
```
The output is as follows:
```
coef: [84.54421311 5.2545353 ]
model.coef_ [84.9310078 5.48811974]
model.intercept_ 9.434942960315071
step1 loss: 6040.4032177086465
step2 loss: 1102.8264849114787
step3 loss: 283.25257014464876
step4 loss: 123.54863036366649
step5 loss: 85.00921056460024
step6 loss: 73.48634556845407
step7 loss: 69.44523759664635
step8 loss: 67.88178378711535
step9 loss: 67.23948955078853
step10 loss: 66.96354271958877
step11 loss: 66.83946411821033
step12 loss: 66.78031963233637
step13 loss: 66.74975970519239
step14 loss: 66.73219950499299
step15 loss: 66.72078988721178
step16 loss: 66.71243648657185
step17 loss: 66.70569500741604
step18 loss: 66.6998648623189
step19 loss: 66.69459085397719
step20 loss: 66.68968396030029
step21 loss: 66.68503840714563
step22 loss: 66.6805919983039
step23 loss: 66.676306398435
step24 loss: 66.67215691343604
step25 loss: 66.66812695912645
step26 loss: 66.66420493944473
step27 loss: 66.66038241625293
step28 loss: 66.65665300068504
step29 loss: 66.65301166318754
step30 loss: 66.64945429467006
step31 loss: 66.64597742244224
step32 loss: 66.64257802368101
step33 loss: 66.63925340142816
step34 loss: 66.63600110122701
step35 loss: 66.63281885446035
step36 loss: 66.62970453938905
step37 loss: 66.62665615401313
step38 loss: 66.62367179688592
step39 loss: 66.62074965331212
step40 loss: 66.61788798521602
step41 loss: 66.6150851235298
step42 loss: 66.61233946232669
step43 loss: 66.60964945417334
step44 loss: 66.60701360634461
step45 loss: 66.60443047765548
step46 loss: 66.60189867574294
step47 loss: 66.59941685468131
step48 loss: 66.59698371285032
step49 loss: 66.59459799099857
step50 loss: 66.59225847046397
w: [[85.29343409]
 [ 5.25287288]]
intercept: [[9.31683096]]
```
The change in the loss value is shown in the figure (the convergence plot saved by the script as linear_regression.png).