Gradient ascent moves the current parameters a small step in the direction of the gradient at each iteration, and after many iterations it converges to the final solution. During the ascent you can use stochastic sampling: the result is about the same, but it consumes far fewer computing resources. Stochastic gradient ascent is also an online algorithm: it can update the parameters as soon as new data arrives, without re-reading the entire dataset for a batch computation.
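As background for why the code below only needs `error * data` to move along the gradient (the question the reference link at the end addresses): differentiating the standard logistic-regression log-likelihood with respect to the weights gives exactly that form, with σ the sigmoid:

```latex
\ell(w) = \sum_{i=1}^{m} \Big[ y_i \log \sigma(x_i^\top w) + (1-y_i) \log\big(1-\sigma(x_i^\top w)\big) \Big]
\qquad
\nabla_w \ell = X^\top \big( y - \sigma(Xw) \big)
\;\Rightarrow\;
w \leftarrow w + \alpha\, X^\top \big( y - \sigma(Xw) \big)
```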
```python
# -*- coding:UTF-8 -*-
import matplotlib.pyplot as plt
import numpy as np


def LoadDataSet():
    data_mat, label_mat = [], []
    with open("testSet.txt") as f:
        for row in f.readlines():
            row_tmp = row.strip().split()
            # prepend a constant 1.0 so w0 acts as the intercept term
            data_mat.append([1.0, float(row_tmp[0]), float(row_tmp[1])])
            label_mat.append(int(row_tmp[2]))
    return data_mat, label_mat


def Sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))


def GradAscent(DataMatInput, ClassLabels):
    data_mat = np.mat(DataMatInput)
    label_mat = np.mat(ClassLabels).transpose()  # transpose into an [m x 1] column vector
    m, n = np.shape(data_mat)  # m is the number of rows, n the number of columns (features)
    alpha = 0.001  # learning rate
    cycles = 500   # number of iterations
    weight = np.ones((n, 1))
    for i in range(cycles):
        tmp_mat = Sigmoid(data_mat * weight)
        error = label_mat - tmp_mat  # [m x 1] error vector
        # transpose the data so the matrix product is valid:
        # [n x m] * [m x 1] gives the [n x 1] gradient step
        weight = weight + alpha * data_mat.transpose() * error
    return weight.getA()  # convert the matrix back to an ndarray


def run():
    data_mat, label_mat = LoadDataSet()
    weight = GradAscent(data_mat, label_mat)
    data_arr = np.array(data_mat)
    n = np.shape(data_mat)[0]
    xcord1, ycord1, xcord2, ycord2 = [], [], [], []
    for i in range(n):
        if int(label_mat[i]) == 1:
            xcord1.append(data_arr[i, 1])
            ycord1.append(data_arr[i, 2])
        else:
            xcord2.append(data_arr[i, 1])
            ycord2.append(data_arr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)  # add a subplot: 1 row, 1 column, first panel
    ax.scatter(xcord1, ycord1, s=20, c='red', marker='s', alpha=.5)
    ax.scatter(xcord2, ycord2, s=20, c='green', alpha=.5)
    x = np.arange(-3.0, 3.0, 0.1)
    # the decision boundary satisfies w0 + w1*x + w2*y = 0; solve for y
    y = (-weight[0] - weight[1] * x) / weight[2]
    ax.plot(x, y)
    plt.title('logistic')
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()


run()
```
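The code above is the batch version: every cycle multiplies through the whole dataset. As a minimal sketch of the stochastic (online) variant described at the top, assuming the same `[1.0, x1, x2]` row format and 0/1 labels (the function name and `alpha` value here are illustrative, not from the original):

```python
import numpy as np


def StocGradAscent(data_arr, class_labels, alpha=0.01):
    """One pass of stochastic gradient ascent: update the weights per sample.

    data_arr: ndarray of shape [m, n]; class_labels: length-m sequence of 0/1.
    """
    m, n = np.shape(data_arr)
    weight = np.ones(n)
    for i in range(m):
        # scalar prediction for a single sample
        h = 1.0 / (1 + np.exp(-np.dot(data_arr[i], weight)))
        error = class_labels[i] - h          # scalar error for this sample
        weight = weight + alpha * error * data_arr[i]  # online update
    return weight
```

Because each update touches only one sample, a newly arriving data point can be folded in by calling the update once more, with no need to recompute over the full dataset.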
Reference:
https://blog.csdn.net/c406495762/article/details/77723333#1__238 (answers why gradient ascent can move along the gradient using nothing more than error * data)