Linear Regression
In mathematics, regression refers to recovering a functional relationship between real-world variables from a batch of sample data, that is, "regressing" from the samples back to the true underlying function.
Linear regression (Linear Regression) assumes the relationship between the variables is linear and determines it from sample data; the graph of such a linear function is a straight line.
The equation of a linear function is:
y = wx + b
Linear regression determines this equation from a batch of sample data, that is, it determines the weight w and the bias b.
Therefore, to build a linear model we need:
- Dependent variable (y)
- Slope or weight variable (w)
- Intercept or bias (b)
- Independent variable (x)
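Before bringing in TensorFlow, the linear function itself can be sketched in a few lines of plain Python. This is just a sanity check of the equation y = wx + b; the initial values 0.4 and -0.4 are the same ones used in the TensorFlow code that follows:

```python
# A minimal plain-Python sketch of the linear model y = w*x + b.
# w = 0.4 and b = -0.4 match the initial values in the TensorFlow code below.
w = 0.4
b = -0.4

def linear_model(x):
    """Return the model's prediction for input x."""
    return w * x + b

# Evaluate the model on the same inputs used later
predictions = [linear_model(x) for x in [1, 2, 3, 4]]
print(predictions)  # approximately [0.0, 0.4, 0.8, 1.2], up to floating-point rounding
```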
Let's start building the linear model with TensorFlow:
```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Create a variable for the slope (W), initialized to 0.4
W = tf.Variable([.4], dtype=tf.float32)

# Create a variable for the intercept (b), initialized to -0.4
b = tf.Variable([-0.4], dtype=tf.float32)

# Create a placeholder for the independent variable x
x = tf.placeholder(tf.float32)

# The linear regression equation
linear_model = W * x + b

# Initialize all variables
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# Run the regression model and print the y values
print(sess.run(linear_model, feed_dict={x: [1, 2, 3, 4]}))
```
Output (console startup logs and TensorFlow warnings omitted):
```
[0.        0.4       0.8000001 1.2      ]
```
The code above simply feeds x values into the linear equation and outputs the corresponding y values. We now need to train the weight w and the bias b on sample data: compute the error from the output y values (the difference between the predicted and the known results), form a cost function from it, and use gradient descent to find the minimum of that cost function, yielding the final weight w and bias b.
Cost Function
The cost function measures the gap between the model's actual output and the desired output. We will use the common squared-error cost:
E = (1/2)(t - y)^2
- t: target output
- y: actual output
- E: squared error
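As a sanity check, this cost can be reproduced by hand with NumPy. Note that the TensorFlow code below sums the squared errors over all samples (`tf.reduce_sum(tf.square(...))`) rather than averaging or halving them, so that is what this sketch computes:

```python
import numpy as np

# Initial parameters, as in the TensorFlow code
w, b = 0.4, -0.4

x = np.array([1.0, 2.0, 3.0, 4.0])   # inputs
t = np.array([2.0, 4.0, 6.0, 8.0])   # target outputs

y = w * x + b                        # model predictions: roughly [0., 0.4, 0.8, 1.2]
loss = np.sum((y - t) ** 2)          # sum of squared errors, like tf.reduce_sum(tf.square(...))
print(loss)                          # approximately 90.24
```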
```python
# Placeholder for the target y values from the samples
y = tf.placeholder(tf.float32)

# Compute the sum of squared errors
error = linear_model - y
squared_errors = tf.square(error)
loss = tf.reduce_sum(squared_errors)

# Print the loss
print(sess.run(loss, feed_dict={x: [1, 2, 3, 4], y: [2, 4, 6, 8]}))
```
Full code:
```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Create a variable for the slope (W), initialized to 0.4
W = tf.Variable([.4], dtype=tf.float32)

# Create a variable for the intercept (b), initialized to -0.4
b = tf.Variable([-0.4], dtype=tf.float32)

# Create a placeholder for the independent variable x
x = tf.placeholder(tf.float32)

# The linear regression equation
linear_model = W * x + b

# Initialize all variables
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# Run the regression model and print the y values
print(sess.run(linear_model, feed_dict={x: [1, 2, 3, 4]}))

# Placeholder for the target y values from the samples
y = tf.placeholder(tf.float32)

# Compute the sum of squared errors
error = linear_model - y
squared_errors = tf.square(error)
loss = tf.reduce_sum(squared_errors)

# Print the loss
print(sess.run(loss, feed_dict={x: [1, 2, 3, 4], y: [2, 4, 6, 8]}))
```
Output (console startup logs and TensorFlow warnings omitted):
```
[0.        0.4       0.8000001 1.2      ]
90.24
```
As you can see, the loss is large. We therefore need to adjust the weight (W) and the bias (b) to reduce it.
Model Training
TensorFlow provides optimizers that gradually change each variable (the weight w and the bias b) so as to minimize the cost function.
The simplest optimizer is the gradient descent optimizer: it adjusts each variable according to the rate of change (derivative) of the cost function with respect to that variable, iterating toward the minimum of the cost function.
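To illustrate what the gradient descent optimizer does internally, here is a hand-rolled sketch of the same update rule in plain NumPy. The gradients are derived analytically from the sum-of-squared-errors loss used above, and the variable names are illustrative:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # sample inputs
t = np.array([2.0, 4.0, 6.0, 8.0])   # sample targets

w, b = 0.4, -0.4                     # same initial values as the TensorFlow code
learning_rate = 0.01

for _ in range(1000):
    y = w * x + b                    # forward pass
    error = y - t
    # Gradients of loss = sum((y - t)^2) with respect to w and b
    grad_w = np.sum(2 * error * x)
    grad_b = np.sum(2 * error)
    # Gradient-descent update, as the optimizer performs internally
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # w converges toward 2, b toward 0
```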
```python
# Create a gradient descent optimizer with learning rate 0.01
optimizer = tf.train.GradientDescentOptimizer(0.01)

# Use the optimizer to minimize the cost function
train = optimizer.minimize(loss)

# Run 1000 training iterations; on each iteration the optimizer
# adjusts the model parameters W and b to reduce the loss
for i in range(1000):
    sess.run(train, {x: [1, 2, 3, 4], y: [2, 4, 6, 8]})

# Print the trained weight and bias
print(sess.run([W, b]))
```
Full code:
```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Create a variable for the slope (W), initialized to 0.4
W = tf.Variable([.4], dtype=tf.float32)

# Create a variable for the intercept (b), initialized to -0.4
b = tf.Variable([-0.4], dtype=tf.float32)

# Create a placeholder for the independent variable x
x = tf.placeholder(tf.float32)

# The linear regression equation
linear_model = W * x + b

# Initialize all variables
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# Run the regression model and print the y values
print(sess.run(linear_model, feed_dict={x: [1, 2, 3, 4]}))

# Placeholder for the target y values from the samples
y = tf.placeholder(tf.float32)

# Compute the sum of squared errors
error = linear_model - y
squared_errors = tf.square(error)
loss = tf.reduce_sum(squared_errors)

# Print the loss
print(sess.run(loss, feed_dict={x: [1, 2, 3, 4], y: [2, 4, 6, 8]}))

# Create a gradient descent optimizer with learning rate 0.01
optimizer = tf.train.GradientDescentOptimizer(0.01)

# Use the optimizer to minimize the cost function
train = optimizer.minimize(loss)

# Run 1000 training iterations; on each iteration the optimizer
# adjusts the model parameters W and b to reduce the loss
for i in range(1000):
    sess.run(train, {x: [1, 2, 3, 4], y: [2, 4, 6, 8]})

# Print the trained weight and bias
print(sess.run([W, b]))
```
Output (console startup logs and TensorFlow warnings omitted):
```
[0.        0.4       0.8000001 1.2      ]
90.24
[array([1.9999996], dtype=float32), array([9.863052e-07], dtype=float32)]
```
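As a final check, plugging the learned parameters (w close to 2, b close to 0) back into the linear model reproduces the training targets, confirming the fit:

```python
# The trained values reported above, rounded
w, b = 2.0, 0.0

predictions = [w * x + b for x in [1, 2, 3, 4]]
print(predictions)  # [2.0, 4.0, 6.0, 8.0], matching the training targets
```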