TensorFlow 2 (course prep) --- 11.1 Stock Price Prediction with a Recurrent Neural Network



I. Summary

One-sentence summary:

Two SimpleRNN layers, each followed by Dropout, with a final Dense layer producing the output:
model = tf.keras.Sequential([
    SimpleRNN(80, return_sequences=True),
    Dropout(0.2),
    SimpleRNN(100),
    Dropout(0.2),
    Dense(1)
])

model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss='mean_squared_error')  # mean squared error as the loss

 

1. What does SimpleRNN take as input?

In order: number of samples, number of time steps the RNN cell is unrolled for, and features per time step: x_train = np.reshape(x_train, (x_train.shape[0], 60, 1))
# Reshape x_train to the RNN input layout: [samples, time steps, features per step].
# The whole training set is fed at once, so samples = x_train.shape[0], i.e. 2066 groups;
# 60 open prices are fed in to predict the 61st day's open price, so time steps = 60;
# each step carries a single day's open price, so features per step = 1.
x_train = np.reshape(x_train, (x_train.shape[0], 60, 1))
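To make the [samples, time steps, features] layout concrete, here is a minimal sketch (with random dummy data, not the Maotai series; Dropout is omitted since it does not change shapes) that pushes one batch through the same three stages and prints the shape after each:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN, Dense

# Dummy batch: 5 samples, 60 time steps, 1 feature per step.
x_dummy = np.random.rand(5, 60, 1).astype('float32')

h1 = SimpleRNN(80, return_sequences=True)(x_dummy)  # output at every step
h2 = SimpleRNN(100)(h1)                             # last step only
y = Dense(1)(h2)

print(h1.shape)  # (5, 60, 80) -- full sequence, feeds the next RNN layer
print(h2.shape)  # (5, 100)    -- final hidden state only
print(y.shape)   # (5, 1)      -- one predicted open price per sample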

 

 

II. Stock Price Prediction with a Recurrent Neural Network

Video location in the corresponding course:

 

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dropout, Dense, SimpleRNN
import matplotlib.pyplot as plt
import os
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math
In [2]:
maotai = pd.read_csv('./SH600519.csv')  # read the stock data file
print(maotai)
      Unnamed: 0        date      open     close      high       low  \
0             74  2010-04-26    88.702    87.381    89.072    87.362   
1             75  2010-04-27    87.355    84.841    87.355    84.681   
2             76  2010-04-28    84.235    84.318    85.128    83.597   
3             77  2010-04-29    84.592    85.671    86.315    84.592   
4             78  2010-04-30    83.871    82.340    83.871    81.523   
...          ...         ...       ...       ...       ...       ...   
2421        2495  2020-04-20  1221.000  1227.300  1231.500  1216.800   
2422        2496  2020-04-21  1221.020  1200.000  1223.990  1193.000   
2423        2497  2020-04-22  1206.000  1244.500  1249.500  1202.220   
2424        2498  2020-04-23  1250.000  1252.260  1265.680  1247.770   
2425        2499  2020-04-24  1248.000  1250.560  1259.890  1235.180   

         volume    code  
0     107036.13  600519  
1      58234.48  600519  
2      26287.43  600519  
3      34501.20  600519  
4      85566.70  600519  
...         ...     ...  
2421   24239.00  600519  
2422   29224.00  600519  
2423   44035.00  600519  
2424   26899.00  600519  
2425   19122.00  600519  

[2426 rows x 8 columns]
In [3]:
training_set = maotai.iloc[0:2426 - 300, 2:3].values  # open prices of the first 2426-300=2126 days as the training set; rows are 0-indexed, and 2:3 slices column [2, 3) (half-open), i.e. the 'open' column
test_set = maotai.iloc[2426 - 300:, 2:3].values  # open prices of the last 300 days as the test set
print(training_set.shape)
print(test_set.shape)
(2126, 1)
(300, 1)
In [4]:
# Normalization
sc = MinMaxScaler(feature_range=(0, 1))  # define normalization to the (0, 1) range
print(sc)
MinMaxScaler(copy=True, feature_range=(0, 1))
In [5]:
training_set_scaled = sc.fit_transform(training_set)  # learn the min/max from the training set and normalize it
test_set = sc.transform(test_set)  # normalize the test set with the training set's statistics
print(training_set_scaled[:5,])
print(test_set[:5,])
[[0.011711  ]
 [0.00980951]
 [0.00540518]
 [0.00590914]
 [0.00489135]]
[[0.84288404]
 [0.85345726]
 [0.84641315]
 [0.87046756]
 [0.86758781]]
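The split between fit_transform and transform matters: the scaler learns min/max from the training set only, and the test set is rescaled with those same statistics, so no test-set information leaks into preprocessing. A minimal sketch with toy prices (not the real series):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[80.], [90.], [100.]])  # toy "training" prices
test = np.array([[95.], [110.]])          # toy "test" prices

sc_demo = MinMaxScaler(feature_range=(0, 1))
train_scaled = sc_demo.fit_transform(train)  # min/max learned from train only
test_scaled = sc_demo.transform(test)        # reuses train min=80, max=100

print(test_scaled)                             # [[0.75], [1.5]] -- may exceed 1
print(sc_demo.inverse_transform(test_scaled))  # recovers [[95.], [110.]]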
In [6]:
x_train = []
y_train = []
x_test = []
y_test = []
In [7]:
# Training set: the first 2426-300=2126 days in the csv table.
# Loop over the training set, taking every run of 60 consecutive open prices as the
# input features x_train and the 61st day's price as the label y_train; the loop
# builds 2426-300-60=2066 samples.
for i in range(60, len(training_set_scaled)):
    x_train.append(training_set_scaled[i - 60:i, 0])
    y_train.append(training_set_scaled[i, 0])
In [8]:
print(x_train[:2])
print(y_train[:2])
[array([0.011711  , 0.00980951, 0.00540518, 0.00590914, 0.00489135,
       0.00179279, 0.00162198, 0.00450456, 0.        , 0.00540518,
       0.00863926, 0.00656697, 0.00651332, 0.00799979, 0.00783745,
       0.00360251, 0.00405424, 0.00310844, 0.00306327, 0.0080986 ,
       0.00864773, 0.00495487, 0.00675613, 0.00517932, 0.00590914,
       0.0080986 , 0.00804496, 0.00990833, 0.0135659 , 0.01280926,
       0.01387222, 0.01402609, 0.01189028, 0.00901758, 0.00576515,
       0.00654015, 0.00540518, 0.00532331, 0.00342324, 0.00378321,
       0.00226145, 0.        , 0.00166574, 0.00049549, 0.00152034,
       0.00097403, 0.00251978, 0.00207512, 0.00341194, 0.00543059,
       0.00537554, 0.00415729, 0.00515673, 0.00552094, 0.00747607,
       0.01013984, 0.01062262, 0.01270479, 0.01214014, 0.01444112]), array([0.00980951, 0.00540518, 0.00590914, 0.00489135, 0.00179279,
       0.00162198, 0.00450456, 0.        , 0.00540518, 0.00863926,
       0.00656697, 0.00651332, 0.00799979, 0.00783745, 0.00360251,
       0.00405424, 0.00310844, 0.00306327, 0.0080986 , 0.00864773,
       0.00495487, 0.00675613, 0.00517932, 0.00590914, 0.0080986 ,
       0.00804496, 0.00990833, 0.0135659 , 0.01280926, 0.01387222,
       0.01402609, 0.01189028, 0.00901758, 0.00576515, 0.00654015,
       0.00540518, 0.00532331, 0.00342324, 0.00378321, 0.00226145,
       0.        , 0.00166574, 0.00049549, 0.00152034, 0.00097403,
       0.00251978, 0.00207512, 0.00341194, 0.00543059, 0.00537554,
       0.00415729, 0.00515673, 0.00552094, 0.00747607, 0.01013984,
       0.01062262, 0.01270479, 0.01214014, 0.01444112, 0.01388634])]
[0.013886340087578372, 0.016159086609993864]
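The same sliding-window construction can be packaged as a small helper; a sketch with a toy series (the name make_windows is ours, not from the original code):

def make_windows(series, window=60):
    # Slice a 1-D series into (samples, window) inputs and next-step labels.
    x, y = [], []
    for i in range(window, len(series)):
        x.append(series[i - window:i])  # `window` consecutive values as features
        y.append(series[i])             # the next value as the label
    return np.array(x), np.array(y)

# Toy series 0..9 with a window of 3: 10-3=7 samples.
xs, ys = make_windows(np.arange(10), window=3)
print(xs.shape, ys.shape)  # (7, 3) (7,)
print(xs[0], ys[0])        # [0 1 2] 3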
In [9]:
# Shuffle the training set
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
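Re-seeding with the identical seed before each shuffle guarantees x_train and y_train are permuted in the same order. An equivalent, arguably clearer pattern is a single shared permutation index (a sketch; works whether the inputs are lists or arrays):

# One shared permutation keeps features and labels in lockstep.
idx = np.random.permutation(len(x_train))
x_train, y_train = np.array(x_train)[idx], np.array(y_train)[idx]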
In [10]:
# Convert the training set from list to array
x_train, y_train = np.array(x_train), np.array(y_train) 
In [11]:
print(x_train.shape)
print(y_train.shape)
(2066, 60)
(2066,)
In [12]:
# Reshape x_train to the RNN input layout: [samples, time steps, features per step].
# The whole training set is fed at once, so samples = x_train.shape[0], i.e. 2066 groups;
# 60 open prices are fed in to predict the 61st day's open price, so time steps = 60;
# each step carries a single day's open price, so features per step = 1.
x_train = np.reshape(x_train, (x_train.shape[0], 60, 1))
In [13]:
# Test set: the last 300 days in the csv table.
# Loop over the test set, taking every run of 60 consecutive open prices as the input
# features x_test and the 61st day's price as the label y_test; the loop builds
# 300-60=240 samples.
for i in range(60, len(test_set)):
    x_test.append(test_set[i - 60:i, 0])
    y_test.append(test_set[i, 0])
# Convert the test set to arrays and reshape to the RNN input layout:
# [samples, time steps, features per step]
x_test, y_test = np.array(x_test), np.array(y_test)
x_test = np.reshape(x_test, (x_test.shape[0], 60, 1))
In [14]:
print(x_train.shape)
print(y_train.shape)
(2066, 60, 1)
(2066,)
In [15]:
print(x_test.shape)
print(y_test.shape)
(240, 60, 1)
(240,)
In [16]:
model = tf.keras.Sequential([
    SimpleRNN(80, return_sequences=True),
    Dropout(0.2),
    SimpleRNN(100),
    Dropout(0.2),
    Dense(1)
])

model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss='mean_squared_error')  # mean squared error as the loss
In [17]:
# This task only monitors the loss value, not accuracy, so the metrics option is
# omitted and each epoch reports loss only.

checkpoint_save_path = "./checkpoint/rnn_stock.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='val_loss')
-------------load the model-----------------
In [18]:
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=50,
                    validation_data=(x_test, y_test),
                    validation_freq=1,
                    callbacks=[cp_callback])
model.summary()
Epoch 1/50
33/33 [==============================] - 4s 136ms/step - loss: 0.0015 - val_loss: 0.0028
Epoch 2/50
33/33 [==============================] - 4s 119ms/step - loss: 0.0016 - val_loss: 0.0013
Epoch 3/50
33/33 [==============================] - 4s 123ms/step - loss: 0.0015 - val_loss: 0.0012
Epoch 4/50
33/33 [==============================] - 4s 121ms/step - loss: 0.0014 - val_loss: 0.0074
Epoch 5/50
33/33 [==============================] - 4s 116ms/step - loss: 0.0014 - val_loss: 0.0020
Epoch 6/50
33/33 [==============================] - 4s 125ms/step - loss: 0.0015 - val_loss: 0.0010
Epoch 7/50
33/33 [==============================] - 4s 122ms/step - loss: 0.0013 - val_loss: 0.0029
Epoch 8/50
33/33 [==============================] - 4s 129ms/step - loss: 0.0013 - val_loss: 0.0024
Epoch 9/50
33/33 [==============================] - 4s 126ms/step - loss: 0.0013 - val_loss: 0.0048
Epoch 10/50
33/33 [==============================] - 4s 129ms/step - loss: 0.0012 - val_loss: 0.0018
Epoch 11/50
33/33 [==============================] - 4s 125ms/step - loss: 0.0012 - val_loss: 0.0023
Epoch 12/50
33/33 [==============================] - 4s 115ms/step - loss: 0.0012 - val_loss: 0.0065
Epoch 13/50
33/33 [==============================] - 4s 119ms/step - loss: 0.0014 - val_loss: 0.0033
Epoch 14/50
33/33 [==============================] - 4s 118ms/step - loss: 0.0012 - val_loss: 0.0019
Epoch 15/50
33/33 [==============================] - 4s 117ms/step - loss: 0.0011 - val_loss: 0.0012
Epoch 16/50
33/33 [==============================] - 4s 116ms/step - loss: 0.0011 - val_loss: 0.0017
Epoch 17/50
33/33 [==============================] - 4s 115ms/step - loss: 0.0012 - val_loss: 0.0014
Epoch 18/50
33/33 [==============================] - 4s 121ms/step - loss: 0.0011 - val_loss: 0.0010
Epoch 19/50
33/33 [==============================] - 4s 119ms/step - loss: 0.0011 - val_loss: 0.0042
Epoch 20/50
33/33 [==============================] - 4s 114ms/step - loss: 0.0012 - val_loss: 0.0014
Epoch 21/50
33/33 [==============================] - 4s 111ms/step - loss: 0.0012 - val_loss: 0.0017
Epoch 22/50
33/33 [==============================] - 4s 119ms/step - loss: 0.0012 - val_loss: 0.0010
Epoch 23/50
33/33 [==============================] - 4s 114ms/step - loss: 0.0010 - val_loss: 0.0027
Epoch 24/50
33/33 [==============================] - 4s 116ms/step - loss: 0.0012 - val_loss: 0.0024
Epoch 25/50
33/33 [==============================] - 4s 114ms/step - loss: 0.0011 - val_loss: 0.0011
Epoch 26/50
33/33 [==============================] - 4s 117ms/step - loss: 0.0011 - val_loss: 0.0049
Epoch 27/50
33/33 [==============================] - 4s 116ms/step - loss: 0.0011 - val_loss: 0.0043
Epoch 28/50
33/33 [==============================] - 4s 115ms/step - loss: 0.0010 - val_loss: 0.0058
Epoch 29/50
33/33 [==============================] - 4s 116ms/step - loss: 0.0011 - val_loss: 0.0029
Epoch 30/50
33/33 [==============================] - 4s 115ms/step - loss: 9.5443e-04 - val_loss: 0.0071
Epoch 31/50
33/33 [==============================] - 4s 115ms/step - loss: 8.3169e-04 - val_loss: 0.0027
Epoch 32/50
33/33 [==============================] - 4s 116ms/step - loss: 9.4572e-04 - val_loss: 0.0047
Epoch 33/50
33/33 [==============================] - 4s 115ms/step - loss: 8.9343e-04 - val_loss: 0.0090
Epoch 34/50
33/33 [==============================] - 4s 116ms/step - loss: 9.6406e-04 - val_loss: 0.0016
Epoch 35/50
33/33 [==============================] - 4s 118ms/step - loss: 0.0010 - val_loss: 0.0026
Epoch 36/50
33/33 [==============================] - 4s 117ms/step - loss: 9.9515e-04 - val_loss: 0.0064
Epoch 37/50
33/33 [==============================] - 4s 117ms/step - loss: 0.0012 - val_loss: 0.0081
Epoch 38/50
33/33 [==============================] - 4s 115ms/step - loss: 8.3921e-04 - val_loss: 0.0020
Epoch 39/50
33/33 [==============================] - 4s 118ms/step - loss: 9.1372e-04 - val_loss: 0.0025
Epoch 40/50
33/33 [==============================] - 4s 117ms/step - loss: 8.1070e-04 - val_loss: 0.0034
Epoch 41/50
33/33 [==============================] - 4s 116ms/step - loss: 8.9496e-04 - val_loss: 0.0014
Epoch 42/50
33/33 [==============================] - 4s 115ms/step - loss: 8.7054e-04 - val_loss: 0.0038
Epoch 43/50
33/33 [==============================] - 4s 116ms/step - loss: 9.2930e-04 - val_loss: 0.0035
Epoch 44/50
33/33 [==============================] - 4s 120ms/step - loss: 9.5918e-04 - val_loss: 9.3904e-04
Epoch 45/50
33/33 [==============================] - 4s 119ms/step - loss: 9.5214e-04 - val_loss: 0.0012
Epoch 46/50
33/33 [==============================] - 4s 118ms/step - loss: 8.5229e-04 - val_loss: 0.0025
Epoch 47/50
33/33 [==============================] - 4s 118ms/step - loss: 8.4545e-04 - val_loss: 0.0055
Epoch 48/50
33/33 [==============================] - 4s 119ms/step - loss: 8.0176e-04 - val_loss: 9.1022e-04
Epoch 49/50
33/33 [==============================] - 4s 116ms/step - loss: 7.9480e-04 - val_loss: 0.0053
Epoch 50/50
33/33 [==============================] - 4s 113ms/step - loss: 7.2215e-04 - val_loss: 0.0032
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn (SimpleRNN)       (None, 60, 80)            6560      
_________________________________________________________________
dropout (Dropout)            (None, 60, 80)            0         
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 100)               18100     
_________________________________________________________________
dropout_1 (Dropout)          (None, 100)               0         
_________________________________________________________________
dense (Dense)                (None, 1)                 101       
=================================================================
Total params: 24,761
Trainable params: 24,761
Non-trainable params: 0
_________________________________________________________________
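The parameter counts in the summary can be checked by hand: a SimpleRNN layer with u units receiving d features per step holds u*(u + d + 1) parameters (recurrent kernel u×u, input kernel d×u, bias u). A short sketch of the arithmetic:

def simple_rnn_params(units, input_dim):
    # recurrent kernel (units x units) + input kernel (input_dim x units) + bias (units)
    return units * (units + input_dim + 1)

print(simple_rnn_params(80, 1))    # 6560  -> simple_rnn
print(simple_rnn_params(100, 80))  # 18100 -> simple_rnn_1
print(100 * 1 + 1)                 # 101   -> dense (weights + bias)
# total: 6560 + 18100 + 101 = 24761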
In [19]:
file = open('./weights.txt', 'w')  # dump the trained parameters
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

loss = history.history['loss']
val_loss = history.history['val_loss']

plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()
In [20]:
################## predict ######################
# Feed the test set into the model for prediction
predicted_stock_price = model.predict(x_test)
# Undo the normalization on the predictions --- map back from (0, 1) to the original price range
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
# Undo the normalization on the real data --- map back from (0, 1) to the original price range
real_stock_price = sc.inverse_transform(test_set[60:])
# Plot the real and predicted curves for comparison
plt.plot(real_stock_price, color='red', label='MaoTai Stock Price')
plt.plot(predicted_stock_price, color='blue', label='Predicted MaoTai Stock Price')
plt.title('MaoTai Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('MaoTai Stock Price')
plt.legend()
plt.show()
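One alignment detail worth noting: the first 60 test days serve only as input context, so the model produces len(test_set) - 60 = 240 predictions, which is why the real curve is taken from test_set[60:]. A quick sanity check (a sketch, reusing the variables above):

assert len(predicted_stock_price) == len(test_set) - 60 == 240
assert real_stock_price.shape == predicted_stock_price.shape  # both (240, 1)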
In [21]:
########## evaluate ##############
# MSE, mean squared error ---> E[(predicted - real)^2] (square the errors, then average)
mse = mean_squared_error(predicted_stock_price, real_stock_price)
# RMSE, root mean squared error ---> sqrt(MSE)
rmse = math.sqrt(mean_squared_error(predicted_stock_price, real_stock_price))
# MAE, mean absolute error ---> E[|predicted - real|] (take absolute errors, then average)
mae = mean_absolute_error(predicted_stock_price, real_stock_price)
print('MSE: %.6f' % mse)
print('RMSE: %.6f' % rmse)
print('MAE: %.6f' % mae)
MSE: 1619.990084
RMSE: 40.249100
MAE: 35.700861
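The three metrics follow directly from their definitions, so they can be recomputed with plain NumPy as a cross-check (a sketch, reusing the arrays above; the values should match sklearn's):

err = predicted_stock_price - real_stock_price
print(np.mean(err ** 2))           # MSE
print(np.sqrt(np.mean(err ** 2)))  # RMSE
print(np.mean(np.abs(err)))        # MAE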