LSTM Time Series Forecasting and Building the Network Layers


I. Forecasting an airline's passenger traffic for the coming year with an LSTM

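 You are given a dataset with a single column of values: a time series of monthly passenger counts. The task is to forecast the airline's passenger traffic for the coming year from this series.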
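
 A model trained on fixed-length windows predicts one month at a time, so one common way to obtain a full year is to forecast recursively: predict the next value, append it to the input window, and repeat 12 times. The sketch below illustrates that loop only. `recursive_forecast` and `naive_step` are hypothetical names that do not appear in the original code, and `naive_step` is a stand-in for a trained model.

```python
import numpy as np

def recursive_forecast(predict_next, history, window=10, steps=12):
    """Forecast `steps` values by feeding each prediction back into the window.

    predict_next: hypothetical callable mapping a 1-D window to the next value.
    history:      1-D sequence of observed values, at least `window` long.
    """
    buffer = list(np.asarray(history, dtype=float)[-window:])
    forecast = []
    for _ in range(steps):
        next_value = float(predict_next(np.array(buffer[-window:])))
        forecast.append(next_value)
        buffer.append(next_value)  # the prediction becomes part of the next input
    return np.array(forecast)

# Stand-in for a trained model (assumption): repeat the last observed step.
def naive_step(window_values):
    return window_values[-1] + (window_values[-1] - window_values[-2])

history = np.arange(1.0, 21.0)  # toy series 1, 2, ..., 20
forecast = recursive_forecast(naive_step, history, window=10, steps=12)
print(forecast)
```

 Because every prediction is reused as input, errors accumulate over the 12 steps, so the later months of such a forecast are less reliable than the first ones.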

 

II. Hands-on

 1) Download the data

  Searching Google for passenger.csv should turn up the project data. If you cannot find it, the data can be downloaded here: https://pan.baidu.com/s/1a7h5ZknDyT0azW9mv5st7w (extraction code: u5h3)

 2) Jupyter Notebook

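  Create a folder named airline on the desktop and move passenger.csv into it. Hold Shift, right-click inside the folder, choose "Open command window here", run jupyter notebook, and create a new notebook named airline_predict.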

 3) Inspect the data:

import pandas as pd
df = pd.read_csv('passenger.csv', header=None)
df.columns = ['time', 'passengers']
df.head(12)

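  The output shows that the data is organized by year, with the passenger count recorded for each month of every year.

  Next, plot the trend to see how passenger traffic changes over time: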

df = df.set_index('time')  # use the first column as the row index
df.head(12)

import matplotlib.pyplot as plt
df['passengers'].plot()
plt.show()

  The plot shows that passenger traffic rises year over year.
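
  Monthly data usually carries a seasonal cycle on top of the trend, which can make the raw plot harder to read. A minimal sketch of one way to isolate the trend is a 12-month rolling mean. The series `toy` below is synthetic (an assumption standing in for the real passenger column), so the block runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger column: upward trend plus a 12-month cycle.
months = np.arange(48)
toy = pd.Series(100 + 2.0 * months + 10 * np.sin(2 * np.pi * months / 12),
                name='passengers')

# A 12-month rolling mean averages over one full seasonal cycle, leaving the trend.
# The first 11 entries are NaN because the window is not yet full.
trend = toy.rolling(window=12).mean()

print(trend.dropna().head())
```

  With the real data, the same idea would be `df['passengers'].rolling(12).mean().plot()`, assuming the column holds monthly values as described above.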

 

 4) Prepare the data and split it into training and test sets

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

class Airline_Predict:
    def __init__(self, filename, sequence_length=10, split=0.8):
        self.filename = filename
        self.sequence_length = sequence_length
        self.split = split
        
    def load_data(self):
        # Read only the passenger column (column index 1); the file has no header row
        df = pd.read_csv(self.filename, sep=',', usecols=[1], header=None)

        data_all = np.array(df).astype('float')
        print(data_all.shape)  # (144, 1)

        # Normalize the data
        MMS = MinMaxScaler()
        data_all = MMS.fit_transform(data_all)
        print(data_all.shape)

        # Build the 3D data fed to the LSTM: (133, 11, 1)
        # In each window the first 10 values are the features and the 11th is the numeric label

        data = []
        for i in range( len(data_all) - self.sequence_length - 1 ):
            data.append( data_all[ i: i+self.sequence_length+1 ] )

        #global reshaped_data
        reshaped_data = np.array(data).astype('float64')
        print(reshaped_data.shape)  # (133, 11, 1)

        # Shuffle the data along the first axis
        np.random.shuffle(reshaped_data)

        # For each of the 133 windows of 11 values, the first 10 are features: (133, 10, 1)
        x = reshaped_data[:, :-1]
        print('samples shape:', x.shape, '\n')  # (133, 10, 1)
        # The 11th value of each window is the label
        y = reshaped_data[:, -1]
        print('labels shape:', y.shape, '\n')  # (133, 1)

        # Training set
        split_boundary = int(reshaped_data.shape[0] * self.split)
        train_x = x[:split_boundary]
        print('train_x shape:', train_x.shape)

        # Test set
        test_x = x[ split_boundary: ]
        print('test_x shape:', test_x.shape)

        # Training labels
        train_y = y[ : split_boundary ]
        print('train_y shape', train_y.shape)

        # Test labels
        test_y = y[ split_boundary: ]
        print('test_y shape', test_y.shape)

        return train_x, train_y, test_x, test_y, MMS
        
filename = 'passenger.csv'
AirLine = Airline_Predict(filename)
train_x, train_y, test_x, test_y, MMS = AirLine.load_data()
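
  The windowing step can also be tried in isolation. The sketch below is a variation on the code above, not a copy of it: `make_windows` and `chronological_split` are hypothetical helper names, the loop keeps the final window (so a 144-value series yields 134 windows rather than 133), and the split is chronological instead of shuffled. The chronological split is an assumption about what may be preferable here: neighbouring windows overlap in 9 of their 10 inputs, so after shuffling, test windows share most of their values with training windows.

```python
import numpy as np

def make_windows(series, sequence_length=10):
    """Return (x, y): x holds `sequence_length` consecutive values, y the value after them."""
    series = np.asarray(series, dtype=float).reshape(-1, 1)
    windows = np.array([series[i:i + sequence_length + 1]
                        for i in range(len(series) - sequence_length)])
    return windows[:, :-1], windows[:, -1]

def chronological_split(x, y, split=0.8):
    """Keep the earliest windows for training and the latest for testing (no shuffling)."""
    boundary = int(len(x) * split)
    return x[:boundary], y[:boundary], x[boundary:], y[boundary:]

toy = np.arange(144)  # toy series with the same length as the passenger data
x, y = make_windows(toy)
x_train, y_train, x_test, y_test = chronological_split(x, y)
print(x.shape, y.shape)               # (134, 10, 1) (134, 1)
print(x_train.shape, x_test.shape)    # (107, 10, 1) (27, 10, 1)
```

  With this split every test target lies later in time than every training target, which matches how the model would be used for forecasting.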

 

 5) Train the model

#coding=gbk

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

import warnings
warnings.filterwarnings('ignore')

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

import matplotlib.pyplot as plt

class Airline_Predict:
    def __init__(self, filename, sequence_length=10, split=0.8):
        self.filename = filename
        self.sequence_length = sequence_length
        self.split = split
        
    def load_data(self):
        # Read only the passenger column (column index 1); the file has no header row
        df = pd.read_csv(self.filename, sep=',', usecols=[1], header=None)

        data_all = np.array(df).astype('float')
        print(data_all.shape)  # (144, 1)

        # Normalize the data
        MMS = MinMaxScaler()
        data_all = MMS.fit_transform(data_all)
        print(data_all.shape)

        # Build the 3D data fed to the LSTM: (133, 11, 1)
        # In each window the first 10 values are the features and the 11th is the numeric label

        data = []
        for i in range( len(data_all) - self.sequence_length - 1 ):
            data.append( data_all[ i: i+self.sequence_length+1 ] )

        #global reshaped_data
        reshaped_data = np.array(data).astype('float64')
        print(reshaped_data.shape)  # (133, 11, 1)

        # Shuffle the data along the first axis
        np.random.shuffle(reshaped_data)

        # For each of the 133 windows of 11 values, the first 10 are features: (133, 10, 1)
        x = reshaped_data[:, :-1]
        print('samples shape:', x.shape, '\n')  # (133, 10, 1)
        # The 11th value of each window is the label
        y = reshaped_data[:, -1]
        print('labels shape:', y.shape, '\n')  # (133, 1)

        # Training set
        split_boundary = int(reshaped_data.shape[0] * self.split)
        train_x = x[:split_boundary]
        print('train_x shape:', train_x.shape)

        # Test set
        test_x = x[ split_boundary: ]
        print('test_x shape:', test_x.shape)

        # Training labels
        train_y = y[ : split_boundary ]
        print('train_y shape', train_y.shape)

        # Test labels
        test_y = y[ split_boundary: ]
        print('test_y shape', test_y.shape)

        return train_x, train_y, test_x, test_y, MMS
    
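    def build_model(self):
        # The LSTM input dimension is the last dimension of train_x,
        # whose shape is (n_samples, time_sequence_steps, input_dim).
        # According to the Keras documentation, LSTM is one concrete class of the Recurrent
        # layer and expects a 3D tensor of shape (samples, timesteps, input_dim).
        # time_sequence_steps is the time window we use: treat the series as one long chain
        # and sample it with a window of fixed length.
        # Two LSTM layers are stacked here. The first argument of the second LSTM is its number
        # of units (its output dimension, 100), which need not equal the first layer's output
        # dimension (50); the second layer's input dimension is inferred from the first layer.
        # The last layer is a linear layer.

        model = Sequential()
        # Keras 2 argument names: units replaces output_dim, input_shape replaces input_dim
        model.add( LSTM( 50, input_shape=(self.sequence_length, 1), return_sequences=True ) )
        print( "model layers:",model.layers )

        model.add( LSTM(100, return_sequences=False) )
        model.add( Dense(1) )
        model.add( Activation('linear') )

        model.compile( loss='mse', optimizer='rmsprop' )
        return model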
        
    
    def train_model(self, train_x, train_y, test_x, test_y):
        model = self.build_model()

        try:
            # epochs is the Keras 2 name for nb_epoch
            model.fit( train_x, train_y, batch_size=512, epochs=100, validation_split=0.1 )
        except KeyboardInterrupt:
            # Training can be stopped early; predict with the weights learned so far
            print('Training interrupted')

        predict = model.predict(test_x)
        #print(predict.size)
        predict = np.reshape( predict, (predict.size, ) )  # flatten to a vector
        test_y = np.reshape( test_y, (test_y.size, ) )

        print('After predict:\n',predict)
        print('The right test_y:\n',test_y)

        try:
            fig1 = plt.figure(1)
            plt.plot(predict, 'r')
            plt.plot(test_y, 'g-')
            plt.title('This plot is drawn using normalized data')
            plt.legend(['predict', 'true'])

        except Exception as e:
            print(e)

        return predict, test_y
        
        
filename = 'passenger.csv'
AirLine = Airline_Predict(filename)
train_x, train_y, test_x, test_y, MMS = AirLine.load_data()

predict_y, test_y = AirLine.train_model(train_x, train_y, test_x, test_y)

# Restore the normalized data to the original scale
predict_y = MMS.inverse_transform( [ [i] for i in predict_y ] )
test_y = MMS.inverse_transform( [ [i] for i in test_y ] )

fig2 = plt.figure(2)
plt.plot(predict_y, 'g:', label='prediction')
plt.plot(test_y, 'r-', label='True')
plt.title('This plot is drawn using inverse-transformed data')
plt.legend(['predict', 'true'])
plt.show()

print('predict:',np.reshape(predict_y, (predict_y.size,)) )
print('True:',np.reshape(test_y, (test_y.size,)))
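
  Printing the two arrays side by side makes it hard to judge how close the predictions are. As a minimal sketch, the fit can be summarized with a few standard error measures computed on the restored (original-scale) values. `regression_errors` is a hypothetical helper, and the toy arrays below are made-up numbers used only to show the arithmetic, not results from the model.

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Return MAE, RMSE and MAPE (in percent) for two equally long arrays."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_pred - y_true
    return {
        'mae': float(np.mean(np.abs(errors))),
        'rmse': float(np.sqrt(np.mean(errors ** 2))),
        # MAPE divides by the true values, so it assumes none of them is zero
        'mape': float(np.mean(np.abs(errors / y_true)) * 100),
    }

# Made-up values for illustration only
toy_true = np.array([100.0, 200.0, 400.0])
toy_pred = np.array([110.0, 180.0, 400.0])
metrics = regression_errors(toy_true, toy_pred)
print(metrics)
```

  With the script above, the call would be `regression_errors(test_y, predict_y)` after both arrays have been passed through `MMS.inverse_transform`.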

 

 

 

III. Code structure

