相關文章:時間序列分析之ARIMA模型預測__SAS篇
之前一直用SAS做ARIMA模型預測,今天嘗試用了一下R,發現靈活度更高,結果輸出也更直觀。現在記錄一下如何用R分析ARIMA模型。
1. 處理數據
1.1. 導入forecast包
forecast包是一個封裝的ARIMA統計軟件包,在默認情況下,R沒有預裝forecast包,因此需要先安裝該包
> install.packages("forecast')
導入依賴包zoo,再導入forecast包
> library("zoo") > library("forecast")
1.2. 導入數據
博主使用的數據是一組航空公司的銷售數據,可在此下載數據:airline.txt,共有132條數據,是以月為單位的銷售數據。
> airline <- read.table("airline.txt")
> airline
V1 V2
1 1 112
2 2 118
3 3 132
4 4 129
5 5 121
6 6 135
7 7 148
8 8 148
9 9 136
10 10 119
(........)
1.3. 將數據轉化為時間序列格式(ts)
由於將數據轉化為時間序列格式,我們並不需要時間字段,因此只取airline數據的第二列,即銷售數據,又因為該數據是以月為單位的,因此Period是12。
> airline2 <- ariline[2] > airts <- ts(airline2,start=1,frequency=12)
2. 識別模型
2.1. 查看趨勢圖
> plot.ts(airts)
由圖可見,該序列還不平穩,先做一次Log平滑,再做一次差分:
> airlog <- log(airts)
> airdiff <- diff(airlog, differences=1)
> plot.ts(airdiff)
這次看上去就比較平穩了,現在看看ACF和PACF的結果
2.2. 查看ACF和PACF
> acf(airdff, lag.max=30) > acf(airdff, lag.max=30,plot=FALSE)
Autocorrelations of series ‘airdiff’, by lag 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333 1.000 0.188 -0.127 -0.154 -0.326 -0.066 0.041 -0.098 -0.343 -0.109 -0.120 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667 1.7500 0.199 0.833 0.198 -0.143 -0.110 -0.288 -0.046 0.036 -0.104 -0.313 -0.106 1.8333 1.9167 2.0000 2.0833 2.1667 2.2500 2.3333 2.4167 2.5000 -0.085 0.185 0.714 0.175 -0.126 -0.077 -0.214 -0.046 0.029
> pacf(airdff, lag.max=30) > pacf(airdff, lag.max=30,plot=FALSE)
Partial autocorrelations of series ‘airdiff’, by lag 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333 0.9167 0.188 -0.169 -0.101 -0.317 0.018 -0.072 -0.199 -0.509 -0.171 -0.553 -0.300 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667 1.7500 1.8333 0.551 0.010 -0.200 0.164 -0.052 -0.037 -0.108 0.094 0.005 -0.095 -0.001 1.9167 2.0000 2.0833 2.1667 2.2500 2.3333 2.4167 2.5000 0.057 -0.074 -0.048 0.024 0.073 0.047 0.010 0.033
從ACF和PACF可以看出來,該序列在lag=12和lag=24處有明顯的spike,說明該序列需要再做一次diff=12的差分。且PACF比ACF呈現更明顯的指數平滑的趨勢,因此先猜測ARIMA模型為:ARIMA(0,1,1)(0,1,1)[12].
2.3. 利用auto.arima
> auto.arima(airlog,trace=T) ARIMA(2,1,2)(1,1,1)[12] : -354.4719 ARIMA(0,1,0)(0,1,0)[12] : -316.8213 ARIMA(1,1,0)(1,1,0)[12] : -356.4353 ARIMA(0,1,1)(0,1,1)[12] : -359.7679 ARIMA(0,1,1)(1,1,1)[12] : -354.9069 ARIMA(0,1,1)(0,1,0)[12] : -327.5759 ARIMA(0,1,1)(0,1,2)[12] : -357.6861 ARIMA(0,1,1)(1,1,2)[12] : -363.2418 ARIMA(1,1,1)(1,1,2)[12] : -359.6535 ARIMA(0,1,0)(1,1,2)[12] : -346.1537 ARIMA(0,1,2)(1,1,2)[12] : -361.1765 ARIMA(1,1,2)(1,1,2)[12] : 1e+20 ARIMA(0,1,1)(1,1,2)[12] : -363.2418 ARIMA(0,1,1)(2,1,2)[12] : -368.8244 ARIMA(0,1,1)(2,1,1)[12] : -368.1761 ARIMA(1,1,1)(2,1,2)[12] : -367.0903 ARIMA(0,1,0)(2,1,2)[12] : -363.7024 ARIMA(0,1,2)(2,1,2)[12] : -366.6877 ARIMA(1,1,2)(2,1,2)[12] : 1e+20 ARIMA(0,1,1)(2,1,2)[12] : -368.8244 Best model: ARIMA(0,1,1)(2,1,2)[12] Series: airlog ARIMA(0,1,1)(2,1,2)[12] Coefficients: ma1 sar1 sar2 sma1 sma2 -0.2710 -0.4764 -0.1066 -0.0098 -0.1987 s.e. 0.0995 0.1432 0.1087 0.1567 0.1130 sigma^2 estimated as 0.001188: log likelihood=231.88 AIC=-369.57 AICc=-368.82 BIC=-352.9
auto.arima提供的最佳模型為ARIMA(0,1,1)(2,1,2)[12],我們可以同時測試兩個模型,看看哪個更適合。
3. 參數估計
> airarima1 <- arima(airlog,order=c(0,1,1),seasonal=list(order=c(0,1,1),period=12),method="ML")
> airarima1 Series: airlog ARIMA(0,1,1)(0,1,1)[12] Coefficients: ma1 sma1 -0.3484 -0.5622 s.e. 0.0943 0.0774 sigma^2 estimated as 0.001313: log likelihood=223.63 AIC=-441.26 AICc=-441.05 BIC=-432.92
> airarima2 <- arima(airlog,order=c(0,1,1),seasonal=list(order=c(2,1,2),period=12),method="ML")
> airarima2 Series: airlog ARIMA(0,1,1)(2,1,2)[12] Coefficients: ma1 sar1 sar2 sma1 sma2 -0.3546 1.0614 -0.1211 -1.9130 0.9962 s.e. 0.0995 0.1094 0.1844 0.3887 0.3812 sigma^2 estimated as 0.0009811: log likelihood=225.56 AIC=-439.12 AICc=-438.37 BIC=-422.44
兩個ARIMA模型都采用極大似然方法估計,計算系數對應的t值:
ARIMA(0,1,1)(0,1,1)[12] :t(ma1)=-39.1791, t(sma1)=-93.8445
ARIMA(0,1,1)(2,1,2)[12] : t(ma1)=-35.8173,t(sar1)=88.68383,t(sar2)=-3.56141,t(sma1)=-12.6615,t(sma2)= 6.855526
可見兩個模型的系數都是顯著的,而ARIMA(0,1,1)(0,1,1)[12]的AIC和BIC比ARIMA(0,1,1)(2,1,2)[12]的要小,因此選擇模型ARIMA(0,1,1)(0,1,1)[12]。
4. 預測
預測五年后航空公司的銷售額:
> airforecast <- forecast.Arima(airarima1,h=5,level=c(99.5)) > airforecast Point Forecast Lo 99.5 Hi 99.5 Jan 12 6.038649 5.936951 6.140348 Feb 12 5.988762 5.867380 6.110143 Mar 12 6.145428 6.007137 6.283719 Apr 12 6.118993 5.965646 6.272340 May 12 6.159657 5.992605 6.326709
> plot.forecast(airforecast)