時間序列分析之ARIMA模型預測__R篇


相關文章:時間序列分析之ARIMA模型預測__SAS篇

 

之前一直用SAS做ARIMA模型預測,今天嘗試用了一下R,發現靈活度更高,結果輸出也更直觀。現在記錄一下如何用R分析ARIMA模型。

 

1. 處理數據

1.1. 導入forecast包

forecast包是一個封裝的ARIMA統計軟件包,在默認情況下,R沒有預裝forecast包,因此需要先安裝該包

> install.packages("forecast')

導入依賴包zoo,再導入forecast包

> library("zoo")
> library("forecast")

1.2. 導入數據

博主使用的數據是一組航空公司的銷售數據,可在此下載數據:airline.txt,共有132條數據,是以月為單位的銷售數據。

> airline <- read.table("airline.txt")
> airline

    V1 V2
  1 1 112
  2 2 118
  3 3 132
  4 4 129
  5 5 121
  6 6 135
  7 7 148
  8 8 148
  9 9 136
  10 10 119

(........)

1.3. 將數據轉化為時間序列格式(ts)

 由於將數據轉化為時間序列格式,我們並不需要時間字段,因此只取airline數據的第二列,即銷售數據,又因為該數據是以月為單位的,因此Period是12。

> airline2 <- ariline[2]
> airts <- ts(airline2,start=1,frequency=12)

 

2. 識別模型

2.1. 查看趨勢圖

> plot.ts(airts)

 

由圖可見,該序列還不平穩,先做一次Log平滑,再做一次差分:

> airlog <- log(airts)
> airdiff <- diff(airlog, differences=1)
> plot.ts(airdiff)

 

這次看上去就比較平穩了,現在看看ACF和PACF的結果

2.2. 查看ACF和PACF

 

> acf(airdff, lag.max=30)
> acf(airdff, lag.max=30,plot=FALSE)
Autocorrelations of series ‘airdiff’, by lag
0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333 1.000 0.188 -0.127 -0.154 -0.326 -0.066 0.041 -0.098 -0.343 -0.109 -0.120 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667 1.7500 0.199 0.833 0.198 -0.143 -0.110 -0.288 -0.046 0.036 -0.104 -0.313 -0.106 1.8333 1.9167 2.0000 2.0833 2.1667 2.2500 2.3333 2.4167 2.5000 -0.085 0.185 0.714 0.175 -0.126 -0.077 -0.214 -0.046 0.029

 

 

> pacf(airdff, lag.max=30)
> pacf(airdff, lag.max=30,plot=FALSE)
Partial autocorrelations of series ‘airdiff’, by lag
0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333 0.9167 0.188 -0.169 -0.101 -0.317 0.018 -0.072 -0.199 -0.509 -0.171 -0.553 -0.300 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667 1.7500 1.8333 0.551 0.010 -0.200 0.164 -0.052 -0.037 -0.108 0.094 0.005 -0.095 -0.001 1.9167 2.0000 2.0833 2.1667 2.2500 2.3333 2.4167 2.5000 0.057 -0.074 -0.048 0.024 0.073 0.047 0.010 0.033

從ACF和PACF可以看出來,該序列在lag=12和lag=24處有明顯的spike,說明該序列需要再做一次diff=12的差分。且PACF比ACF呈現更明顯的指數平滑的趨勢,因此先猜測ARIMA模型為:ARIMA(0,1,1)(0,1,1)[12].

2.3. 利用auto.arima

> auto.arima(airlog,trace=T)

 ARIMA(2,1,2)(1,1,1)[12]                    : -354.4719
 ARIMA(0,1,0)(0,1,0)[12]                    : -316.8213
 ARIMA(1,1,0)(1,1,0)[12]                    : -356.4353
 ARIMA(0,1,1)(0,1,1)[12]                    : -359.7679
 ARIMA(0,1,1)(1,1,1)[12]                    : -354.9069
 ARIMA(0,1,1)(0,1,0)[12]                    : -327.5759
 ARIMA(0,1,1)(0,1,2)[12]                    : -357.6861
 ARIMA(0,1,1)(1,1,2)[12]                    : -363.2418
 ARIMA(1,1,1)(1,1,2)[12]                    : -359.6535
 ARIMA(0,1,0)(1,1,2)[12]                    : -346.1537
 ARIMA(0,1,2)(1,1,2)[12]                    : -361.1765
 ARIMA(1,1,2)(1,1,2)[12]                    : 1e+20
 ARIMA(0,1,1)(1,1,2)[12]                    : -363.2418
 ARIMA(0,1,1)(2,1,2)[12]                    : -368.8244
 ARIMA(0,1,1)(2,1,1)[12]                    : -368.1761
 ARIMA(1,1,1)(2,1,2)[12]                    : -367.0903
 ARIMA(0,1,0)(2,1,2)[12]                    : -363.7024
 ARIMA(0,1,2)(2,1,2)[12]                    : -366.6877
 ARIMA(1,1,2)(2,1,2)[12]                    : 1e+20
 ARIMA(0,1,1)(2,1,2)[12]                    : -368.8244

 Best model: ARIMA(0,1,1)(2,1,2)[12]                    

Series: airlog 
ARIMA(0,1,1)(2,1,2)[12]                    

Coefficients:
          ma1     sar1     sar2     sma1     sma2
      -0.2710  -0.4764  -0.1066  -0.0098  -0.1987
s.e.   0.0995   0.1432   0.1087   0.1567   0.1130

sigma^2 estimated as 0.001188:  log likelihood=231.88
AIC=-369.57   AICc=-368.82   BIC=-352.9

auto.arima提供的最佳模型為ARIMA(0,1,1)(2,1,2)[12],我們可以同時測試兩個模型,看看哪個更適合。

 

3. 參數估計

> airarima1 <- arima(airlog,order=c(0,1,1),seasonal=list(order=c(0,1,1),period=12),method="ML")
> airarima1 Series: airlog ARIMA(
0,1,1)(0,1,1)[12] Coefficients: ma1 sma1 -0.3484 -0.5622 s.e. 0.0943 0.0774 sigma^2 estimated as 0.001313: log likelihood=223.63 AIC=-441.26 AICc=-441.05 BIC=-432.92
> airarima2 <- arima(airlog,order=c(0,1,1),seasonal=list(order=c(2,1,2),period=12),method="ML")
> airarima2 Series: airlog ARIMA(
0,1,1)(2,1,2)[12] Coefficients: ma1 sar1 sar2 sma1 sma2 -0.3546 1.0614 -0.1211 -1.9130 0.9962 s.e. 0.0995 0.1094 0.1844 0.3887 0.3812 sigma^2 estimated as 0.0009811: log likelihood=225.56 AIC=-439.12 AICc=-438.37 BIC=-422.44

兩個ARIMA模型都采用極大似然方法估計,計算系數對應的t值:

ARIMA(0,1,1)(0,1,1)[12] :t(ma1)=-39.1791, t(sma1)=-93.8445

ARIMA(0,1,1)(2,1,2)[12] : t(ma1)=-35.8173,t(sar1)=88.68383,t(sar2)=-3.56141,t(sma1)=-12.6615,t(sma2)= 6.855526

可見兩個模型的系數都是顯著的,而ARIMA(0,1,1)(0,1,1)[12]的AIC和BIC比ARIMA(0,1,1)(2,1,2)[12]的要小,因此選擇模型ARIMA(0,1,1)(0,1,1)[12]。

 

4. 預測

 預測五年后航空公司的銷售額:

> airforecast <- forecast.Arima(airarima1,h=5,level=c(99.5))
> airforecast
Point Forecast Lo 99.5 Hi 99.5
Jan 12 6.038649 5.936951 6.140348
Feb 12 5.988762 5.867380 6.110143
Mar 12 6.145428 6.007137 6.283719
Apr 12 6.118993 5.965646 6.272340
May 12 6.159657 5.992605 6.326709

> plot.forecast(airforecast)

 

 

 

 

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM