摘要: 僅用於記錄R語言學習過程:
內容提要:
時間與日期數據的處理;
lubridate包;
時間序列介紹及舉例
正文:
時間與日期數據的處理
n 導讀:
u 時間生成函數:as.Date()
> as.Date('2017-02-16')
[1] "2017-02-16"
> class(as.Date('2017-02-16'))
[1] "Date" #由字符串變成日期
> as.Date('20170301', format = '%Y%m%d') #格式要一一對應
[1] "2017-03-01" 格式符號
> as.Date(100,origin = '2018-08-07')
[1] "2018-11-15" #從2018-08-07開始100天
格式符號:
%m :月份
%b :對應月份的簡稱,如Jan
%B:對應月份的全稱,如July
%Y:四位數年份
%y:二位數年份
%d: 對應天
u 時間生成函數:ISOdate()
> ISOdate(2018,4,21,15,21,52)
[1] "2018-04-21 15:21:52 GMT"
u as.POSIXlt()函數:
> dts <- c('2005-10-21 18:24:24','2017-02-16 19:20:20')
> as.POSIXlt(dts)
[1] "2005-10-21 18:24:24 CST" "2017-02-16 19:20:20 CST"
format 參數:
%a : Mon;
%A: Monday
%b: jan
%B: July
u strptime()函數:輸出日期
u strftime()函數:將日期進行格式化
n julian()函數:計算日期之差
> julian(as.Date('2017-02-16')) #默認是1970-01-01
[1] 17213
attr(,"origin")
[1] "1970-01-01"
origin 可以更改時間起點
> julian(as.Date('2017-02-16'),origin = as.Date('2016-02-19'))
[1] 363
attr(,"origin")
[1] "2016-02-19"
n difftime()函數:計算時間之差
> difftime(as.Date('2017-02-16'), as.Date('2016-02-19'),units = 'weeks')
Time difference of 51.85714 weeks
n mean() :計算日期的均值
> mean(c(as.Date('2017-02-16'), as.Date('2016-02-19')))
[1] "2016-08-18"
n seq()函數生成時間序列:
> dateline <- seq(as.Date('2016-02-19'),by = 'days',length.out = 10)
> dateline
[1] "2016-02-19" "2016-02-20" "2016-02-21" "2016-02-22" "2016-02-23" "2016-02-24"
[7] "2016-02-25" "2016-02-26" "2016-02-27" "2016-02-28"
> dateline <- seq(as.Date('2016-02-19'),by = 'weeks',length.out = 10)
> dateline
[1] "2016-02-19" "2016-02-26" "2016-03-04" "2016-03-11" "2016-03-18" "2016-03-25"
[7] "2016-04-01" "2016-04-08" "2016-04-15" "2016-04-22"
> dateline <- seq(as.Date('2016-02-19'),by = '2 days',length.out = 10)
> dateline
[1] "2016-02-19" "2016-02-21" "2016-02-23" "2016-02-25" "2016-02-27" "2016-02-29"
[7] "2016-03-02" "2016-03-04" "2016-03-06" "2016-03-08"
n stringi包中的時間生成函數:stri_datetime_add()函數
> library(stringi)
> my_newtime <- stri_datetime_add(as.Date('2017-02-16'),value = 10,units ='days') #value= 10 指10天后,因為units設置的為天
> my_newtime
[1] "2017-02-26 08:00:00 CST"
n 創建時間(也用stringi包):stri_datetime_create()
stri_datetime_create(2014,4,20)
n 時間日期變量解析函數:將字符串轉化成特定格式的日期時間變量;或者反過來:stri_datetime_parse()函數
> stri_datetime_parse(c('2015-02-27','2015-02-28'),'yyyy-MM-dd')
[1] "2015-02-27 21:28:56 CST" "2015-02-28 21:28:56 CST"
> stri_datetime_parse(c('2015-02-27','2015-02-30'),'yyyy-MM-dd')
[1] "2015-02-27 21:29:51 CST" NA #去掉錯誤日期
> stri_datetime_parse(c('2015-02-27','2015-02-30'),'yyyy-MM-dd',lenient = T)
[1] "2015-02-27 21:30:29 CST" "2015-03-02 21:30:29 CST" #自動校正
> stri_datetime_parse('2016-6-6','yyyy-M-d')
[1] "2016-06-06 09:02:23 CST"
lubridate包
n ymd()函數:生成時間變量的函數,順序為年月日
> ymd('20170204')
[1] "2017-02-04"
> ymd('020217')
[1] "2002-02-17"
> x <- c('2009s01s01','2009-02-12','2009 01 30','200814','09.1.1','abd 09 12 08','!!09 ## 12 $$ 12')
> ymd(x)
[1] "2009-01-01" "2009-02-12" "2009-01-30" "2020-08-14" #可用分隔符 2008,1,4 即可讀出來2008年
[5] "2009-01-01" "2009-12-08" "2009-12-12"
類似函數:mdy() 月日年,ymd_hms 時分秒
> ymd_hms('20170219142323')
[1] "2017-02-19 14:23:23 UTC"
n 提取日期變量中的子集,如提取時間
> x_time <-ymd(x)
> month(x_time,label = T)
[1] 1月 2月 1月 8月 1月 12月 12月
12 Levels: 1月 < 2月 < 3月 < 4月 < 5月 < 6月 < ... < 12月
> month(x_time,label = F)
[1] 1 2 1 8 1 12 12
> month(x_time,label = T,abbr = F)
[1] 一月 二月 一月 八月 一月 十二月 十二月
12 Levels: 一月 < 二月 < 三月 < 四月 < 五月 < ... < 十二月
> day(x_time)
[1] 1 12 30 14 1 8 12
> mday(x_time)
[1] 1 12 30 14 1 8 12 #在一個月中是第幾天
> wday(x_time) #在一個星期中是第幾天
[1] 5 5 6 6 5 3 7
n 修改日期:
> new_date <- now()
> new_date
[1] "2018-08-08 09:26:07 CST"
>
> day(new_date) <- 17
> new_date
[1] "2018-08-17 09:26:07 CST"
>
> month(new_date) <- 12
> new_date
[1] "2018-12-17 09:26:07 CST"
n 生成日期;make_date()函數
> dates <- make_date(year = 2010:2016,month = 1:3,day = 1:5)
> dates
[1] "2010-01-01" "2011-02-02" "2012-03-03" "2013-01-04"
[5] "2014-02-05" "2015-03-01" "2016-01-02"
n 添加時間信息: make_datetime()函數,類似make_date()函數,但是多了時分秒的信息
n 計算時間的近似值:
> x_time <- as.POSIXct('2009-11-12 12:01:59')
> round_date(x_time,unit = 'minute') #unit可改成hour,month,half year等
[1] "2009-11-12 12:02:00 CST"
n 殘缺日期的處理:
> time_t <- c('2017-02','201609','2017/5')
> ymd(time_t)
[1] NA NA NA
Warning message:
All formats failed to parse. No formats found.
> ymd(time_t,truncated = 1) #補全 都補的1號
[1] "2017-02-01" "2016-09-01" "2017-05-01"
n 如何進行時間變量的操作:如加減乘除 先用interval(),再用time_length()函數
> inta <- interval(start =ymd('1900,01,01'),end = ymd('1999,1231'))
> time_length(inta,unit = 'year') # year也可以改成day
[1] 99.99726
> x <- as.POSIXlt('2017-02-03')
> x + days(10) + hours(12) + minutes(30)
[1] "2017-02-13 12:30:00 UTC"
時間序列
n 生成時間序列:ts()函數
> ts(1:10,frequency = 4,start = c(1998,2)) #10個元素,4個季度,起始季度:1998年第二個季度
Qtr1 Qtr2 Qtr3 Qtr4
1998 1 2 3
1999 4 5 6 7
2000 8 9 10
> ts(1:10,frequency = 12,start = c(1998,2)) #以月為節點,12
Feb Mar Apr May Jun Jul Aug Sep Oct Nov
1998 1 2 3 4 5 6 7 8 9 10
> ts(1:10,frequency = 12,start = c(1998,2),end = c(2001,3)) #起始和終止時間
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1998 1 2 3 4 5 6 7 8 9 10 1
1999 2 3 4 5 6 7 8 9 10 1 2 3
2000 4 5 6 7 8 9 10 1 2 3 4 5
2001 6 7 8
n 時間序列可視化:
> value <- ts(data = sample(0:300,366,replace = TRUE),start =
+ as.Date('2016-01-01'),frequency = 1,end = as.Date('2016-12-31'))
> date <- seq(from = as.Date('2016-01-01'),by = 1,length.out = 366)
> df <- data.frame(value = value,time = date)
> head(df)
value time
1 242 2016-01-01
2 59 2016-01-02
3 21 2016-01-03
4 117 2016-01-04
5 162 2016-01-05
6 151 2016-01-06
> plot(ts(cumsum(1+round(rnorm(100),2)),start = c(1954,7),frequency = 12))
> plot(value)
n xts擴展包中的函數:生成時間序列:xts()函數
> value <- sample(0:100,365,replace = T)
> times <- seq(from = as.Date('2017-01-01'),by = 1,length = 365)
>
> myts <- xts(value,times)
> head(myts)
[,1]
2017-01-01 21
2017-01-02 66
2017-01-03 95
2017-01-04 6
2017-01-05 43
2017-01-06 6
> class(myts)
[1] "xts" "zoo"
u 后續操作:window()函數:從時間序列提取子集,同時也具有賦值功能
> window(myts,start = as.Date('2017-01-10'),end = as.Date('2017-02-01'))
[,1]
2017-01-10 30
2017-01-11 77
2017-01-12 70
2017-01-13 93
2017-01-14 57
2017-01-15 32
2017-01-16 100
2017-01-17 7
2017-01-18 13
2017-01-19 36
2017-01-20 52
2017-01-21 97
2017-01-22 43
2017-01-23 94
2017-01-24 50
2017-01-25 84
2017-01-26 79
2017-01-27 57
2017-01-28 86
2017-01-29 16
2017-01-30 55
2017-01-31 87
2017-02-01 71
賦值:
> window(myts,start = as.Date('2017-01-10'),end = as.Date('2017-01-15')) <- 1:6
> window(myts,start = as.Date('2017-01-10'),end = as.Date('2017-01-15'))
[,1]
2017-01-10 1
2017-01-11 2
2017-01-12 3
2017-01-13 4
2017-01-14 5
2017-01-15 6
u lag()函數:計算滯后,把前一項的值賦給后一項
> head(lag(myts)) #滯后的
[,1]
2017-01-01 NA
2017-01-02 21
2017-01-03 66
2017-01-04 95
2017-01-05 6
2017-01-06 43
> head(myts) #原本的
[,1]
2017-01-01 21
2017-01-02 66
2017-01-03 95
2017-01-04 6
2017-01-05 43
2017-01-06 6
u diff()函數:計算離差值:后一項值減去前一項的值
> head(diff(myts)) #計算離差的
[,1]
2017-01-01 NA
2017-01-02 45
2017-01-03 29
2017-01-04 -89
2017-01-05 37
2017-01-06 -37
時間序列分析
n 《時間序列分析與應用》
n 以co2數據集為例
> data('co2')
> class(co2)
[1] "ts"
> head(co2)
[1] 315.42 316.31 316.50 317.56 318.13 318.00
> training <- co2[1:400]
> class(training)
[1] "numeric"
> ts_training <- ts(training,start = start(co2),frequency = frequency(co2)) #轉換成時間序列
> plot(ts_training) #此處可視化
> de_co2 <- decompose(ts_training) #分解
> plot(de_co2) #此處可視化
#建模分析 采用SARIMA模型進行分析
#ARIMA(P 自相關的階數 ,D 差分的階數,Q 滑動平均的階數
# 加載tseries包,運用kpss.test函數判斷序列的平穩性
> training <- co2[1:400]
> ts_training <- ts(training,start = start(co2),frequency = frequency(co2))
> kpss.test(ts_training)
KPSS Test for Level Stationarity
data: ts_training
KPSS Level = 7.9126, Truncation lag parameter = 4,
p-value = 0.01 #說明序列是不平穩的
Warning message:
In kpss.test(ts_training) : p-value smaller than printed p-value
#把序列變平穩,用差分的方法處理
> ts_training_diff <- diff(ts_training)
> head(ts_training_diff)
[1] 0.89 0.19 1.06 0.57 -0.13 -1.61
#再次看序列是否平穩
> kpss.test(ts_training_diff) #kpss.test 確定了D 由於是用了一階差分,所以D為1
KPSS Test for Level Stationarity
data: ts_training_diff
KPSS Level = 0.020359, Truncation lag parameter = 4, p-value =
0.1 #說明序列已經平穩了
Warning message:
In kpss.test(ts_training_diff) : p-value greater than printed p-value
> acf(ts_training) #acf自相關系數,確定P值 p =1
> pacf(ts_training) #偏自相關系數,確定Q值, Q = 1
#模型表達式: SARIMA(1,1,1)*(1,1,1)12 (12為frequency)
> co2_fit <-Arima(ts_training,order = c(1,1,1),
+ seasonal = list(order = c(1,1,1),period =12)) #Arima模型,1,1,1 PDQ
> co2_fore <- forecast(co2_fit,68)
> plot(co2,col= 'red')
> par(new = T)
> plot(co2_fore)