1. Time module: datetime
From the datetime module, the main things to master are datetime.date(), datetime.datetime(), and datetime.timedelta().
Date-string parsing: parser.parse (from dateutil)
datetime.date: the date object
import datetime  # can also be written as: from datetime import date

today = datetime.date.today()
print(today, type(today))            # 2018-08-21 <class 'datetime.date'>
print(str(today), type(str(today)))  # 2018-08-21 <class 'str'>
t = datetime.date(2018, 12, 8)
print(t)                             # 2018-12-08
datetime.date.today() returns today's date.
The result is a date object.
datetime.datetime: the datetime object
now = datetime.datetime.now()
print(now, type(now))            # 2018-08-21 19:22:47.296548 <class 'datetime.datetime'>
print(str(now), type(str(now)))  # 2018-08-21 19:23:26.139769 <class 'str'>
t1 = datetime.datetime(2018, 8, 1)
t2 = datetime.datetime(2014, 9, 1, 12, 12, 12)
print(t1, t2)   # 2018-08-01 00:00:00 2014-09-01 12:12:12
print(t1 - t2)  # 1429 days, 11:47:48
datetime.datetime.now() returns the current local date and time.
The result is a datetime object.
It can be converted to a string with str().
datetime.timedelta: time differences
today = datetime.datetime.today()
yesterday = today - datetime.timedelta(1)  # the first positional argument is days
print(today, yesterday)                 # 2018-08-21 19:32:25.068595 2018-08-20 19:32:25.068595
print(today - datetime.timedelta(7))    # 2018-08-14 19:32:25.068595
datetime.timedelta() is mainly used to add to or subtract from dates and times; it is a time "difference" that date and datetime objects understand.
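The arithmetic described above can be sketched as follows. This is only an illustration: the two dates are made up, and the variable names are not from the original notes.

```python
from datetime import datetime, timedelta

start = datetime(2018, 8, 1)
end = datetime(2018, 8, 21, 19, 30)

# Subtracting two datetime objects gives a timedelta.
delta = end - start
print(delta.days, delta.seconds)      # whole days, plus the leftover seconds within the last day
print(delta.total_seconds() / 3600)   # the whole span expressed in hours

# A timedelta can be built from several units and added to a datetime.
one_week_later = start + timedelta(weeks=1, hours=12)
print(one_week_later)
```

Note that delta.seconds only holds the part of the difference that is shorter than one day; total_seconds() is the one to use when the full span is needed.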
parser.parse: converting date strings (parse() returns a datetime object)
from dateutil.parser import parse

date = '12-15-2018'
t = parse(date)
print(t, type(t))  # 2018-12-15 00:00:00 <class 'datetime.datetime'>
print(parse('2009-1-2'), '\n',                 # 2009-01-02 00:00:00
      parse('5/3/2009'), '\n',                 # 2009-05-03 00:00:00
      parse('5/3/2009', dayfirst=True), '\n',  # 2009-03-05 00:00:00
      # In many international formats the day comes before the month; dayfirst controls this.
      # With dayfirst=False (the default) the result is 2009-05-03 00:00:00.
      parse('22/1/2014'), '\n',                # 2014-01-22 00:00:00
      parse('Jan 31, 1997 10:45 PM'))          # 1997-01-31 22:45:00
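2. Pandas point-in-time data (timestamps)
Point-in-time data represents a moment in time (at the resolution of a year, a month, a day, a minute, a second, and so on). It is a pandas data type and the most basic kind of time series data: it associates values with points in time.
A timestamp is a complete, verifiable piece of data showing that some data already existed before a particular moment; it is usually a character sequence that uniquely identifies a moment in time.
pandas.Timestamp()
pd.Timestamp() ---> creating a single timestamp
Accepted input: a datetime object such as datetime.datetime(2016, 12, 2, 22, 15, 59), or a string such as '2018-12-7 12:07:47'. The input must be a single time value.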
import datetime
import numpy as np
import pandas as pd

date1 = datetime.datetime(2016, 12, 1, 12, 45, 30)  # a datetime object
date2 = '2018-11-18'  # strings such as '20181118', '2/3/2018', and '2018-11-18 12:08:13' are recognized as well
t1 = pd.Timestamp(date1)
t2 = pd.Timestamp(date2)
print(t1, type(t1))  # 2016-12-01 12:45:30 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
print(t2)            # 2018-11-18 00:00:00
print(pd.Timestamp('2017-12-09 15:09:21'))  # 2017-12-09 15:09:21
>>> print(date1, type(date1))
2016-12-01 12:45:30 <class 'datetime.datetime'>
This creates pandas point-in-time data directly → a timestamp, whose data type is the pandas Timestamp.
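To make the relationship between datetime and Timestamp concrete, here is a small sketch. It relies on pd.Timestamp being a subclass of datetime.datetime; the variable names are only for illustration.

```python
import datetime
import pandas as pd

dt = datetime.datetime(2016, 12, 1, 12, 45, 30)
ts = pd.Timestamp(dt)

print(isinstance(ts, datetime.datetime))  # a Timestamp is also a datetime
print(ts == dt)                           # both describe the same instant
print(ts.year, ts.month, ts.day)          # the familiar datetime attributes are available
print(ts.to_pydatetime())                 # convert back to a plain datetime object
```

In practice this means a Timestamp can usually be passed wherever a datetime is expected, while also carrying the extra pandas functionality.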
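pd.to_datetime: converting several time values into a timestamp index
pd.to_datetime(): a single time value is converted to pandas point-in-time data of type Timestamp; several time values are converted to a pandas DatetimeIndex.
datetime vs. Timestamp: datetime is the standard-library type, Timestamp is its pandas counterpart;
Timestamp vs. DatetimeIndex: a Timestamp is a single point in time, a DatetimeIndex holds many of them;
Two ways to convert to pandas point-in-time data: pd.Timestamp directly, or pd.to_datetime.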
from datetime import datetime
import pandas as pd

date1 = datetime(2018, 12, 2, 12, 24, 30)
date2 = '2017-07-21'
t1 = pd.to_datetime(date1)
t2 = pd.to_datetime(date2)
print(t1, type(t1))  # 2018-12-02 12:24:30 <class 'pandas._libs.tslibs.timestamps.Timestamp'>  (for a single value this is no different from pd.Timestamp)
print(t2, type(t2))  # 2017-07-21 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
lst_date = ['2017-12-9', '2017-10-19', '2018-9-9']  # with a sequence of several values the result is different
t3 = pd.to_datetime(lst_date)
print(t3, type(t3))
# DatetimeIndex(['2017-12-09', '2017-10-19', '2018-09-09'], dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
pd.to_datetime( data, errors='ignore' | errors='coerce' )
>>> import numpy as np
>>> import pandas as pd
>>> from datetime import datetime  # without this import you would have to write datetime.datetime
>>> date1 = [datetime(2018, 6, 1), datetime(2018, 7, 1), datetime(2018, 8, 1)]  # a list of datetime objects
>>> date2 = ['2017-2-1', '2017-2-2', '2017-2-3', '2017-2-4', '2017-2-5', '2017-2-6']  # a list of strings
>>> print(date1)
[datetime.datetime(2018, 6, 1, 0, 0), datetime.datetime(2018, 7, 1, 0, 0), datetime.datetime(2018, 8, 1, 0, 0)]
>>> print(date2)
['2017-2-1', '2017-2-2', '2017-2-3', '2017-2-4', '2017-2-5', '2017-2-6']
>>> t1 = pd.to_datetime(date1)
>>> t2 = pd.to_datetime(date2)
>>> print(t1)
DatetimeIndex(['2018-06-01', '2018-07-01', '2018-08-01'], dtype='datetime64[ns]', freq=None)
>>> print(t2)
DatetimeIndex(['2017-02-01', '2017-02-02', '2017-02-03', '2017-02-04',
               '2017-02-05', '2017-02-06'],
              dtype='datetime64[ns]', freq=None)
>>> date3 = ['2017-9-1', '2018-11-10', 'Hello world!', '2018-10-9', '2017-7-1']
>>> t3 = pd.to_datetime(date3, errors='ignore')  # when a sequence of dates is mixed with other data, the errors parameter controls what is returned
>>> # errors='ignore': if the input cannot be parsed, the original input is returned; here that is a plain array
>>> print(t3, type(t3))
['2017-9-1' '2018-11-10' 'Hello world!' '2018-10-9' '2017-7-1'] <class 'numpy.ndarray'>
>>> t4 = pd.to_datetime(date3, errors='coerce')  # values that are not dates are treated as missing, and the result is still a DatetimeIndex
>>> # errors='coerce': values that cannot be parsed are returned as NaT (Not a Time); the result is a DatetimeIndex
>>> print(t4, type(t4))
DatetimeIndex(['2017-09-01', '2018-11-10', 'NaT', '2018-10-09', '2017-07-01'], dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
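A common follow-up to errors='coerce' is to find out which inputs failed and to keep only the valid timestamps. The sketch below shows one way to do that; the list and the variable names are illustrative. As far as I know, errors='ignore' is deprecated in recent pandas releases, so the example sticks to 'coerce'.

```python
import pandas as pd

raw = ['2017-09-01', '2018-11-10', 'Hello world!', '2018-10-09', '2017-07-01']

parsed = pd.to_datetime(raw, errors='coerce')  # unparseable entries become NaT
bad_mask = parsed.isna()                        # True where parsing failed

# Report the offending inputs, then keep only the valid timestamps.
bad_values = [value for value, bad in zip(raw, bad_mask) if bad]
clean = parsed[~bad_mask]

print(bad_values)
print(clean)
```

Keeping the mask around makes it possible to log or inspect the rejected values instead of silently losing them.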
3. Pandas timestamp index
DatetimeIndex
Core function: pd.date_range()
3.1 pd.DatetimeIndex() (timestamp index) and time series
pd.DatetimeIndex([several time values])
rng = pd.DatetimeIndex(['12/1/2018', '12/2/2018', '12/3/2018', '12/4/2018'])
pd.Series(np.random.rand(len(rng)), index=rng)  # a Series whose index is a DatetimeIndex is a time series
>>> rng = pd.DatetimeIndex(['12/1/2018', '12/2/2018', '12/3/2018', '12/4/2018'])  # builds a DatetimeIndex directly from the given values
>>> print(rng, type(rng))
DatetimeIndex(['2018-12-01', '2018-12-02', '2018-12-03', '2018-12-04'], dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
>>> print(rng[0], type(rng[0]))
2018-12-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
>>> # A timestamp index can be built directly from str or datetime.datetime values.
>>> # rng[0], a single timestamp, is a Timestamp; rng[0:3], several timestamps, is a DatetimeIndex.
>>> st = pd.Series(np.random.rand(len(rng)), index=rng)  # a Series whose index is a DatetimeIndex is a time series
>>> print(st, type(st))
2018-12-01    0.063915
2018-12-02    0.726902
2018-12-03    0.135305
2018-12-04    0.237609
dtype: float64 <class 'pandas.core.series.Series'>
>>> print(st.index)
DatetimeIndex(['2018-12-01', '2018-12-02', '2018-12-03', '2018-12-04'], dtype='datetime64[ns]', freq=None)
3.2 pd.date_range(): generating a date range
date_range() can be called in two ways: (1) start + end; (2) start or end + periods
pd.date_range('6/10/2018', '10/5/2018'), pd.date_range('6/10/2018', periods=10), pd.date_range(end='6/10/2018', periods=10)
Default frequency: day
The result is a DatetimeIndex.
# pd.date_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, closed=None, **kwargs)
# start: start time
# end: end time
# periods: number of periods (timestamps) to generate
# freq: frequency; pd.date_range() defaults to calendar days, pd.bdate_range() defaults to business days
# tz: time zone
# normalize: False by default; when True, the times are set to midnight (00:00:00), and the all-midnight time part is not displayed
>>> # rng1 = pd.date_range('12/1/2018', '4/10/2017', normalize=True)  # start is later than end, so the result is empty: DatetimeIndex([], dtype='datetime64[ns]', freq='D')
>>> rng1 = pd.date_range('1/1/2017', '1/10/2017', normalize=True)  # normalize=True sets the times to 00:00:00, which is then not displayed
>>> rng2 = pd.date_range(start='1/1/2018', periods=10)  # the start= keyword can be omitted
>>> rng3 = pd.date_range(end='1/30/2017 14:20:00', periods=10)
>>> print(rng1, type(rng1))
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10'],
              dtype='datetime64[ns]', freq='D') <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
>>> print(rng2)
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
               '2018-01-09', '2018-01-10'],
              dtype='datetime64[ns]', freq='D')
>>> print(rng3)
DatetimeIndex(['2017-01-21 14:20:00', '2017-01-22 14:20:00',
               '2017-01-23 14:20:00', '2017-01-24 14:20:00',
               '2017-01-25 14:20:00', '2017-01-26 14:20:00',
               '2017-01-27 14:20:00', '2017-01-28 14:20:00',
               '2017-01-29 14:20:00', '2017-01-30 14:20:00'],
              dtype='datetime64[ns]', freq='D')
>>> rng4 = pd.date_range(start='1/1/2017 15:30', periods=10, name='Hello world!', normalize=True)  # 15:30 is normalized to 00:00 and not displayed; name sets the name of the index
>>> print(rng4)
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10'],
              dtype='datetime64[ns]', name='Hello world!', freq='D')
>>> print(pd.date_range('20170101', '20170104'))  # both end points are included by default
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
>>> print(pd.date_range('20170101', '20170104', closed='right'))  # closed='right': the start point is excluded
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
>>> print(pd.date_range('20170101', '20170104', closed='left'))  # closed='left': the end point is excluded
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq='D')
>>> print(pd.bdate_range('20170101', '20170107'))  # bdate_range: the default frequency is business days
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05',
               '2017-01-06'],
              dtype='datetime64[ns]', freq='B')
>>> print(list(pd.date_range(start='1/1/2017', periods=10)))  # a sequence made up of individual timestamps
[Timestamp('2017-01-01 00:00:00', freq='D'), Timestamp('2017-01-02 00:00:00', freq='D'), Timestamp('2017-01-03 00:00:00', freq='D'), Timestamp('2017-01-04 00:00:00', freq='D'), Timestamp('2017-01-05 00:00:00', freq='D'), Timestamp('2017-01-06 00:00:00', freq='D'), Timestamp('2017-01-07 00:00:00', freq='D'), Timestamp('2017-01-08 00:00:00', freq='D'), Timestamp('2017-01-09 00:00:00', freq='D'), Timestamp('2017-01-10 00:00:00', freq='D')]
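The closed= keyword used in these notes depends on the pandas version. To my knowledge, newer releases of pandas call this parameter inclusive= and no longer accept closed=. The helper below is a hypothetical wrapper (half_open_range is not a pandas function) that tries the newer spelling first and falls back to the older one.

```python
import pandas as pd

def half_open_range(start, end, side):
    """Daily range that keeps only one end point; side is 'left' or 'right'.

    Hypothetical helper for illustration, not part of pandas.
    """
    try:
        # Newer pandas releases name the parameter `inclusive`.
        return pd.date_range(start, end, inclusive=side)
    except TypeError:
        # Older releases only know `closed`.
        return pd.date_range(start, end, closed=side)

left = half_open_range('20170101', '20170104', 'left')    # end point dropped
right = half_open_range('20170101', '20170104', 'right')  # start point dropped
print(left)
print(right)
```

If only one pandas version has to be supported, calling pd.date_range with the matching keyword directly is simpler.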
pd.date_range(): the freq parameter (1)
freq = 'B', 'H', 'T', 'S', 'L', 'U', 'W-MON', 'WOM-2MON'
>>> print(pd.date_range('2017/1/1', '2017/1/4'))  # default freq='D': every calendar day
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
>>> print(pd.date_range('2017/1/1', '2017/1/4', freq='B'))  # B: every business day
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='B')
>>> print(pd.date_range('2017/1/1', '2017/1/4', freq='H'))  # H: every hour
DatetimeIndex(['2017-01-01 00:00:00', '2017-01-01 01:00:00', '2017-01-01 02:00:00', '2017-01-01 03:00:00',
               '2017-01-01 04:00:00', '2017-01-01 05:00:00', '2017-01-01 06:00:00', '2017-01-01 07:00:00',
               '2017-01-01 08:00:00', '2017-01-01 09:00:00', '2017-01-01 10:00:00', '2017-01-01 11:00:00',
               '2017-01-01 12:00:00', '2017-01-01 13:00:00', '2017-01-01 14:00:00', '2017-01-01 15:00:00',
               '2017-01-01 16:00:00', '2017-01-01 17:00:00', '2017-01-01 18:00:00', '2017-01-01 19:00:00',
               '2017-01-01 20:00:00', '2017-01-01 21:00:00', '2017-01-01 22:00:00', '2017-01-01 23:00:00',
               '2017-01-02 00:00:00', '2017-01-02 01:00:00', '2017-01-02 02:00:00', '2017-01-02 03:00:00',
               '2017-01-02 04:00:00', '2017-01-02 05:00:00', '2017-01-02 06:00:00', '2017-01-02 07:00:00',
               '2017-01-02 08:00:00', '2017-01-02 09:00:00', '2017-01-02 10:00:00', '2017-01-02 11:00:00',
               '2017-01-02 12:00:00', '2017-01-02 13:00:00', '2017-01-02 14:00:00', '2017-01-02 15:00:00',
               '2017-01-02 16:00:00', '2017-01-02 17:00:00', '2017-01-02 18:00:00', '2017-01-02 19:00:00',
               '2017-01-02 20:00:00', '2017-01-02 21:00:00', '2017-01-02 22:00:00', '2017-01-02 23:00:00',
               '2017-01-03 00:00:00', '2017-01-03 01:00:00', '2017-01-03 02:00:00', '2017-01-03 03:00:00',
               '2017-01-03 04:00:00', '2017-01-03 05:00:00', '2017-01-03 06:00:00', '2017-01-03 07:00:00',
               '2017-01-03 08:00:00', '2017-01-03 09:00:00', '2017-01-03 10:00:00', '2017-01-03 11:00:00',
               '2017-01-03 12:00:00', '2017-01-03 13:00:00', '2017-01-03 14:00:00', '2017-01-03 15:00:00',
               '2017-01-03 16:00:00', '2017-01-03 17:00:00', '2017-01-03 18:00:00', '2017-01-03 19:00:00',
               '2017-01-03 20:00:00', '2017-01-03 21:00:00', '2017-01-03 22:00:00', '2017-01-03 23:00:00',
               '2017-01-04 00:00:00'],
              dtype='datetime64[ns]', freq='H')
>>> print(pd.date_range('2017/1/1 12:00', '2017/1/1 12:10', freq='T'))  # T/MIN: every minute
DatetimeIndex(['2017-01-01 12:00:00', '2017-01-01 12:01:00',
               '2017-01-01 12:02:00', '2017-01-01 12:03:00',
               '2017-01-01 12:04:00', '2017-01-01 12:05:00',
               '2017-01-01 12:06:00', '2017-01-01 12:07:00',
               '2017-01-01 12:08:00', '2017-01-01 12:09:00',
               '2017-01-01 12:10:00'],
              dtype='datetime64[ns]', freq='T')
>>> print(pd.date_range('2017/1/1 12:00:00', '2017/1/1 12:00:10', freq='S'))  # S: every second
DatetimeIndex(['2017-01-01 12:00:00', '2017-01-01 12:00:01',
               '2017-01-01 12:00:02', '2017-01-01 12:00:03',
               '2017-01-01 12:00:04', '2017-01-01 12:00:05',
               '2017-01-01 12:00:06', '2017-01-01 12:00:07',
               '2017-01-01 12:00:08', '2017-01-01 12:00:09',
               '2017-01-01 12:00:10'],
              dtype='datetime64[ns]', freq='S')
>>> print(pd.date_range('2017/1/1 12:00:00', '2017/1/1 12:00:10', freq='L'))  # L: every millisecond (one thousandth of a second)
DatetimeIndex([       '2017-01-01 12:00:00', '2017-01-01 12:00:00.001000',
               '2017-01-01 12:00:00.002000', '2017-01-01 12:00:00.003000',
               '2017-01-01 12:00:00.004000', '2017-01-01 12:00:00.005000',
               '2017-01-01 12:00:00.006000', '2017-01-01 12:00:00.007000',
               '2017-01-01 12:00:00.008000', '2017-01-01 12:00:00.009000',
               ...
               '2017-01-01 12:00:09.991000', '2017-01-01 12:00:09.992000',
               '2017-01-01 12:00:09.993000', '2017-01-01 12:00:09.994000',
               '2017-01-01 12:00:09.995000', '2017-01-01 12:00:09.996000',
               '2017-01-01 12:00:09.997000', '2017-01-01 12:00:09.998000',
               '2017-01-01 12:00:09.999000',        '2017-01-01 12:00:10'],
              dtype='datetime64[ns]', length=10001, freq='L')
>>> print(pd.date_range('2017/1/1 12:00:00', '2017/1/1 12:00:10', freq='U'))  # U: every microsecond (one millionth of a second)
DatetimeIndex([       '2017-01-01 12:00:00', '2017-01-01 12:00:00.000001',
               '2017-01-01 12:00:00.000002', '2017-01-01 12:00:00.000003',
               '2017-01-01 12:00:00.000004', '2017-01-01 12:00:00.000005',
               '2017-01-01 12:00:00.000006', '2017-01-01 12:00:00.000007',
               '2017-01-01 12:00:00.000008', '2017-01-01 12:00:00.000009',
               ...
               '2017-01-01 12:00:09.999991', '2017-01-01 12:00:09.999992',
               '2017-01-01 12:00:09.999993', '2017-01-01 12:00:09.999994',
               '2017-01-01 12:00:09.999995', '2017-01-01 12:00:09.999996',
               '2017-01-01 12:00:09.999997', '2017-01-01 12:00:09.999998',
               '2017-01-01 12:00:09.999999',        '2017-01-01 12:00:10'],
              dtype='datetime64[ns]', length=10000001, freq='U')
>>> print(pd.date_range('2017/1/1', '2017/2/1', freq='W-MON'))  # W-<weekday>: weekly, on the given weekday. Weekday abbreviations: MON/TUE/WED/THU/FRI/SAT/SUN
DatetimeIndex(['2017-01-02', '2017-01-09', '2017-01-16', '2017-01-23',
               '2017-01-30'],
              dtype='datetime64[ns]', freq='W-MON')
>>> print(pd.date_range('2017/1/1', '2017/5/1', freq='WOM-2MON'))  # WOM-<n><weekday>: the n-th given weekday of each month; here the second Monday of each month
DatetimeIndex(['2017-01-09', '2017-02-13', '2017-03-13', '2017-04-10'], dtype='datetime64[ns]', freq='WOM-2MON')
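A quick way to convince yourself of what a frequency alias does is to inspect the weekdays of the generated index. The sketch below does this for 'B' and 'W-MON'; the date ranges are arbitrary. As far as I know, recent pandas releases spell the sub-daily aliases 'h', 'min', 's', 'ms', and 'us' instead of 'H', 'T', 'S', 'L', and 'U', so the example avoids the uppercase forms.

```python
import pandas as pd

business_days = pd.date_range('2017-01-01', '2017-01-10', freq='B')
mondays = pd.date_range('2017-01-01', '2017-02-01', freq='W-MON')

# dayofweek: Monday is 0, ..., Sunday is 6
print(business_days.dayofweek.tolist())        # no 5 (Saturday) or 6 (Sunday) appears
print(mondays.strftime('%Y-%m-%d').tolist())   # every date falls on a Monday
```

The same check works for any other alias when its meaning is unclear.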
pd.date_range(): the freq parameter (2)
freq = 'M', 'Q-DEC', 'A-DEC', 'BM', 'BQ-DEC', 'BA-DEC', 'MS', 'QS-DEC', 'AS-DEC', 'BMS', 'BQS-DEC', 'BAS-DEC'
########## Last calendar day of each period
>>> print(pd.date_range('2017', '2018', freq='M'))  # M: last calendar day of each month
DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
               '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
               '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31'],
              dtype='datetime64[ns]', freq='M')
>>> print(pd.date_range('2017', '2020', freq='Q-DEC'))  # Q-<month>: the given month ends a quarter; last calendar day of the last month of each quarter
>>> # Q-<month> therefore has only three distinct cases: 1-4-7-10, 2-5-8-11, 3-6-9-12
DatetimeIndex(['2017-03-31', '2017-06-30', '2017-09-30', '2017-12-31',
               '2018-03-31', '2018-06-30', '2018-09-30', '2018-12-31',
               '2019-03-31', '2019-06-30', '2019-09-30', '2019-12-31'],
              dtype='datetime64[ns]', freq='Q-DEC')
>>> print(pd.date_range('2017', '2020', freq='A-DEC'))  # A-<month>: last calendar day of the given month, every year
>>> # Month abbreviations: JAN/FEB/MAR/APR/MAY/JUN/JUL/AUG/SEP/OCT/NOV/DEC
DatetimeIndex(['2017-12-31', '2018-12-31', '2019-12-31'], dtype='datetime64[ns]', freq='A-DEC')
>>> ########## Last business day of each period
>>> print(pd.date_range('2017', '2020', freq='BM'))  # BM: last business day of each month
DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-28',
               '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
               '2017-09-29', '2017-10-31', '2017-11-30', '2017-12-29',
               '2018-01-31', '2018-02-28', '2018-03-30', '2018-04-30',
               '2018-05-31', '2018-06-29', '2018-07-31', '2018-08-31',
               '2018-09-28', '2018-10-31', '2018-11-30', '2018-12-31',
               '2019-01-31', '2019-02-28', '2019-03-29', '2019-04-30',
               '2019-05-31', '2019-06-28', '2019-07-31', '2019-08-30',
               '2019-09-30', '2019-10-31', '2019-11-29', '2019-12-31'],
              dtype='datetime64[ns]', freq='BM')
>>> print(pd.date_range('2017', '2020', freq='BQ-DEC'))  # BQ-<month>: the given month ends a quarter; last business day of the last month of each quarter
DatetimeIndex(['2017-03-31', '2017-06-30', '2017-09-29', '2017-12-29',
               '2018-03-30', '2018-06-29', '2018-09-28', '2018-12-31',
               '2019-03-29', '2019-06-28', '2019-09-30', '2019-12-31'],
              dtype='datetime64[ns]', freq='BQ-DEC')
>>> print(pd.date_range('2017', '2020', freq='BA-DEC'))  # BA-<month>: last business day of the given month, every year
DatetimeIndex(['2017-12-29', '2018-12-31', '2019-12-31'], dtype='datetime64[ns]', freq='BA-DEC')
>>> ########## First calendar day of each period
>>> print(pd.date_range('2017', '2018', freq='MS'))  # MS: first calendar day of each month
DatetimeIndex(['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01',
               '2017-05-01', '2017-06-01', '2017-07-01', '2017-08-01',
               '2017-09-01', '2017-10-01', '2017-11-01', '2017-12-01',
               '2018-01-01'],
              dtype='datetime64[ns]', freq='MS')
>>> print(pd.date_range('2017', '2020', freq='QS-DEC'))  # QS-<month>: quarters are anchored on the given month; first calendar day of every third month counting from it
DatetimeIndex(['2017-03-01', '2017-06-01', '2017-09-01', '2017-12-01',
               '2018-03-01', '2018-06-01', '2018-09-01', '2018-12-01',
               '2019-03-01', '2019-06-01', '2019-09-01', '2019-12-01'],
              dtype='datetime64[ns]', freq='QS-DEC')
>>> print(pd.date_range('2017', '2020', freq='AS-DEC'))  # AS-<month>: first calendar day of the given month, every year
DatetimeIndex(['2017-12-01', '2018-12-01', '2019-12-01'], dtype='datetime64[ns]', freq='AS-DEC')
>>> ########## First business day of each period
>>> print(pd.date_range('2017', '2018', freq='BMS'))  # BMS: first business day of each month
DatetimeIndex(['2017-01-02', '2017-02-01', '2017-03-01', '2017-04-03',
               '2017-05-01', '2017-06-01', '2017-07-03', '2017-08-01',
               '2017-09-01', '2017-10-02', '2017-11-01', '2017-12-01',
               '2018-01-01'],
              dtype='datetime64[ns]', freq='BMS')
>>> print(pd.date_range('2017', '2020', freq='BQS-DEC'))  # BQS-<month>: quarters are anchored on the given month; first business day of every third month counting from it
DatetimeIndex(['2017-03-01', '2017-06-01', '2017-09-01', '2017-12-01',
               '2018-03-01', '2018-06-01', '2018-09-03', '2018-12-03',
               '2019-03-01', '2019-06-03', '2019-09-02', '2019-12-02'],
              dtype='datetime64[ns]', freq='BQS-DEC')
>>> print(pd.date_range('2017', '2020', freq='BAS-DEC'))  # BAS-<month>: first business day of the given month, every year
DatetimeIndex(['2017-12-01', '2018-12-03', '2019-12-02'], dtype='datetime64[ns]', freq='BAS-DEC')
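The "start", "end", and "business" variants can be compared side by side on a short range. This sketch uses the 'MS' and 'BMS' aliases from above and, for month ends, the pd.offsets.MonthEnd offset object rather than an alias, because (to my knowledge) the month-end and year-end alias strings were renamed in recent pandas releases. Variable names are illustrative.

```python
import pandas as pd

month_starts = pd.date_range('2017-01-01', '2017-06-30', freq='MS')
# MonthEnd(0) rolls each date forward to the end of its own month.
month_ends = month_starts + pd.offsets.MonthEnd(0)
first_business_days = pd.date_range('2017-01-01', '2017-06-30', freq='BMS')

for start, end, bstart in zip(month_starts, month_ends, first_business_days):
    print(start.date(), end.date(), bstart.date())
```

The first business day differs from the first calendar day only when the month starts on a weekend, for example January and April 2017.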
pd.date_range(): compound frequencies for freq
freq = '7D', '2M', '2h30min'
>>> print(pd.date_range('2017/1/1', '2017/2/1', freq='7D'))  # every 7 days
DatetimeIndex(['2017-01-01', '2017-01-08', '2017-01-15', '2017-01-22',
               '2017-01-29'],
              dtype='datetime64[ns]', freq='7D')
>>> print(pd.date_range('2017/1/1', '2017/1/2', freq='2h30min'))  # every 2 hours 30 minutes
DatetimeIndex(['2017-01-01 00:00:00', '2017-01-01 02:30:00',
               '2017-01-01 05:00:00', '2017-01-01 07:30:00',
               '2017-01-01 10:00:00', '2017-01-01 12:30:00',
               '2017-01-01 15:00:00', '2017-01-01 17:30:00',
               '2017-01-01 20:00:00', '2017-01-01 22:30:00'],
              dtype='datetime64[ns]', freq='150T')
>>> print(pd.date_range('2017', '2018', freq='2M'))  # every 2 months, on the last calendar day of the month
DatetimeIndex(['2017-01-31', '2017-03-31', '2017-05-31', '2017-07-31',
               '2017-09-30', '2017-11-30'],
              dtype='datetime64[ns]', freq='2M')
asfreq: changing the frequency of a time series
ts.asfreq('4H', method='ffill')
>>> ts = pd.Series(np.random.rand(4), index=pd.date_range('20170101', '20170104'))
>>> print(ts)
2017-01-01    0.516999
2017-01-02    0.882315
2017-01-03    0.775276
2017-01-04    0.440545
Freq: D, dtype: float64
>>> print(ts.asfreq('4H'))
2017-01-01 00:00:00    0.516999
2017-01-01 04:00:00         NaN
2017-01-01 08:00:00         NaN
2017-01-01 12:00:00         NaN
2017-01-01 16:00:00         NaN
2017-01-01 20:00:00         NaN
2017-01-02 00:00:00    0.882315
2017-01-02 04:00:00         NaN
2017-01-02 08:00:00         NaN
2017-01-02 12:00:00         NaN
2017-01-02 16:00:00         NaN
2017-01-02 20:00:00         NaN
2017-01-03 00:00:00    0.775276
2017-01-03 04:00:00         NaN
2017-01-03 08:00:00         NaN
2017-01-03 12:00:00         NaN
2017-01-03 16:00:00         NaN
2017-01-03 20:00:00         NaN
2017-01-04 00:00:00    0.440545
Freq: 4H, dtype: float64
>>> print(ts.asfreq('4H', method='ffill'))  # change the frequency, here from D to 4H; method is the fill mode: None leaves gaps empty, ffill fills with the previous value, bfill fills with the next value
2017-01-01 00:00:00    0.516999
2017-01-01 04:00:00    0.516999
2017-01-01 08:00:00    0.516999
2017-01-01 12:00:00    0.516999
2017-01-01 16:00:00    0.516999
2017-01-01 20:00:00    0.516999
2017-01-02 00:00:00    0.882315
2017-01-02 04:00:00    0.882315
2017-01-02 08:00:00    0.882315
2017-01-02 12:00:00    0.882315
2017-01-02 16:00:00    0.882315
2017-01-02 20:00:00    0.882315
2017-01-03 00:00:00    0.775276
2017-01-03 04:00:00    0.775276
2017-01-03 08:00:00    0.775276
2017-01-03 12:00:00    0.775276
2017-01-03 16:00:00    0.775276
2017-01-03 20:00:00    0.775276
2017-01-04 00:00:00    0.440545
Freq: 4H, dtype: float64
>>> print(ts.asfreq('4H', method='bfill'))
2017-01-01 00:00:00    0.516999
2017-01-01 04:00:00    0.882315
2017-01-01 08:00:00    0.882315
2017-01-01 12:00:00    0.882315
2017-01-01 16:00:00    0.882315
2017-01-01 20:00:00    0.882315
2017-01-02 00:00:00    0.882315
2017-01-02 04:00:00    0.775276
2017-01-02 08:00:00    0.775276
2017-01-02 12:00:00    0.775276
2017-01-02 16:00:00    0.775276
2017-01-02 20:00:00    0.775276
2017-01-03 00:00:00    0.775276
2017-01-03 04:00:00    0.440545
2017-01-03 08:00:00    0.440545
2017-01-03 12:00:00    0.440545
2017-01-03 16:00:00    0.440545
2017-01-03 20:00:00    0.440545
2017-01-04 00:00:00    0.440545
Freq: 4H, dtype: float64
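With fixed values instead of random ones, the three fill modes of asfreq are easier to compare. This is a minimal sketch with made-up data; it uses the lowercase '12h' alias, which I expect to work in both older and newer pandas releases.

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0],
               index=pd.date_range('2017-01-01', periods=3, freq='D'))

plain = ts.asfreq('12h')                      # new slots are left as NaN
forward = ts.asfreq('12h', method='ffill')    # each gap takes the previous observation
backward = ts.asfreq('12h', method='bfill')   # each gap takes the next observation

print(plain)
print(forward)
print(backward)
```

Forward fill never looks into the future, which is usually the safer choice when the series feeds a model or a backtest.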
Leading and lagging data: .shift()
ts.shift(1) moves the values, so each row receives the previous row's value; ts.shift(1, freq='D') moves the timestamps instead of the values.
>>> ts = pd.Series(np.random.rand(4), index=pd.date_range('20170101', '20170104'))
>>> print(ts)
2017-01-01    0.421724
2017-01-02    0.102916
2017-01-03    0.411452
2017-01-04    0.626978
Freq: D, dtype: float64
>>> print(ts.shift(2))  # positive: values move later (lag); negative: values move earlier (lead)
2017-01-01         NaN
2017-01-02         NaN
2017-01-03    0.421724
2017-01-04    0.102916
Freq: D, dtype: float64
>>> print(ts.shift(-2))
2017-01-01    0.411452
2017-01-02    0.626978
2017-01-03         NaN
2017-01-04         NaN
Freq: D, dtype: float64
>>> per = ts / ts.shift(1) - 1  # relative change against the previous timestamp: ts is today's data, ts.shift(1) is yesterday's, ts/ts.shift(1) is the ratio, and subtracting 1 gives the change
>>> print(per)
2017-01-01         NaN
2017-01-02   -0.755963
2017-01-03    2.997923
2017-01-04    0.523818
Freq: D, dtype: float64
>>> print(ts.shift(2, freq='D'))  # with freq, the timestamps are shifted instead of the values
2017-01-03    0.421724
2017-01-04    0.102916
2017-01-05    0.411452
2017-01-06    0.626978
Freq: D, dtype: float64
>>> print(ts.shift(2, freq='T'))
2017-01-01 00:02:00    0.421724
2017-01-02 00:02:00    0.102916
2017-01-03 00:02:00    0.411452
2017-01-04 00:02:00    0.626978
Freq: D, dtype: float64
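The relative-change calculation above has a built-in equivalent, Series.pct_change(). The sketch below compares the two on made-up prices and also shows the difference between shifting values and shifting timestamps.

```python
import pandas as pd

ts = pd.Series([100.0, 110.0, 99.0, 108.9],
               index=pd.date_range('2017-01-01', periods=4, freq='D'))

manual = ts / ts.shift(1) - 1   # today's value relative to yesterday's
builtin = ts.pct_change()       # the same calculation as a single call

moved = ts.shift(1, freq='D')   # values stay put, every timestamp moves one day later

print(manual)
print(builtin)
print(moved)
```

The first element of both change series is NaN, because the first day has no previous value to compare against.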
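4. Pandas periods: Period
pd.Period()
Core function: pd.Period(), which constructs a time span. A timestamp is a cross-section of time (an instant); a period is an interval.
pd.Period() arguments: a timestamp plus a freq parameter → freq gives the length of the period, and the timestamp says where the period sits on the time axis.
pd.Period('2017', freq='M') + 1
## Creating a period with pd.Period()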
>>> p = pd.Period('2017', freq='M')  # a period that starts at 2017-01, with monthly frequency
>>> t = pd.DatetimeIndex(['2017-1-1'])
>>> print(p, type(p))
2017-01 <class 'pandas._libs.tslibs.period.Period'>
>>> print(t, type(t))
DatetimeIndex(['2017-01-01'], dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
>>> print(p + 1)  # adding or subtracting an integer moves the whole period
2017-02
>>> print(p - 2)
2016-11
>>> print(pd.Period('2012', freq='A-DEC') - 1)  # the step is one unit of freq: months above, years here
2011
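That a Period is an interval rather than an instant can be seen from its start_time and end_time attributes. A small sketch, with an arbitrary month:

```python
import pandas as pd

p = pd.Period('2017-01', freq='M')

print(p.start_time)   # first instant covered by the period
print(p.end_time)     # last instant covered by the period

# A timestamp either falls inside the period or it does not.
inside = p.start_time <= pd.Timestamp('2017-01-15') <= p.end_time
outside = p.start_time <= pd.Timestamp('2017-02-01') <= p.end_time
print(inside, outside)

# Integer arithmetic moves by whole periods.
print(p + 1)
```

This is the practical difference from a Timestamp: one Period stands for every moment between its start and its end.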
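pd.period_range(): creating a range of periods
Period('2011', freq='A-DEC') can be seen as a cursor over a span made up of consecutive periods.
pd.Period('2017', freq='M') + 1; date_range() and period_range() produce two different kinds of index: the former holds timestamps, the latter holds periods.
pd.period_range('1/1/2011', '1/1/2012', freq='M'), pd.date_range('1/1/2011', '1/1/2012', freq='M')
With freq='M', period_range returns a PeriodIndex that contains year and month but no day; date_range returns a DatetimeIndex that contains year, month, and day.
A Timestamp (and each element of a DatetimeIndex) represents a timestamp, a cross-section of time; a Period represents a span of time. When used as an index, the two differ little.
## Creating a period range with period_range()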
>>> prng = pd.period_range('1/1/2011', '1/1/2012', freq='M')  # year and month only
>>> rng = pd.date_range('1/1/2011', '1/1/2012', freq='M')  # year, month, and day
>>> print(prng, type(prng))  # a PeriodIndex here, where the earlier examples produced a DatetimeIndex
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
             '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
             '2012-01'],
            dtype='period[M]', freq='M') <class 'pandas.core.indexes.period.PeriodIndex'>
>>> print(rng, type(rng))
DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-30',
               '2011-05-31', '2011-06-30', '2011-07-31', '2011-08-31',
               '2011-09-30', '2011-10-31', '2011-11-30', '2011-12-31'],
              dtype='datetime64[ns]', freq='M') <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
>>> print(prng[0], type(prng[0]))  # the container is a PeriodIndex, a single element is a Period
2011-01 <class 'pandas._libs.tslibs.period.Period'>
>>> ts = pd.Series(np.random.rand(len(prng)), index=prng)  # as an index, the two behave much the same
>>> ts2 = pd.Series(np.random.rand(len(rng)), index=rng)
>>> print(ts, type(ts))
2011-01    0.889509
2011-02    0.967148
2011-03    0.579234
2011-04    0.409504
2011-05    0.180216
2011-06    0.004549
2011-07    0.606768
2011-08    0.599321
2011-09    0.281182
2011-10    0.383243
2011-11    0.437894
2011-12    0.099335
2012-01    0.125945
Freq: M, dtype: float64 <class 'pandas.core.series.Series'>
>>> print(ts2, type(ts2))
2011-01-31    0.058635
2011-02-28    0.899287
2011-03-31    0.806039
2011-04-30    0.520745
2011-05-31    0.855713
2011-06-30    0.057417
2011-07-31    0.508203
2011-08-31    0.846018
2011-09-30    0.465259
2011-10-31    0.535451
2011-11-30    0.630897
2011-12-31    0.031109
Freq: M, dtype: float64 <class 'pandas.core.series.Series'>
>>> print(ts.index)  # the index of the period-based series
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
             '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
             '2012-01'],
            dtype='period[M]', freq='M')
>>> print(ts2.index)
DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-30',
               '2011-05-31', '2011-06-30', '2011-07-31', '2011-08-31',
               '2011-09-30', '2011-10-31', '2011-11-30', '2011-12-31'],
              dtype='datetime64[ns]', freq='M')
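asfreq: frequency conversion
A period can be converted to another frequency with p.asfreq(freq, how=...), where how='start' (or 's') and how='end' (or 'e') select which end of the original period is used.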
>>> p = pd.Period('2017', 'A-DEC')
>>> print(p)
2017
>>> print(p.asfreq('M', how='start'))  # can also be written how='s'
2017-01
>>> print(p.asfreq('D', how='end'))  # can also be written how='e'
2017-12-31
>>> prng = pd.period_range('2017', '2018', freq='M')
>>> ts1 = pd.Series(np.random.rand(len(prng)), index=prng)
>>> print(ts1.head(), len(ts1))
2017-01    0.061827
2017-02    0.138509
2017-03    0.862916
2017-04    0.226967
2017-05    0.910585
Freq: M, dtype: float64 13
>>> ts2 = pd.Series(np.random.rand(len(prng)), index=prng.asfreq('D', how='start'))  # asfreq can also convert the index of a series
>>> print(ts2.head(), len(ts2))
2017-01-01    0.476774
2017-02-01    0.625230
2017-03-01    0.281017
2017-04-01    0.165561
2017-05-01    0.429782
Freq: D, dtype: float64 13
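The effect of how can be checked on a single month, where the first and last day are easy to verify by hand. A minimal sketch with an arbitrary month and range:

```python
import pandas as pd

month = pd.Period('2017-02', freq='M')

first_day = month.asfreq('D', how='start')  # the daily period at the start of the month
last_day = month.asfreq('D', how='end')     # the daily period at the end of the month
print(first_day, last_day)

# The same conversion works element-wise on a PeriodIndex.
months = pd.period_range('2017-01', '2017-03', freq='M')
month_ends = months.asfreq('D', how='end')
print(month_ends)
```

Going from a longer period to a shorter one always needs this choice, because one month contains many days.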
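Converting between timestamps and periods: .to_period() and .to_timestamp()
ts.to_period() converts month-end timestamps into monthly periods; ts.to_timestamp() converts monthly periods into timestamps on the first day of each month.
rng.to_period() converts a DatetimeIndex into a PeriodIndex; prng.to_timestamp() converts a PeriodIndex into a DatetimeIndex.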
>>> rng = pd.date_range('2017/1/1', periods=10, freq='M')
>>> prng = pd.period_range('2017', '2018', freq='M')
>>> ts1 = pd.Series(np.random.rand(len(rng)), index=rng)
>>> print(ts1.head())
2017-01-31    0.735182
2017-02-28    0.791190
2017-03-31    0.366768
2017-04-30    0.316335
2017-05-31    0.909333
Freq: M, dtype: float64
>>> print(ts1.to_period().head())  # last day of each month, converted to the month itself
2017-01    0.735182
2017-02    0.791190
2017-03    0.366768
2017-04    0.316335
2017-05    0.909333
Freq: M, dtype: float64
>>> ts2 = pd.Series(np.random.rand(len(prng)), index=prng)
>>> print(ts2.head())
2017-01    0.476774
2017-02    0.625230
2017-03    0.281017
2017-04    0.165561
2017-05    0.429782
Freq: M, dtype: float64
>>> print(ts2.to_timestamp().head())  # each month, converted to the first day of that month
2017-01-01    0.476774
2017-02-01    0.625230
2017-03-01    0.281017
2017-04-01    0.165561
2017-05-01    0.429782
Freq: MS, dtype: float64
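The two conversions are inverses of each other when the timestamps sit at the start of each period. The sketch below does a round trip on three months and also shows the how='end' variant; the names are illustrative.

```python
import pandas as pd

stamps = pd.date_range('2017-01-01', periods=3, freq='MS')  # first day of each month

periods = stamps.to_period('M')           # DatetimeIndex -> PeriodIndex
back = periods.to_timestamp()             # PeriodIndex -> DatetimeIndex, start of each period
ends = periods.to_timestamp(how='end')    # the last instant of each period instead

print(periods)
print(back)
print(ends)
```

Starting from month-end timestamps the round trip would not return the original dates, since to_timestamp() defaults to the start of each period.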
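5. Time series: indexing and slicing
A time series here is an ordinary Series whose index is a DatetimeIndex, so the usual Series indexing and selection methods work in the same way.
In addition, a time series can be indexed and sliced more conveniently through its time labels.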
pd.Series(np.random.rand(len(pd.period_range('1/1/2011', '1/1/2012'))),index=(pd.period_range('1/1/2011', '1/1/2012')))
pd.Series(np.random.rand(len(pd.date_range('2017/1','2017/3'))),index=(pd.date_range('2017/1','2017/3')))
Indexing: ts[0] and ts[:2] index by position; ts['2017/1/2'] indexes by time label.
>>> rng = pd.date_range('2017/1', '2017/3')
>>> ts = pd.Series(np.random.rand(len(rng)), index=rng)
>>> print(ts.head())
2017-01-01    0.407246
2017-01-02    0.104561
2017-01-03    0.140087
2017-01-04    0.988668
2017-01-05    0.733602
Freq: D, dtype: float64
>>> print(ts[0])
0.40724601715639686
>>> print(ts[:2])  # plain positional indexing; the end position is excluded
2017-01-01    0.407246
2017-01-02    0.104561
Freq: D, dtype: float64
>>> print(ts['2017/1/2'])
0.10456068527347884
>>> print(ts['20170103'])
0.14008702206007018
>>> print(ts['1/10/2017'])
0.7621543091477885
>>> print(ts[datetime(2017, 1, 20)])  # label indexing accepts many date-string formats as well as datetime.datetime
0.8743928943800818
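Because the time series is sorted chronologically, order is not a concern when selecting by label.
The same indexing methods also apply to a DataFrame.
Slicing: ts['2017/1/5':'2017/1/10'] follows label-based index slicing, so the end point is included.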
>>> rng = pd.date_range('2017/1', '2017/3', freq='12H')
>>> ts = pd.Series(np.random.rand(len(rng)), index=rng)
>>> print(ts['2017/1/5':'2017/1/10'])  # same as label slicing on any Series: the end is included; ts.loc['2017/1/5':'2017/1/10'] works too
2017-01-05 00:00:00    0.864954
2017-01-05 12:00:00    0.270408
2017-01-06 00:00:00    0.979987
2017-01-06 12:00:00    0.426279
2017-01-07 00:00:00    0.403995
2017-01-07 12:00:00    0.731792
2017-01-08 00:00:00    0.018432
2017-01-08 12:00:00    0.728155
2017-01-09 00:00:00    0.190817
2017-01-09 12:00:00    0.501240
2017-01-10 00:00:00    0.893398
2017-01-10 12:00:00    0.977586
Freq: 12H, dtype: float64
>>> print(ts['2017/2'].head())  # passing a month returns a slice directly; ts['1/2017'] returns all of January; the result can be sliced further, e.g. [::2]
2017-02-01 00:00:00    0.635405
2017-02-01 12:00:00    0.282502
2017-02-02 00:00:00    0.774583
2017-02-02 12:00:00    0.306548
2017-02-03 00:00:00    0.817818
Freq: 12H, dtype: float64
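The label-based selections above can be written with .loc, and the positional ones with .iloc, which keeps the two kinds of indexing apart. To my knowledge, newer pandas releases discourage positional access such as ts[0] on a non-integer index, so this sketch uses the explicit accessors throughout. The data is made up.

```python
import pandas as pd

rng = pd.date_range('2017-01-01', '2017-03-01', freq='12h')
ts = pd.Series(range(len(rng)), index=rng)

one_day = ts.loc['2017-01-05']                     # every row on that day
closed_slice = ts.loc['2017-01-05':'2017-01-06']   # both end days are included in full
february = ts.loc['2017-02']                       # a whole month from a partial date string
first = ts.iloc[0]                                 # explicit positional access

print(len(one_day), len(closed_slice), len(february), first)
```

A partial date string such as '2017-02' selects everything that falls inside that month, regardless of the index frequency.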
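Time series with duplicate index labels
ts.is_unique checks only the values: if the values are unique but the index is not, it still returns True. Use ts.index.is_unique to check the index.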
>>> dates = pd.DatetimeIndex(['1/1/2015', '1/2/2015', '1/3/2015', '1/4/2015', '1/1/2015', '1/2/2015'])
>>> ts = pd.Series(np.random.rand(6), index=dates)
>>> print(ts)
2015-01-01    0.943037
2015-01-02    0.426762
2015-01-03    0.838297
2015-01-04    0.963703
2015-01-01    0.080439
2015-01-02    0.997752
dtype: float64
>>> print(ts.is_unique, ts.index.is_unique)  # the index has duplicates, the values do not; ts.is_unique looks at the values only, so it returns True
True False
>>> print(ts['20150101'], type(ts['20150101']))  # a duplicated label returns several values
2015-01-01    0.943037
2015-01-01    0.080439
dtype: float64 <class 'pandas.core.series.Series'>
>>> print(ts['20150104'], type(ts['20150104']))
2015-01-04    0.963703
dtype: float64 <class 'pandas.core.series.Series'>
>>> print(ts.groupby(level=0).mean())  # group by the index; duplicated labels are replaced by their mean here
2015-01-01    0.511738
2015-01-02    0.712257
2015-01-03    0.838297
2015-01-04    0.963703
dtype: float64
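Averaging is only one way to deal with duplicated timestamps. The sketch below puts it next to the alternative of keeping the first occurrence, using Index.duplicated(); the values are made up so the results can be checked by hand.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-01'])
ts = pd.Series([1.0, 2.0, 3.0, 5.0], index=dates)

print(ts.index.is_unique)   # False: 2015-01-01 appears twice

# Option 1: keep only the first row for every timestamp.
dup_mask = ts.index.duplicated(keep='first')
first_only = ts[~dup_mask]

# Option 2: collapse duplicated timestamps into their mean.
averaged = ts.groupby(level=0).mean()

print(first_only)
print(averaged)
```

Which option is right depends on the data: repeated sensor readings may be averaged, while an accidental duplicate row is better dropped.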
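6. Time series: resampling
Resampling is the process of converting a time series from one frequency to another, aggregating the data along the way.
Downsampling: high-frequency data → low-frequency data, e.g. daily data converted to monthly data
Upsampling: low-frequency data → high-frequency data, e.g. yearly data converted to monthly data
Resampling: .resample()
Create a time series with daily frequency and resample it to a 5-day frequency
ts.resample('5D').sum() / .mean() / .max() / .min() / .median() / .first() / .last() / .ohlc()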
>>> rng = pd.date_range('20170101', periods=12)
>>> ts = pd.Series(np.arange(12), index=rng)
>>> print(ts)
2017-01-01     0
2017-01-02     1
2017-01-03     2
2017-01-04     3
2017-01-05     4
2017-01-06     5
2017-01-07     6
2017-01-08     7
2017-01-09     8
2017-01-10     9
2017-01-11    10
2017-01-12    11
Freq: D, dtype: int32
>>> ts_re = ts.resample('5D')  # ts.resample('5D') returns a resampler object with the frequency changed to 5 days; the argument is the resampling frequency
>>> ts_re2 = ts.resample('5D').sum()  # ts.resample('5D').sum() returns a new, aggregated Series; .sum() is the aggregation method
>>> print(ts_re, type(ts_re))  # a resampler object, not a value
DatetimeIndexResampler [freq=<5 * Days>, axis=0, closed=left, label=left, convention=start, base=0] <class 'pandas.core.resample.DatetimeIndexResampler'>
>>> print(ts_re2, type(ts_re2))
2017-01-01    10
2017-01-06    35
2017-01-11    21
dtype: int32 <class 'pandas.core.series.Series'>
>>> print(ts.resample('5D').mean(), '→ mean\n')
2017-01-01     2.0
2017-01-06     7.0
2017-01-11    10.5
dtype: float64 → mean
>>> print(ts.resample('5D').max(), '→ maximum\n')
2017-01-01     4
2017-01-06     9
2017-01-11    11
dtype: int32 → maximum
>>> print(ts.resample('5D').min(), '→ minimum\n')
2017-01-01     0
2017-01-06     5
2017-01-11    10
dtype: int32 → minimum
>>> print(ts.resample('5D').median(), '→ median\n')
2017-01-01     2.0
2017-01-06     7.0
2017-01-11    10.5
dtype: float64 → median
>>> print(ts.resample('5D').first(), '→ first value\n')
2017-01-01     0
2017-01-06     5
2017-01-11    10
dtype: int32 → first value
>>> print(ts.resample('5D').last(), '→ last value\n')
2017-01-01     4
2017-01-06     9
2017-01-11    11
dtype: int32 → last value
>>> print(ts.resample('5D').ohlc(), '→ OHLC resampling\n')  # OHLC: an aggregation used for financial time series → open, high (maximum), low (minimum), close
            open  high  low  close
2017-01-01     0     4    0      4
2017-01-06     5     9    5      9
2017-01-11    10    11   10     11 → OHLC resampling
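Several aggregations can be computed in one pass by handing a list of function names to the resampler's agg() method, which returns one column per aggregation. A short sketch on the same 0 to 11 series:

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(12),
               index=pd.date_range('2017-01-01', periods=12, freq='D'))

# One row per 5-day bin, one column per aggregation.
summary = ts.resample('5D').agg(['sum', 'mean', 'max'])
print(summary)
```

This avoids calling resample() once per statistic and keeps the results aligned in a single DataFrame.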
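Downsampling
ts.resample('5D', closed = 'left').sum()   # closed='left' is the default for day-based frequencies and can be omitted; each bin includes its left edge and excludes its right edge → [1,2,3,4,5],[6,7,8,9,10],[11,12]
ts.resample('5D', closed = 'right').sum()  # closed='right': each bin includes its right edge and excludes its left edge → [1],[2,3,4,5,6],[7,8,9,10,11],[12]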
>>> rng = pd.date_range('20170101', periods = 12)
>>> ts = pd.Series(np.arange(1,13), index = rng)
>>> print(ts)
2017-01-01     1
2017-01-02     2
2017-01-03     3
2017-01-04     4
2017-01-05     5
2017-01-06     6
2017-01-07     7
2017-01-08     8
2017-01-09     9
2017-01-10    10
2017-01-11    11
2017-01-12    12
Freq: D, dtype: int32
>>> print(ts.resample('5D').sum(),'→ default\n')  # the values are 1-12; resampling by 5D gives the bins [1,2,3,4,5],[6,7,8,9,10],[11,12]
2017-01-01    15
2017-01-06    40
2017-01-11    23
dtype: int32 → default
# closed: which end of each interval is closed (i.e. inclusive); the default here is left-closed, right-open
>>> print(ts.resample('5D', closed = 'left').sum(),'→ left\n')  # left: the left edge belongs to the bin → [1,2,3,4,5],[6,7,8,9,10],[11,12]
2017-01-01    15
2017-01-06    40
2017-01-11    23
dtype: int32 → left
>>> print(ts.resample('5D', closed = 'right').sum(),'→ right\n')  # right: the right edge belongs to the bin → [1],[2,3,4,5,6],[7,8,9,10,11],[12]
2016-12-27     1
2017-01-01    20
2017-01-06    45
2017-01-11    12
dtype: int32 → right
# label: which bin edge is used as the index label of each aggregated value; the default is the left edge. The binning itself is unchanged (closed keeps its default here)
>>> print(ts.resample('5D', label = 'left').sum(),'→ leftlabel\n')
2017-01-01    15
2017-01-06    40
2017-01-11    23
dtype: int32 → leftlabel
>>> print(ts.resample('5D', label = 'right').sum(),'→ rightlabel\n')  # the label is the right edge of each bin (2017-01-06 for the first bin); the default, left, gives 2017-01-01
2017-01-06    15
2017-01-11    40
2017-01-16    23
dtype: int32 → rightlabel
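closed and label are independent and can be combined. The sketch below sets both to 'right' on the same 1-12 series and prints the smallest and largest value in each bin, which is a simple way to see which rows ended up where. Using min/max to inspect bin contents is just a convenience chosen for this illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('20170101', periods=12)
ts = pd.Series(np.arange(1, 13), index=rng)

# Right-closed bins, each labelled with its right edge
bins = ts.resample('5D', closed='right', label='right').agg(['min', 'max', 'sum'])
print(bins)

# The first label is the right edge of the bin that contains only 2017-01-01
print(bins.index[0])
```

With this combination every label is the last day included in its bin, which is often the more natural reading for period-end reporting.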
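Upsampling and interpolation
ts.resample('15T').asfreq() converts low-frequency data to high-frequency data. .asfreq(): no filling, the new rows are NaN; .ffill(): forward fill (carry the previous value forward); .bfill(): backward fill (use the next value)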
>>> rng = pd.date_range('2017/1/1 0:0:0', periods = 5, freq = 'H')
>>> ts = pd.DataFrame(np.arange(15).reshape(5,3),
...                   index = rng,
...                   columns = ['a','b','c'])
>>> print(ts)
                      a   b   c
2017-01-01 00:00:00   0   1   2
2017-01-01 01:00:00   3   4   5
2017-01-01 02:00:00   6   7   8
2017-01-01 03:00:00   9  10  11
2017-01-01 04:00:00  12  13  14
>>> print(ts.resample('15T').asfreq())  # low to high frequency; the question is how to fill the new rows. .asfreq(): no filling, new rows are NaN
                        a     b     c
2017-01-01 00:00:00   0.0   1.0   2.0
2017-01-01 00:15:00   NaN   NaN   NaN
2017-01-01 00:30:00   NaN   NaN   NaN
2017-01-01 00:45:00   NaN   NaN   NaN
2017-01-01 01:00:00   3.0   4.0   5.0
2017-01-01 01:15:00   NaN   NaN   NaN
2017-01-01 01:30:00   NaN   NaN   NaN
2017-01-01 01:45:00   NaN   NaN   NaN
2017-01-01 02:00:00   6.0   7.0   8.0
2017-01-01 02:15:00   NaN   NaN   NaN
2017-01-01 02:30:00   NaN   NaN   NaN
2017-01-01 02:45:00   NaN   NaN   NaN
2017-01-01 03:00:00   9.0  10.0  11.0
2017-01-01 03:15:00   NaN   NaN   NaN
2017-01-01 03:30:00   NaN   NaN   NaN
2017-01-01 03:45:00   NaN   NaN   NaN
2017-01-01 04:00:00  12.0  13.0  14.0
>>> print(ts.resample('15T').ffill())  # .ffill(): forward fill, each new row takes the previous known value
                      a   b   c
2017-01-01 00:00:00   0   1   2
2017-01-01 00:15:00   0   1   2
2017-01-01 00:30:00   0   1   2
2017-01-01 00:45:00   0   1   2
2017-01-01 01:00:00   3   4   5
2017-01-01 01:15:00   3   4   5
2017-01-01 01:30:00   3   4   5
2017-01-01 01:45:00   3   4   5
2017-01-01 02:00:00   6   7   8
2017-01-01 02:15:00   6   7   8
2017-01-01 02:30:00   6   7   8
2017-01-01 02:45:00   6   7   8
2017-01-01 03:00:00   9  10  11
2017-01-01 03:15:00   9  10  11
2017-01-01 03:30:00   9  10  11
2017-01-01 03:45:00   9  10  11
2017-01-01 04:00:00  12  13  14
>>> print(ts.resample('15T').bfill())  # .bfill(): backward fill, each new row takes the next known value
                      a   b   c
2017-01-01 00:00:00   0   1   2
2017-01-01 00:15:00   3   4   5
2017-01-01 00:30:00   3   4   5
2017-01-01 00:45:00   3   4   5
2017-01-01 01:00:00   3   4   5
2017-01-01 01:15:00   6   7   8
2017-01-01 01:30:00   6   7   8
2017-01-01 01:45:00   6   7   8
2017-01-01 02:00:00   6   7   8
2017-01-01 02:15:00   9  10  11
2017-01-01 02:30:00   9  10  11
2017-01-01 02:45:00   9  10  11
2017-01-01 03:00:00   9  10  11
2017-01-01 03:15:00  12  13  14
2017-01-01 03:30:00  12  13  14
2017-01-01 03:45:00  12  13  14
2017-01-01 04:00:00  12  13  14
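The session above covers filling but not interpolation proper. A minimal sketch of linear interpolation with the Resampler's .interpolate() method follows. The three-point series and the '60min' / '15min' frequency strings are assumptions picked for this example (the spelled-out minute alias is used because newer pandas releases warn about 'T' and 'H').

```python
import pandas as pd

# Hourly series with a constant step of 4 between observations
rng = pd.date_range('2017-01-01', periods=3, freq='60min')
ts = pd.Series([0.0, 4.0, 8.0], index=rng)

# Forward fill repeats the last known value inside each hour
ffilled = ts.resample('15min').ffill()
print(ffilled)

# Linear interpolation draws a straight line between neighbouring observations
interpolated = ts.resample('15min').interpolate()
print(interpolated)
```

Forward fill is the safer choice when a value stays valid until the next reading (e.g. a price quote); interpolation suits quantities that change smoothly between readings.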
Resampling periods - Period
>>> prng = pd.period_range('2016','2017',freq = 'M')
>>> ts = pd.Series(np.arange(len(prng)), index = prng)
>>> print(ts)
2016-01     0
2016-02     1
2016-03     2
2016-04     3
2016-05     4
2016-06     5
2016-07     6
2016-08     7
2016-09     8
2016-10     9
2016-11    10
2016-12    11
2017-01    12
Freq: M, dtype: int32
>>> print(ts.resample('3M').sum())  # downsampling
2016-01-31     0
2016-04-30     6
2016-07-31    15
2016-10-31    24
2017-01-31    33
Freq: 3M, dtype: int32
>>> print(ts.resample('15D').ffill())  # upsampling
2016-01-01     0
2016-01-16     0
2016-01-31     0
2016-02-15     1
2016-03-01     2
2016-03-16     2
2016-03-31     2
2016-04-15     3
2016-04-30     3
2016-05-15     4
2016-05-30     4
2016-06-14     5
2016-06-29     5
2016-07-14     6
2016-07-29     6
2016-08-13     7
2016-08-28     7
2016-09-12     8
2016-09-27     8
2016-10-12     9
2016-10-27     9
2016-11-11    10
2016-11-26    10
2016-12-11    11
2016-12-26    11
2017-01-10    12
2017-01-25    12
Freq: 15D, dtype: int32
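An alternative that avoids resampling the PeriodIndex directly is to convert it to timestamps first. As far as I know, recent pandas releases discourage calling resample on a PeriodIndex, so this route may also be the more future-proof one. The sketch below is an illustration under that assumption; the 12-month range and the 'QS' (quarter start) frequency are choices made for this example.

```python
import numpy as np
import pandas as pd

# Twelve monthly periods for 2016, with values 0..11
prng = pd.period_range('2016-01', '2016-12', freq='M')
ts = pd.Series(np.arange(len(prng)), index=prng)

# Convert each period to the timestamp of its first day
stamped = ts.to_timestamp()

# Downsample month -> quarter on the DatetimeIndex
quarterly = stamped.resample('QS').sum()
print(quarterly)

# Convert back to quarterly periods if a PeriodIndex is preferred
print(quarterly.to_period('Q'))
```

Each quarter sums three consecutive months, so the four totals are 0+1+2, 3+4+5, 6+7+8 and 9+10+11.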