Common pandas functions


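1. Indexes with duplicate values: use obj.index.is_unique to check whether the index labels are unique. Selecting a duplicated label, e.g. obj['a'], returns every entry that carries that label.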
2. Common arithmetic and descriptive statistics methods of DataFrame: https://chrisalbon.com/python/pandas_dataframe_descriptive_stats.html
For the list of methods, see Python for Data Analysis, p. 139, Table 5-10.
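3. pandas_datareader (imported here with import pandas_datareader as web) can download stock data to use as a statistical sample. For the supported data sources and how to use them, see the documentation:
https://pandas-datareader.readthedocs.io/en/latest/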
(1) Series with Series
returns.MSFT.corr(returns.IBM)  correlation coefficient
returns.MSFT.cov(returns.IBM)  covariance

(2) DataFrame with itself (pairwise across its columns)
returns.corr()
returns.cov()
(3) DataFrame with Series
returns.corrwith(returns.IBM)
(4) DataFrame with DataFrame
returns.corrwith(volume)
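
The four cases above can also be tried without downloading anything. The following is a minimal sketch that assumes synthetic data: the random "returns" and "volume" frames, the seed, and the date range are made up for illustration and are not market data. Only the method calls (corr, cov, corrwith) come from the notes above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for daily returns and volume (illustrative only).
rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-01", periods=250, freq="B")
tickers = ["AAPL", "IBM", "MSFT", "GOOG"]
returns = pd.DataFrame(rng.normal(0, 0.01, size=(250, 4)),
                       index=dates, columns=tickers)
volume = pd.DataFrame(rng.integers(1_000, 10_000, size=(250, 4)),
                      index=dates, columns=tickers)

# (1) Series with Series: each call returns a single number
pair_corr = returns["MSFT"].corr(returns["IBM"])
pair_cov = returns["MSFT"].cov(returns["IBM"])

# (2) DataFrame with itself: column-by-column matrices
corr_matrix = returns.corr()
cov_matrix = returns.cov()

# (3) DataFrame with Series: one correlation per column
corr_with_ibm = returns.corrwith(returns["IBM"])

# (4) DataFrame with DataFrame: columns with the same name are paired
corr_with_volume = returns.corrwith(volume)

print(pair_corr, pair_cov)
print(corr_matrix)
print(corr_with_ibm)
print(corr_with_volume)
```

Since the data is random, the off-diagonal correlations should be close to zero; the point is only the shape of each result (scalar, matrix, or Series).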


import numpy as np
from pandas import DataFrame, Series
print("Axis indexes with duplicate values")
obj = Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
print("obj is \n", obj)
print("obj.index.is_unique is ",obj.index.is_unique)
print("obj['a'] is \n", obj['a'])
print("obj['b'] is \n",obj['b'])

df = DataFrame(np.random.randn(4, 3), index=['a', 'a', 'b', 'b'])
print("df is \n", df)
# .ix has been removed from pandas; .loc is the label-based equivalent
print("df.loc['b'] is \n ", df.loc['b'])
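
One detail of duplicate labels is worth spelling out: the return type depends on how many entries share the label. The sketch below reuses the same Series as above. The use of obj.index.duplicated() to keep only the first entry per label is an addition for illustration and is not part of the original notes.

```python
import pandas as pd

obj = pd.Series(range(5), index=["a", "a", "b", "b", "c"])

dup = obj["a"]     # label occurs twice, so a Series comes back
single = obj["c"]  # label occurs once, so a scalar comes back

print(type(dup).__name__, type(single).__name__)

# Boolean mask marking repeated labels (the first occurrence is False)
mask = obj.index.duplicated()
first_only = obj[~mask]  # keep the first entry for each label
print(first_only)
```

Code that indexes by label should therefore not assume a scalar result unless obj.index.is_unique is True.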

df = DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
               index=['a', 'b', 'c', 'd'], columns=['one', 'two'])
print("df is \n", df)
print("Calling DataFrame's sum method returns a Series containing column sums")
print("df.sum() is \n", df.sum())
print("Passing axis=1 sums over the rows instead")
print("df.sum(axis=1) \n", df.sum(axis=1))
print("NA values are excluded unless the entire slice is NA. This can be disabled using the skipna option")
print("df.mean(axis=1, skipna=False) \n", df.mean(axis=1, skipna=False))
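
The NA handling deserves a closer look, because sum and mean treat an all-NA row differently in recent pandas versions (0.22 and later, as far as I know): sum returns 0 while mean returns NaN. The sketch below uses the same frame as above. The min_count argument is not mentioned in the original notes and is shown only as one way to get NaN back for an all-NA row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"], columns=["one", "two"])

row_sum = df.sum(axis=1)                      # NaN is skipped
row_sum_strict = df.sum(axis=1, min_count=1)  # needs at least one valid value
row_mean = df.mean(axis=1)                    # NaN is skipped
row_mean_noskip = df.mean(axis=1, skipna=False)  # any NaN poisons the row

print(pd.DataFrame({
    "sum": row_sum,
    "sum(min_count=1)": row_sum_strict,
    "mean": row_mean,
    "mean(skipna=False)": row_mean_noskip,
}))
```

Row 'c' is the interesting one: it has no valid values, so the plain sum is 0 while the other three columns show NaN.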

print("df.idxmax() return indirect statistics like the index value where the maximum values are attained \n",df.idxmax())
print("df.cumsum() return cumulative sum of values \n",df.cumsum())
print("df.describe() return multiple summary statistics in one shot \n",df.describe())
obj=Series(['a','a','b','c']*4)
print("obj is \n",obj)
print("obj.describe() return alternate summary statistics \n",obj.describe())

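import pandas_datareader as web

# Documentation: https://pandas-datareader.readthedocs.io/en/latest/

# Note: get_data_google depends on the Google Finance source, which later
# pandas-datareader releases deprecated, so this call may only work with an
# old release of the library.
all_data = {}
for ticker in ['AAPL', 'IBM', 'MSFT', 'GOOG']:
    all_data[ticker] = web.get_data_google(ticker, '1/1/2016', '1/1/2017')
print("all data is \n ", all_data)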

price = DataFrame({tic: data['Close']
                   for tic, data in all_data.items()})
volume = DataFrame({tic: data['Volume']
                    for tic, data in all_data.items()})

returns = price.pct_change()
print("returns.tail()\n",returns.tail())

print("returns.MSFT.corr(returns.IBM) \n",returns.MSFT.corr(returns.IBM))
print("returns.MSFT.cov(returns.IBM) \n", returns.MSFT.cov(returns.IBM))

print("returns.corr() \n", returns.corr())
print("returns.cov() \n", returns.cov())

print("returns.corrwith(returns.IBM) \n",returns.corrwith(returns.IBM))

print("volume is \n", volume)
print("returns.corrwith(volume) \n", returns.corrwith(volume))

