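Quantitative investing starts with data. As the saying goes, "even the cleverest housewife cannot cook a meal without rice."
For free data, the finance sections of the major portals (Sina, Yahoo, Sohu and others) already provide APIs, and the data can be fetched by calling urlopen on a URL in the appropriate format.
TuShare is exactly such a free, open-source Python package for financial data. It has already organized the various kinds of data into DataFrame objects for us to use.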
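The main functions used:
1. Real-time quotes
tushare.get_today_all()
Fetches the current trading day's quotes for all stocks in one call (on a holiday it returns the previous trading day; how quickly the result appears depends on network speed).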
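2. Historical data
tushare.get_hist_data(code, start, end, ktype, retry_count, pause)
Parameters:
- code: stock code, i.e. the 6-digit code, or an index code (sh = Shanghai Composite Index, sz = Shenzhen Component Index, hs300 = CSI 300 Index, sz50 = SSE 50, zxb = SME Board, cyb = ChiNext)
- start: start date, format YYYY-MM-DD
- end: end date, format YYYY-MM-DD
- ktype: data type, D = daily K-line, W = weekly, M = monthly, 5 = 5-minute, 15 = 15-minute, 30 = 30-minute, 60 = 60-minute; default D
- retry_count: number of retries after a network error, default 3
- pause: seconds to pause between retries, default 0
For details see the official site: http://tushare.org/index.html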
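As a minimal sketch of how the parameters above fit together, the helper below checks the ktype value and the date format described in the list and returns keyword arguments for tushare.get_hist_data. The name build_hist_kwargs is hypothetical and not part of TuShare; the keyword names come from the signature quoted above, and the validation rules are only one reading of the parameter descriptions.

```python
import datetime

# ktype values listed in the parameter description above.
VALID_KTYPES = {"D", "W", "M", "5", "15", "30", "60"}


def build_hist_kwargs(code, start=None, end=None, ktype="D",
                      retry_count=3, pause=0):
    """Validate arguments and return kwargs for tushare.get_hist_data.

    Hypothetical helper for illustration; it is not part of TuShare.
    """
    if ktype not in VALID_KTYPES:
        raise ValueError("unsupported ktype: %r" % (ktype,))
    for value in (start, end):
        if value is not None:
            # Raises ValueError if the string does not parse as year-month-day.
            datetime.datetime.strptime(value, "%Y-%m-%d")
    return {"code": code, "start": start, "end": end, "ktype": ktype,
            "retry_count": retry_count, "pause": pause}


kwargs = build_hist_kwargs("603999", start="2017-07-01", end="2017-08-02")
print(kwargs)
```

With TuShare installed and a network connection available, the result would be passed on as ts.get_hist_data(**kwargs).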
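A thorough, detailed backtest would be inefficient if the data had to be fetched online every time, so the data also needs to be stored in a database.
The database design follows.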
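Table 1: stocks
The stock table. The first column is the stock code and the second is the name. Any stock returned by get_today_all() that is not yet in the stocks table is inserted.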
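The "insert if missing" rule for the stocks table can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the function name sync_stocks and the sample (code, name) pairs are made up for illustration.

```python
import sqlite3


def sync_stocks(conn, fetched):
    """Insert (code, name) pairs whose code is not yet in stocks.

    Rows already present are left untouched. Returns the number of new rows.
    """
    cur = conn.cursor()
    cur.execute("select stock_code from stocks")
    existing = {row[0] for row in cur.fetchall()}
    new_rows = [(code, name) for code, name in fetched if code not in existing]
    cur.executemany("insert into stocks values (?, ?)", new_rows)
    conn.commit()
    return len(new_rows)


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
conn.execute(
    "create table stocks(stock_code varchar primary key, cns_name varchar)")
first = sync_stocks(conn, [("000001", "Stock A"), ("600000", "Stock B")])
second = sync_stocks(conn, [("600000", "Stock B"), ("603999", "Stock C")])
print(first, second)
```

Note that sqlite3 uses ? as its parameter placeholder, whereas psycopg2 uses %s.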
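Table 2: hdata_date
The daily-bar table. Minute bars are only available for the past week, so we study daily bars first.
The columns largely match the return value of get_hist_data, with an added stock_code column. record_date, which was originally the DataFrame index, becomes a regular column.
stock_code, record_date, // primary key
open, high, close, low, // open, high, close, low
volume, // trading volume
price_change, p_change, // price change, percentage change
ma5, ma10, ma20, // k-day average closing price
v_ma5, v_ma10, v_ma20, // k-day average volume
turnover // turnover rate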
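The composite primary key (stock_code, record_date) lets the same date appear once per stock while rejecting a repeated row for the same stock. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for PostgreSQL and only a few of the columns listed above; the row for 000001 is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table hdata_date(
        stock_code varchar,
        record_date date,
        open float,
        close float,
        primary key (stock_code, record_date)
    )
""")

rows = [
    ("603999", "2015-12-10", 14.07, 14.07),
    ("603999", "2015-12-11", 15.48, 15.48),
    ("000001", "2015-12-10", 12.00, 12.10),  # made-up row: same date, other stock
]
conn.executemany("insert into hdata_date values (?, ?, ?, ?)", rows)

# Re-inserting an existing (stock_code, record_date) pair violates the key.
try:
    conn.execute("insert into hdata_date values (?, ?, ?, ?)", rows[0])
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("select count(*) from hdata_date").fetchone()[0]
print(duplicate_rejected, count)
```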
The Python project currently has three files: main.py (the main program), Stocks.py (the "stocks" class) and HData.py (the historical-data class).
main.py
    import datetime

    import tushare as ts

    from Stocks import Stocks
    from HData import HData

    # The database is PostgreSQL, accessed through psycopg2 inside Stocks and HData.
    stocks = Stocks("postgres", "123456")
    hdata = HData("postgres", "123456")

    # stocks.db_stocks_create()         # needed if the stocks table does not exist yet
    # print(stocks.db_stocks_update())  # update the stocks table from get_today_all()
    # hdata.db_hdata_date_create()      # needed if the hdata_date table does not exist yet

    nowdate = datetime.datetime.now().date()
    codestock_local = stocks.get_codestock_local()

    # Each database connection costs a few hundredths of a second, so one
    # connection is shared for all the historical-data work.
    hdata.db_connect()
    for i, (nowcode, name) in enumerate(codestock_local):
        # print(hdata.get_all_hdata_of_stock(nowcode))
        print(i, nowcode, name)
        maxdate = hdata.db_get_maxdate_of_stock(nowcode)
        print(maxdate, nowdate)
        if maxdate:
            if maxdate >= nowdate:
                continue  # already up to date
            # maxdate is older than today, so the newest days are still missing
            start = str(maxdate + datetime.timedelta(1))
        else:
            start = None  # this stock's history has never been fetched
        hist_data = ts.get_hist_data(nowcode, start, str(nowdate), 'D', 3, 0.001)
        if hist_data is not None:  # skip the insert when no data came back
            hdata.insert_perstock_hdatadate(nowcode, hist_data)
    hdata.db_disconnect()
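The incremental-update rule in the main.py loop can be isolated as a small pure function, which makes it easy to check without a database or network access. This is only a sketch; next_fetch_range is a hypothetical name that does not appear in the project.

```python
import datetime


def next_fetch_range(maxdate, today):
    """Return (start, end) strings for the next download, or None if up to date.

    maxdate is the latest record_date stored for the stock (None if no rows).
    start is None when the full history should be requested.
    """
    if maxdate is None:
        return (None, str(today))
    if maxdate >= today:
        return None
    return (str(maxdate + datetime.timedelta(days=1)), str(today))


today = datetime.date(2017, 8, 2)
print(next_fetch_range(None, today))
print(next_fetch_range(datetime.date(2017, 7, 31), today))
print(next_fetch_range(today, today))
```

The three calls cover the three branches: no stored history, a gap to fill, and nothing to do.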
Stocks.py
    import tushare as ts
    import psycopg2


    class Stocks(object):
        """Represents the whole set of stocks, not a single one."""

        def __init__(self, user, password):
            self.todayall = []
            self.user = user
            self.password = password

        def _connect(self):
            return psycopg2.connect(database="wzj_quant", user=self.user,
                                    password=self.password,
                                    host="127.0.0.1", port="5432")

        def get_today_all(self):
            self.todayall = ts.get_today_all()

        def get_codestock_local(self):
            """Return all (stock_code, cns_name) rows from the local stocks table."""
            conn = self._connect()
            cur = conn.cursor()
            cur.execute("select * from stocks;")
            rows = cur.fetchall()
            conn.close()
            return rows

        def db_perstock_insertsql(self, stock_code, cns_name):
            """Return the insert statement and its parameters for one stock."""
            return "insert into stocks values (%s, %s);", (stock_code, cns_name)

        def db_stocks_update(self):
            """Insert stocks found by get_today_all() that are missing locally.

            Stocks that exist locally but not in get_today_all() are kept.
            Returns the number of rows added.
            """
            ans = 0
            conn = self._connect()
            cur = conn.cursor()
            self.get_today_all()
            for code, name in zip(self.todayall["code"], self.todayall["name"]):
                # Parameters are passed separately so that quotes in a name
                # cannot break the statement.
                cur.execute("select * from stocks where stock_code = %s;", (code,))
                if len(cur.fetchall()) == 0:  # code not found locally, so insert it
                    ans += 1
                    cur.execute(*self.db_perstock_insertsql(code, name))
            conn.commit()
            conn.close()
            print("db_stocks_update finish")
            return ans

        def db_stocks_create(self):
            """Drop the stocks table if it exists and create it again."""
            conn = self._connect()
            cur = conn.cursor()
            cur.execute('''
                drop table if exists stocks;
                create table stocks(stock_code varchar primary key, cns_name varchar);
            ''')
            conn.commit()
            conn.close()
            print("db_stocks_create finish")
HData.py
    import psycopg2
    import tushare as ts
    import pandas as pd
    from time import perf_counter  # time.clock was removed in Python 3.8


    class HData(object):
        def __init__(self, user, password):
            self.hdata_date = []
            self.user = user
            self.password = password
            self.conn = None
            self.cur = None

        def _connect(self):
            return psycopg2.connect(database="wzj_quant", user=self.user,
                                    password=self.password,
                                    host="127.0.0.1", port="5432")

        def db_connect(self):
            self.conn = self._connect()
            self.cur = self.conn.cursor()

        def db_disconnect(self):
            self.conn.close()

        def db_hdata_date_create(self):
            """Drop the hdata_date table if it exists and create it again."""
            conn = self._connect()
            cur = conn.cursor()
            cur.execute('''
                drop table if exists hdata_date;
                create table hdata_date(
                    stock_code varchar, record_date date,
                    open float, high float, close float, low float,
                    volume float,
                    price_change float, p_change float,
                    ma5 float, ma10 float, ma20 float,
                    v_ma5 float, v_ma10 float, v_ma20 float,
                    turnover float
                );
                alter table hdata_date add primary key(stock_code, record_date);
            ''')
            conn.commit()
            conn.close()
            print("db_hdata_date_create finish")

        def db_get_maxdate_of_stock(self, stock_code):
            """Return the latest record_date stored for a stock, or None."""
            self.cur.execute(
                "select max(record_date) from hdata_date where stock_code = %s;",
                (stock_code,))
            ans = self.cur.fetchall()
            if len(ans) == 0:
                return None
            return ans[0][0]

        def insert_perstock_hdatadate(self, stock_code, data):
            """Insert the get_hist_data result for one stock into the database.

            Values are passed in DataFrame column order, which is assumed to
            match the column order of hdata_date. A row whose
            (stock_code, record_date) already exists violates the primary key,
            so callers should pass only new dates.
            """
            t1 = perf_counter()
            for i in range(len(data)):
                row = [stock_code, str(data.index[i])]
                row += [str(data.iloc[i, j]) for j in range(data.shape[1])]
                placeholders = ",".join(["%s"] * len(row))
                self.cur.execute(
                    "insert into hdata_date values (" + placeholders + ");", row)
            self.conn.commit()
            print(perf_counter() - t1)
            print(stock_code + " insert_perstock_hdatadate finish")

        def get_all_hdata_of_stock(self, stock_code):
            """Read all rows for a stock and return them as a DataFrame."""
            conn = self._connect()
            cur = conn.cursor()
            cur.execute("select * from hdata_date where stock_code = %s;",
                        (stock_code,))
            rows = cur.fetchall()
            # Column names are the same as the database column names.
            dataframe_cols = [col[0] for col in cur.description]
            conn.close()
            return pd.DataFrame(rows, columns=dataframe_cols)
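get_all_hdata_of_stock relies on the cursor's description attribute to recover the column names. A minimal sketch of that step in isolation, using the built-in sqlite3 module as a stand-in for psycopg2 (both follow the Python DB-API, where the first item of each description entry is the column name) and a cut-down table with only three columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table hdata_date(stock_code varchar, record_date date, close float)")
conn.executemany(
    "insert into hdata_date values (?, ?, ?)",
    [("603999", "2015-12-10", 14.07), ("603999", "2015-12-11", 15.48)],
)

cur = conn.cursor()
cur.execute("select * from hdata_date where stock_code = ?", ("603999",))
rows = cur.fetchall()
# Each entry of cur.description is a sequence whose first item is the column name.
columns = [col[0] for col in cur.description]
conn.close()

# Pair every row with the column names to get one dict per record.
records = [dict(zip(columns, row)) for row in rows]
print(columns)
print(records[0])
```

With pandas available, pd.DataFrame(rows, columns=columns) builds the same kind of DataFrame that get_all_hdata_of_stock returns.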
Example console output from main.py:
Example result of the get_all_hdata_of_stock function in HData:
stock_code record_date open high close low volume \
0 603999 2015-12-10 14.07 14.07 14.07 14.07 337.00
1 603999 2015-12-11 15.48 15.48 15.48 15.48 119.00
2 603999 2015-12-14 17.03 17.03 17.03 17.03 267.00
3 603999 2015-12-15 18.73 18.73 18.73 18.73 244.00
.. ... ... ... ... ... ... ...
397 603999 2017-08-01 9.62 9.97 9.79 9.61 36337.80
398 603999 2017-08-02 9.80 9.85 9.61 9.59 32135.60
price_change p_change ma5 ma10 ma20 v_ma5 v_ma10 \
0 4.30 44.01 14.070 14.070 14.070 337.00 337.00
1 1.41 10.02 14.775 14.775 14.775 228.00 228.00
2 1.55 10.01 15.527 15.527 15.527 241.00 241.00
3 1.70 9.98 16.328 16.328 16.328 241.75 241.75
.. ... ... ... ... ... ... ...
397 0.16 1.66 9.680 9.709 9.924 36754.46 49436.88
398 -0.18 -1.84 9.698 9.741 9.863 36513.38 49998.51
v_ma20 turnover
0 337.00 0.06
1 228.00 0.02
2 241.00 0.04
3 241.75 0.04
.. ... ...
397 42602.09 1.58
398 42114.31 1.39
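The first rows of this sample are consistent with ma5 being a trailing mean of close that uses however many days are available while fewer than five exist. The sketch below checks that reading against rows 0 to 3; it is an observation about this sample, not a statement of how TuShare computes the column.

```python
def trailing_mean(values, window):
    """Mean of the last `window` values at each position, using fewer
    values while fewer than `window` are available."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


closes = [14.07, 15.48, 17.03, 18.73]  # close column, rows 0 to 3 of the sample
ma5 = trailing_mean(closes, 5)
print(["%.4f" % v for v in ma5])
```

The computed means, roughly 14.07, 14.775, 15.5267 and 16.3275, agree with the ma5 values 14.070, 14.775, 15.527 and 16.328 shown above up to rounding.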
Example data in the database
stocks table
hdata_date table