挖地兔 recently released a new version of tushare. The main addition is a new function, get_k_data. Below is an analysis of that function.
Function header:
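def get_k_data(code=None, start='', end='',
               ktype='D', autype='qfq',
               index=False,
               retry_count=3,
               pause=0.001):
    """
    Get k-line data
    ---------
    Parameters:
      code:string
                  stock code, e.g. 600848
      start:string
                  start date, format: YYYY-MM-DD; when empty, the current date is used
      end:string
                  end date, format: YYYY-MM-DD; when empty, today's date last year is used
      autype:string
                  price adjustment type: qfq = forward-adjusted, hfq = backward-adjusted, None = unadjusted; default is qfq
      ktype:string
                  data type: D = daily k-line, W = weekly, M = monthly, 5 = 5 minutes, 15 = 15 minutes, 30 = 30 minutes, 60 = 60 minutes; default is D
      retry_count : int, default 3
                  number of times to retry on network or similar problems
      pause : int, default 0
                  seconds to pause between repeated requests, to avoid problems caused by requests that are too close together
      drop_factor : bool, default True
                  whether to drop the adjustment factor; it may not matter much during analysis, but keeping it is more flexible if the data is first stored in a database and analyzed later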
Now let's go through it line by line (in the original post, the code of get_k_data itself is highlighted in red):
symbol = ct.INDEX_SYMBOL[code] if index else _code_to_symbol(code)
url = ''
dataflag = ''
If index is True, the symbol is looked up directly in the predefined dictionary ct.INDEX_SYMBOL; if index is False, the function _code_to_symbol is called:
def _code_to_symbol(code):
    """
    Generate the symbol code
    """
    if code in ct.INDEX_LABELS:
        return ct.INDEX_LIST[code]
    else:
        if len(code) != 6:
            return ''
        else:
            return 'sh%s'%code if code[:1] in ['5', '6', '9'] else 'sz%s'%code
Here are the definitions of INDEX_LABELS and INDEX_LIST:
INDEX_LABELS = ['sh', 'sz', 'hs300', 'sz50', 'cyb', 'zxb', 'zx300', 'zh500']
INDEX_LIST = {'sh': 'sh000001', 'sz': 'sz399001', 'hs300': 'sz399300', 'sz50': 'sh000016', 'zxb': 'sz399005', 'cyb': 'sz399006', 'zx300': 'sz399008', 'zh500':'sh000905'}
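If code is one of the index labels, the matching entry of INDEX_LIST is returned; if it is not six characters long, an empty string is returned. Otherwise, a code starting with '5', '6', or '9' gets the prefix sh, and any other code gets the prefix sz.
So the main job of symbol is to prefix code with sh or sz as appropriate.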
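To try the mapping without installing tushare, here is a minimal standalone sketch of the same logic. The constants are copied from the definitions quoted above; the name code_to_symbol (without the leading underscore) and the sample codes are only for illustration.

```python
# Constants copied from the definitions quoted in this post.
INDEX_LABELS = ['sh', 'sz', 'hs300', 'sz50', 'cyb', 'zxb', 'zx300', 'zh500']
INDEX_LIST = {'sh': 'sh000001', 'sz': 'sz399001', 'hs300': 'sz399300',
              'sz50': 'sh000016', 'zxb': 'sz399005', 'cyb': 'sz399006',
              'zx300': 'sz399008', 'zh500': 'sh000905'}


def code_to_symbol(code):
    """Standalone copy of the logic in _code_to_symbol (illustrative name)."""
    if code in INDEX_LABELS:
        return INDEX_LIST[code]       # index label -> full index symbol
    if len(code) != 6:
        return ''                     # anything else must be a 6-character code
    prefix = 'sh' if code[:1] in ['5', '6', '9'] else 'sz'
    return prefix + code


examples = {c: code_to_symbol(c) for c in ['600848', '002792', 'hs300', '123']}
print(examples)
```

Codes starting with '5', '6' or '9' end up with the sh prefix, index labels are translated through the dictionary, and malformed codes produce an empty string rather than an exception.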
if ktype.upper() in ct.K_LABELS:    # K_LABELS = ['D', 'W', 'M']
    fq = autype if autype is not None else ''    # whether to adjust prices, and which adjustment type
    if code[:1] in ('1', '5') or index:    # code starts with '1' or '5', or index is True (it is an index)
        fq = ''
    kline = '' if autype is None else 'fq'    # only autype=None means unadjusted
    url = ct.KLINE_TT_URL%(ct.P_TYPE['http'], ct.DOMAINS['tt'],    # P_TYPE = {'http': 'http://', 'ftp': 'ftp://'}; DOMAINS is defined below
                           kline, fq, symbol,    # '' or 'fq'; the adjustment type or ''; the code prefixed with sh or sz
                           ct.TT_K_TYPE[ktype.upper()], start, end,    # TT_K_TYPE = {'D': 'day', 'W': 'week', 'M': 'month'}
                           fq, _random(17))    # the adjustment type or ''; a random number between 10**16 and 10**17-1
    dataflag = '%s%s'%(fq, ct.TT_K_TYPE[ktype.upper()])    # the adjustment type (or '') followed by 'day', 'week' or 'month'
elif ktype in ct.K_MIN_LABELS:    # K_MIN_LABELS = ['5', '15', '30', '60']
    url = ct.KLINE_TT_MIN_URL%(ct.P_TYPE['http'], ct.DOMAINS['tt'],    # essentially the same as above
                               symbol, ktype, ktype,
                               _random(16))
    dataflag = 'm%s'%ktype    # 'm' followed by '5', '15', '30' or '60'
else:
    raise TypeError('ktype input error.')
Definition of DOMAINS:
DOMAINS = {'sina': 'sina.com.cn', 'sinahq': 'sinajs.cn', 'ifeng': 'ifeng.com', 'sf': 'finance.sina.com.cn', 'vsf': 'vip.stock.finance.sina.com.cn', 'idx': 'www.csindex.com.cn', '163': 'money.163.com', 'em': 'eastmoney.com', 'sseq': 'query.sse.com.cn', 'sse': 'www.sse.com.cn', 'szse': 'www.szse.cn', 'oss': '218.244.146.57', 'idxip':'115.29.204.48', 'shibor': 'www.shibor.org', 'mbox':'www.cbooo.cn', 'tt': 'gtimg.cn'}
Definitions of the two URL templates used above:
KLINE_TT_URL = '%sweb.ifzq.%s/appstock/app/%skline/get?_var=kline_day%s&param=%s,%s,%s,%s,320,%s&r=0.%s'
KLINE_TT_MIN_URL = '%sifzq.%s/appstock/app/kline/mkline?param=%s,m%s,,320&_var=m%s_today&r=0.%s'
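To see what the two templates expand to, the branch logic can be pulled into a small standalone function. This is only a sketch: build_request and random_digits are names I made up (random_digits stands in for tushare's _random, going by the range noted in this post), the constants are copied from the post, and the templates are written with `&param=`, which is what the query string needs.

```python
import random

K_LABELS = ['D', 'W', 'M']
K_MIN_LABELS = ['5', '15', '30', '60']
TT_K_TYPE = {'D': 'day', 'W': 'week', 'M': 'month'}
KLINE_TT_URL = ('%sweb.ifzq.%s/appstock/app/%skline/get?'
                '_var=kline_day%s&param=%s,%s,%s,%s,320,%s&r=0.%s')
KLINE_TT_MIN_URL = ('%sifzq.%s/appstock/app/kline/mkline?'
                    'param=%s,m%s,,320&_var=m%s_today&r=0.%s')


def random_digits(n):
    """Random n-digit number as a string (assumed stand-in for _random)."""
    return str(random.randint(10 ** (n - 1), 10 ** n - 1))


def build_request(code, symbol, ktype='D', autype='qfq',
                  start='', end='', index=False):
    """Return (url, dataflag) following the branches of get_k_data."""
    if ktype.upper() in K_LABELS:
        period = TT_K_TYPE[ktype.upper()]
        fq = autype if autype is not None else ''
        if code[:1] in ('1', '5') or index:
            fq = ''                      # never adjusted for these codes
        kline = '' if autype is None else 'fq'
        url = KLINE_TT_URL % ('http://', 'gtimg.cn', kline, fq, symbol,
                              period, start, end, fq, random_digits(17))
        return url, '%s%s' % (fq, period)
    if ktype in K_MIN_LABELS:
        url = KLINE_TT_MIN_URL % ('http://', 'gtimg.cn', symbol, ktype,
                                  ktype, random_digits(16))
        return url, 'm%s' % ktype
    raise TypeError('ktype input error.')


day_url, day_flag = build_request('002792', 'sz002792', autype='hfq',
                                  start='2016-10-26', end='2016-11-08')
min_url, min_flag = build_request('002792', 'sz002792', ktype='30')
print(day_url, day_flag)
print(min_url, min_flag)
```

The daily request asks the server to name the returned variable kline_dayhfq and the data is later found under the key hfqday, which is why dataflag is built from the same two pieces as the URL.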
for _ in range(retry_count):    # retry_count is the number of attempts; _ is just a throwaway loop variable, like i
    time.sleep(pause)    # pause between attempts
    try:
        request = Request(url)    # use the url built above
        lines = urlopen(request, timeout = 10).read()    # read the response
        if len(lines) < 100: #no data    # a response this short means no data was returned
            return None
    except Exception as e:
        print(e)
    else:
        lines = lines.decode('utf-8') if ct.PY3 else lines    # PY3 = (sys.version_info[0] >= 3); a sample of the decoded lines is shown below
        lines = lines.split('=')[1]    # split on '=' and take the piece at index 1, i.e. the JSON after the variable name
        reg = re.compile(r',{"nd.*?}')
        lines = re.subn(reg, '', lines)    # regex substitution on lines
        js = json.loads(lines[0])    # lines[0] is needed because subn returns a tuple; lines[1] is the number of substitutions
        df = pd.DataFrame(js['data'][symbol][dataflag], columns=ct.KLINE_TT_COLS)    # KLINE_TT_COLS holds the six column headers: date, open, close, and so on
        df['code'] = symbol if index else code    # add a code column, set to the index symbol or the stock code
        if ktype in ct.K_MIN_LABELS:    # for minute k-line data
            df['date'] = df['date'].map(lambda x: '%s-%s-%s %s:%s'%(x[0:4], x[4:6],
                                                                    x[6:8], x[8:10],
                                                                    x[10:12]))    # rewrite date as YYYY-MM-DD HH:MM
        return df
raise IOError(ct.NETWORK_URL_ERROR_MSG)
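The loop above is a for / try / except / else retry pattern: the else branch runs only when the try block raised nothing, and the final raise is reached only when every attempt failed. A minimal sketch of just that pattern, with the network call replaced by a made-up flaky_fetch function (both function names and the error message are mine, not tushare's, and the early return for short responses is left out):

```python
import time


def fetch_with_retry(fetch, retry_count=3, pause=0.001):
    """Call fetch() up to retry_count times; return the first success."""
    for _ in range(retry_count):
        time.sleep(pause)             # short pause before every attempt
        try:
            result = fetch()
        except Exception as e:
            print(e)                  # report the failure, then try again
        else:
            return result             # reached only if fetch() did not raise
    raise IOError('all %d attempts failed' % retry_count)


attempts = []


def flaky_fetch():
    """Hypothetical fetcher: fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError('simulated network error')
    return 'payload'


result = fetch_with_retry(flaky_fetch)
print(result, len(attempts))
```

Keeping the parsing in the else branch, as get_k_data does, means a parsing error is not swallowed by the except clause and retried as if it were a network failure.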
A sample of the decoded lines:
kline_dayhfq={"code":0,"msg":"","data":{"sz002792":{"hfqday":[["2016-10-26","84.635","82.541","85.268","82.149","27380.000"],
["2016-10-27","82.707","82.556","83.038","80.748","22315.000"],["2016-10-28","82.903","82.571","83.731","78.428","22165.000"],
["2016-10-31","82.541","81.502","82.556","79.995","16437.000"],["2016-11-01","81.517","84.319","85.072","81.517","30741.000"],
["2016-11-02","84.349","82.873","85.268","82.707","30526.000"],["2016-11-03","81.200","81.984","83.611","81.200","24593.000"],
["2016-11-04","81.863","85.720","86.729","81.863","57996.000"],["2016-11-07","85.464","85.991","86.383","84.756","31572.000"],
["2016-11-08","86.292","84.801","86.322","79.845","29328.000"]],
"qt":{"sz002792":["51","\u901a\u5b87\u901a\u8baf","002792","55.91","56.29","56.25","36536","18510","18026","55.91","38","55.90","127",
"55.89","201","55.85","10","55.83","10","55.99","30","56.00","3","56.10","10","56.12","8","56.15","26",
"15:00:04\/55.91\/301\/S\/1682891\/15265|14:57:00\/55.89\/1\/B\/5589\/15163|14:56:52\/55.71\/90\/S\/503812\/15154|
14:56:45\/55.89\/18\/B\/100602\/15146|14:56:39\/55.82\/8\/S\/44544\/15140|14:56:36\/56.12\/12\/B\/67324\/15136","20161109150137",
"-0.38","-0.68","56.75","54.46","55.89\/36235\/201929177","36536","20361","8.12","56.40","","56.75","54.46","4.07","25.16",
"125.80","7.09","61.92","50.66","1.05"],"market":["2016-11-09 20:57:01|HK_close_\u5df2\u6536\u76d8|SH_close_\u5df2\u6536\u76d8|
SZ_close_\u5df2\u6536\u76d8|US_close_\u672a\u5f00\u76d8|SQ_close_\u5df2\u4f11\u5e02|DS_close_\u5df2\u4f11\u5e02|ZS_close_
\u5df2\u4f11\u5e02"],"zjlx":["sz002792","8206.89","10347.24","-2140.35","-10.51","12154.32","10013.97","2140.35","10.51",
"20361.21","41080.23","41732.96","\u901a\u5b87\u901a\u8baf","20161109","20161108^5889.20^7540.99","20161107^6888.64^7504.11",
"20161104^15471.59^10227.30","20161103^4623.91^6113.32"]},"mx_price":{"mx":{"data":[],"timeline":[]},"price":{"data":[]}},
"prec":"22.940","version":"5"}}}
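The parsing steps can be tried offline on a shortened copy of this response. The sketch below uses only the standard library and returns a list of dicts instead of a DataFrame. The raw string keeps two of the rows above and drops the qt block for brevity. The column order in KLINE_TT_COLS is my assumption (it is consistent with the sample, where the fourth value is always the largest and the fifth the smallest), and parse_kline is an illustrative name.

```python
import json
import re

# Assumed column order; the post only says there are six columns.
KLINE_TT_COLS = ['date', 'open', 'close', 'high', 'low', 'volume']

# Shortened copy of the sample response (two rows, no "qt" block).
raw = ('kline_dayhfq={"code":0,"msg":"","data":{"sz002792":{"hfqday":['
       '["2016-10-26","84.635","82.541","85.268","82.149","27380.000"],'
       '["2016-10-27","82.707","82.556","83.038","80.748","22315.000"]],'
       '"prec":"22.940","version":"5"}}}')


def parse_kline(text, symbol, dataflag):
    """Apply the same split / subn / json.loads steps as get_k_data."""
    payload = text.split('=')[1]                  # JSON after the variable name
    payload, _count = re.subn(r',{"nd.*?}', '', payload)  # drop any {"nd...} items
    js = json.loads(payload)
    return [dict(zip(KLINE_TT_COLS, row))
            for row in js['data'][symbol][dataflag]]


rows = parse_kline(raw, 'sz002792', 'hfqday')
for row in rows:
    print(row)
```

Note that every value comes back as a string, so numeric columns still need converting before any calculation.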
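This is probably the only time I will pick through code in this much detail. From now on I should work more efficiently and keep my notes shorter.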