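The browser's Network tab shows that an endpoint returns JSON data. How can Python be used to fetch it page by page and save it to Excel or a database?
Meaning of the endpoint parameters: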
https://stock.xueqiu.com/v5/stock/chart/kline.json?symbol=SZ159915&begin=1589340438277&period=day&type=before&count=-142&indicator=kline,pe,pb,ps,pcf,market_capital,agt,ggt,balance
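Parameter   Meaning
begin       Start date (passed as a millisecond timestamp)
period      K-line unit: daily K-line, monthly K-line, etc.
type        Meaning unknown
count       Number of data points
indicator   Additional indicators to return
What the endpoint does: starting from the day given by begin, it goes back count trading days (count is passed as a negative number) and returns the indicators named in indicator.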
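As a minimal sketch of how these parameters combine into the request URL, the helper below assembles the query string with the standard library. The function name build_kline_url and its argument names are made up for illustration; only the parameter names and sample values come from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://stock.xueqiu.com/v5/stock/chart/kline.json"


def build_kline_url(symbol, begin_ms, count, period="day",
                    kline_type="before", indicator="kline"):
    """Assemble the K-line query URL (hypothetical helper, not part of any library)."""
    params = {
        "symbol": symbol,
        "begin": begin_ms,      # millisecond timestamp
        "period": period,
        "type": kline_type,
        "count": count,         # negative, as in the article's URL
        "indicator": indicator,
    }
    # safe="," keeps the comma-separated indicator list unescaped
    return BASE_URL + "?" + urlencode(params, safe=",")


url = build_kline_url("SZ159915", 1589340438277, -142, indicator="kline,pe,pb")
print(url)
```

Building the URL from a dict makes it easier to change one parameter at a time than string concatenation does.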
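Meaning of some variables in the figure
Variable    Meaning
timestamp   Timestamp (in ms).
volume      Trading volume
open        Opening price
high        Highest price
low         Lowest price
close       Closing price
The remaining fields can be worked out by comparing them against the K-line chart.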
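Since timestamp is in milliseconds, it has to be divided by 1000 before the usual date functions accept it. The sketch below converts a value to a calendar date; the choice of UTC+8 (China Standard Time) is an assumption about how the trading day should be read, and ms_to_date is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: interpret timestamps in China Standard Time (UTC+8)
CST = timezone(timedelta(hours=8))


def ms_to_date(timestamp_ms):
    """Convert a millisecond timestamp to a YYYY-MM-DD string in UTC+8."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=CST).strftime("%Y-%m-%d")


print(ms_to_date(1329408000000))  # 2012-02-17
```

With pandas, pd.to_datetime(df["timestamp"], unit="ms") does the same conversion for a whole column, although the result is then in UTC unless a time zone is applied.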
The Preview tab shows the same data in a more readable form:
Using the endpoint
When writing the code, you need the Cookie and User-Agent entries listed under Request Headers.
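One way to avoid pasting the cookie into the script is to read it from an environment variable. This is only a sketch: the variable name XUEQIU_COOKIE and the helper build_headers are assumptions made for this example, not something the site or the requests library defines.

```python
import os


def build_headers(cookie=None, user_agent="Mozilla/5.0"):
    """Return request headers; fall back to the XUEQIU_COOKIE env var (hypothetical name)."""
    if cookie is None:
        cookie = os.environ.get("XUEQIU_COOKIE", "")
    headers = {"User-Agent": user_agent}
    if cookie:
        # Only send a Cookie header when a value is available
        headers["Cookie"] = cookie
    return headers


print(build_headers("device_id=example"))
```

The returned dict can be passed as the headers argument of requests.get.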
Now the scraping code can be written. Here it is in full, using the requests library.
import json
import time

import pandas as pd
import requests

number = 2000  # number of trading days to fetch
begin = int(time.time() * 1000)  # current time in milliseconds
url = ('https://stock.xueqiu.com/v5/stock/chart/kline.json?symbol=SZ159915&begin='
       + str(begin) + '&period=day&type=before&count=-' + str(number))

# The Cookie value differs per device; copy your own from Request Headers
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Cookie': 'xq_a_token=48575b79f8efa6d34166cc7bdc5abb09fd83ce63; xqat=48575b79f8efa6d34166cc7bdc5abb09fd83ce63; xq_r_token=7dcc6339975b01fbc2c14240ce55a3a20bdb7873; xq_id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOi0xLCJpc3MiOiJ1YyIsImV4cCI6MTU4OTY4MjczMCwiY3RtIjoxNTg4OTE5Njc4OTk4LCJjaWQiOiJkOWQwbjRBWnVwIn0.l6yOJc-qTWMNU8g6wXjew0X7TmWbi82cuGiYkVvWGnUoxYSGWIx3DtfIki0etjSbN8mG0r1Gwd_q-PGo6EHL4h-SreHzt7tnteLtmnFrJ5hdyNh1g_x2u4XMvTX-pIEZmVInhBIM_BGVFerYXHuIJ6lm1G-EPR4RlVG2PQ7PTvvsz9-VycQJVZuF1zguF936WiSbPTBmhG0wcXUdfziFC1RPrXgFNTrwNXqaIiWfT5WbRWckm8aFNM3krCGCaES494Jco0FBM3eB5GJlGeB5xS1if_de7T6__PSTCmzMHokG133gRqt4FvYHu9kIQg74CdGw8u7EDWSigw-kASVAzg; u=851588919733219; is_overseas=0; Hm_lvt_1db88642e346389874251b5a1eded6e3=1588919732; Hm_lpvt_1db88642e346389874251b5a1eded6e3=1588919732; device_id=fa23c8c5b1bd5f49c8c9ac7a657ccec3',
}

r = requests.get(url, headers=headers)  # fetch the data
r.raise_for_status()  # stop early on an HTTP error (e.g. an expired cookie)
text = r.text  # response body as text
data = json.loads(text)  # parse the JSON string into a dict
item = data['data']['item']  # take the item list out of the full response

# Convert the list to a DataFrame, which is easier to process later
df = pd.DataFrame(item, columns=["timestamp", "volume", "open", "high", "low", "close",
                                 "chg", "percent", "turnoverrate", "amount",
                                 "volume_post", "amount_post"])
print(df)
The output looks like this:
          timestamp     volume   open  ...       amount volume_post amount_post
0     1329408000000   67987778  0.726  ...          NaN        None        None
1     1329667200000   39183956  0.725  ...          NaN        None        None
2     1329753600000   77306937  0.721  ...          NaN        None        None
3     1329840000000  193157652  0.738  ...          NaN        None        None
4     1329926400000  124234294  0.765  ...          NaN        None        None
...             ...        ...    ...  ...          ...         ...         ...
1995  1588089600000  356095691  1.943  ...  696005741.0        None        None
1996  1588176000000  411736129  1.964  ...  817442890.0        None        None
1997  1588694400000  367767579  1.980  ...  737917205.0        None        None
1998  1588780800000  265935124  2.030  ...  538456242.0        None        None
1999  1588867200000  304340396  2.035  ...  622569015.0        None        None

[2000 rows x 12 columns]
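The code above fetches everything in one request. If the data should instead be pulled page by page, one possible approach is to walk backward in time, using the oldest timestamp of each page as the next begin. The sketch below assumes that behavior of begin and count, which the article does not confirm, and removes duplicates in case begin is inclusive. fetch_all, fetch_page and fake_fetch are hypothetical names; fake_fetch is an in-memory stand-in so the loop can be tried without network access.

```python
def fetch_all(fetch_page, begin_ms, page_size=500, max_pages=20):
    """Collect rows page by page, walking backward in time.

    fetch_page(begin_ms, count) is supplied by the caller and returns a list
    of rows whose first element is the timestamp in ms.
    """
    seen = set()
    rows = []
    for _ in range(max_pages):
        page = fetch_page(begin_ms, -page_size)
        # Drop rows already collected (begin may be inclusive)
        fresh = [row for row in page if row[0] not in seen]
        if not fresh:
            break
        seen.update(row[0] for row in fresh)
        rows.extend(fresh)
        if len(page) < page_size:
            break  # short page: no older data left
        begin_ms = min(row[0] for row in fresh)  # continue from the oldest row
    rows.sort(key=lambda row: row[0])  # oldest first, like the output above
    return rows


def fake_fetch(begin_ms, count):
    """Stand-in for the real request: ten made-up rows with timestamps 1..10."""
    data = [[t, t * 10] for t in range(1, 11)]
    eligible = [row for row in data if row[0] <= begin_ms]
    return eligible[count:]  # negative count: the last |count| rows


all_rows = fetch_all(fake_fetch, begin_ms=10, page_size=4)
print(len(all_rows))  # 10
```

In real use, fetch_page would wrap the requests.get call from the code above and return data['data']['item']; adding a short time.sleep between pages keeps the request rate low.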
That completes the scraping. What to do with the data next depends on your needs.
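For the Excel-or-database part of the question, here is a minimal sketch that stores rows in SQLite using only the standard library. The table name kline, the helper save_rows, and the sample values are invented for illustration; only the first six column names come from the code above.

```python
import sqlite3


def save_rows(conn, rows):
    """Insert K-line rows (first six fields) into a hypothetical kline table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS kline (
               "timestamp" INTEGER PRIMARY KEY,
               "volume" INTEGER,
               "open" REAL, "high" REAL, "low" REAL, "close" REAL)"""
    )
    # INSERT OR REPLACE makes repeated runs overwrite rows with the same timestamp
    conn.executemany(
        "INSERT OR REPLACE INTO kline VALUES (?, ?, ?, ?, ?, ?)",
        [tuple(row[:6]) for row in rows],
    )
    conn.commit()


# Made-up sample rows, only to exercise the function
sample = [
    [1000, 500, 1.0, 1.2, 0.9, 1.1],
    [2000, 600, 1.1, 1.3, 1.0, 1.2],
]
conn = sqlite3.connect(":memory:")  # use a file path to keep the data
save_rows(conn, sample)
save_rows(conn, sample)  # second run replaces instead of duplicating
saved_count = conn.execute("SELECT COUNT(*) FROM kline").fetchone()[0]
print(saved_count)  # 2
```

With the DataFrame from the article's code, pandas offers shortcuts: df.to_excel("kline.xlsx", index=False) writes an Excel file (an Excel engine such as openpyxl must be installed), and df.to_sql("kline", conn, if_exists="append", index=False) writes to a database connection.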
Source: https://blog.csdn.net/qq_34769201/article/details/106072280