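1. Install and open Charles, then configure it.
1.1 Click Proxy -> macOS Proxy to turn off the default macOS proxy setting.
1.2 Choose Proxy -> Proxy Settings. In the dialog that opens, set the HTTP proxy port to 6666 (the default is usually 8888; you can pick your own).
1.3 Choose Proxy -> SSL Proxying Settings and add the domains you want to capture. Adding * matches all hosts.
2. Configure the phone (iOS is used as the example).
2.1 Tap the info icon next to the connected Wi-Fi network; tap the last item, HTTP Proxy -> Configure Proxy; select 'Manual' and enter the computer's IP address and the Charles port set above: 6666.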
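The IP address entered on the phone has to be the computer's address on the same Wi-Fi network. A minimal sketch for looking it up from Python follows. The helper name `get_lan_ip` and the probe address 8.8.8.8 are illustrative assumptions and have nothing to do with Charles itself; on a machine with several network interfaces the result should be checked by hand.

```python
import socket


def get_lan_ip(probe_host="8.8.8.8", probe_port=80):
    """Return the local IP used for outbound traffic (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packet; it only selects a route.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, offline): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    # 6666 is the proxy port configured in step 1.2.
    print("Enter on the phone:", get_lan_ip(), "port 6666")
```

If the printed address is 127.0.0.1, the computer has no usable network route and the phone will not be able to reach Charles.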
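3. Configure HTTPS capture.
3.1 Because the requests to capture are HTTPS, a certificate must also be installed. Choose Help -> SSL Proxying -> Install Charles Root Certificate.
3.2 Double-click the Charles certificate that was added on the computer and choose 'Always Trust'.
3.3 Install the certificate on the phone. Choose Help -> SSL Proxying -> Install Charles Root Certificate on a Mobile Device or Remote Browser. Following the prompt, open chls.pro/ssl on the phone.
3.4 Follow the pop-up prompts to install the certificate on the phone.
3.5 Under 'General -> About -> Certificate Trust Settings', enable full trust for the certificate. (Charles decrypts HTTPS traffic by re-signing it with its own root certificate, so the certificate has to be installed and trusted on both the phone and the computer.)
4.1 Click the round record button to start capturing traffic from the phone.
The example in this article picks a Walmart store and opens that store's page to capture its data.
4.2 Analyzing the captured requests reveals how the URL for fetching the product categories is put together: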
url1 = 'https://daojia.jd.com/client?lat=22.56705&lng=113.95371&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=station%2FgetStationDetail&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22skuId%22%3A%22%22%2C%22orgCode%22%3A%2281372%22%2C%22activityId%22%3A%22%22%2C%22promotionType%22%3A%22%22%2C%22lgt%22%3A113.95371%2C%22lat%22%3A22.56705%7D&afsImg=&business='
The content of body, after decoding, is:
body = {"storeId":"11653731","skuId":"","orgCode":"81372","activityId":"","promotionType":"","lgt":113.95371,"lat":22.56705}
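The decoding step can be reproduced with the standard library. The sketch below uses a shortened copy of url1 that keeps only the `functionId` and `body` parameters (the variable name `url1_short` is made up for this illustration); the parameter values are copied unchanged from the captured URL.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Shortened copy of url1: only functionId and body are kept for readability.
url1_short = (
    "https://daojia.jd.com/client"
    "?functionId=station%2FgetStationDetail"
    "&body=%7B%22storeId%22%3A%2211653731%22%2C%22skuId%22%3A%22%22%2C"
    "%22orgCode%22%3A%2281372%22%2C%22activityId%22%3A%22%22%2C"
    "%22promotionType%22%3A%22%22%2C%22lgt%22%3A113.95371%2C"
    "%22lat%22%3A22.56705%7D"
)

# parse_qs splits the query string and percent-decodes each value.
query = parse_qs(urlsplit(url1_short).query)
function_id = query["functionId"][0]
body = json.loads(query["body"][0])  # body is a JSON document

print(function_id)
print(body)
```

The same approach works for any of the captured URLs: the `functionId` parameter names the endpoint and `body` carries its JSON arguments.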
The values in body do not affect fetching the categories, so sending a GET request to url1 is enough to retrieve the data.
import requests

# Note: this URL calls functionId=storeIndexSearch/searchByCategory;
# to fetch the category list, pass url1 from above instead.
url = 'https://daojia.jd.com/client?lat=22.51424&lng=113.93068&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=storeIndexSearch%2FsearchByCategory&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22orgCode%22%3A%2281372%22%2C%22skuId%22%3A%22%22%2C%22catIds%22%3A%5B%7B%22catId%22%3A%224644375%22%2C%22type%22%3A2%7D%5D%7D&afsImg=&business=undefined'
ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
headers = {'User-Agent': ua}
res = requests.get(url, headers=headers)
print(res.text)  # the returned data
Sample of the returned data:
4.3 Further analysis reveals how the URL for fetching the products under each category is put together:
url2 = 'https://daojia.jd.com/client?lat=22.51424&lng=113.93068&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=storeIndexSearch%2FsearchByCategory&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22orgCode%22%3A%2281372%22%2C%22skuId%22%3A%22%22%2C%22catIds%22%3A%5B%7B%22catId%22%3A%224644376%22%2C%22type%22%3A2%7D%5D%7D&afsImg=&business=undefined'
The content of body, after decoding, is:
body = {"storeId":"11653731","orgCode":"81372","skuId":"","catIds":[{"catId":"4644376","type":2}]}
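The catId values can be extracted from the data returned by url1. Passing in a different catId returns the products under the corresponding category.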
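One way to pull the catId values out of a parsed JSON response is to walk the structure recursively. This is only a sketch: the real shape of the url1 response is not shown in this article, so the `sample` dictionary below is invented for illustration (only the key name `catId` and the two ids come from the URLs above), and `collect_values` is a hypothetical helper.

```python
def collect_values(node, key="catId"):
    """Recursively collect every value stored under `key` in nested JSON data."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            else:
                found.extend(collect_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found


# Hypothetical response shape, for illustration only; the real url1 response
# is not reproduced here and may be structured differently.
sample = {
    "result": {
        "cateList": [
            {"catId": "4644375", "title": "A", "childCategoryList": [
                {"catId": "4644376", "title": "A-1"},
            ]},
        ]
    }
}

print(collect_values(sample))
```

With a real response, `sample` would be replaced by `res.json()`, and each collected id could then be used as the catId in the body of url2.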
import json
import requests
from urllib.parse import quote


def get_product(cateid2):
    # cateid2: the catId of a second-level category
    body = {
        "storeId": "11653731",
        "orgCode": "81372",
        "skuId": "",
        "catIds": [{"catId": cateid2, "type": 2}]}
    # Compact separators keep the JSON free of spaces, matching the captured request
    body = quote(json.dumps(body, separators=(',', ':')))
    base_url = 'https://daojia.jd.com/client?lat=22.51424&lng=113.93068&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=storeIndexSearch%2FsearchByCategory&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body={}&afsImg=&business=undefined'.format(body)
    print(base_url)  # the URL built for this catId
    ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
    headers = {'User-Agent': ua}
    res = requests.get(base_url, headers=headers)
    print(res.text)
Sample of the returned data:
4.4 Organize the data and write it out as a table:
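import csv

# catename1 / catename2: names of the first- and second-level categories.
# searchResultVOList: the product list, assumed to be already extracted
# from the response returned in 4.3.
filename = '{}.csv'.format(catename1)
with open(filename, 'a', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    # Mode 'a' appends, so this header row is written on every run
    writer.writerow(['Product name', 'Price (CNY)', 'Monthly sales', 'Image', 'Second-level category', 'First-level category'])
    for product in searchResultVOList:
        print(product)
        name = product['skuName']
        img = product['imgUrl']
        price = product['realTimePrice']
        sale = product['monthSales']
        writer.writerow([name, price, sale, img, catename2, catename1])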
Sample of the output data: