Notes: using Charles to scrape product data from a Walmart online store in the JD Daojia (京東到家) mini program (macOS)


1. Install and open Charles, then configure it.

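1.1 Click Proxy -> macOS Proxy so that it is unchecked. This turns off the default macOS system proxy, so Charles no longer records the Mac's own traffic.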

1.2 Click Proxy -> Proxy Settings to open the settings dialog, and set the HTTP proxy port to 6666 (the default is 8888; any free port works).

1.3 Click Proxy -> SSL Proxying Settings, make sure Enable SSL Proxying is checked, and add the hosts to capture. Adding * matches all hosts.
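Charles now listens on port 6666. If you also want to send a Python script's own requests through Charles (for example, to compare them with what the mini program sends), a minimal sketch is a proxies mapping in the format `requests` accepts. The helper name `charles_proxies` is hypothetical, and the default host assumes Charles runs on the same machine (127.0.0.1).

```python
def charles_proxies(host: str = "127.0.0.1", port: int = 6666) -> dict:
    """Return a proxies mapping that routes both HTTP and HTTPS through Charles.

    Charles is an HTTP proxy, so the proxy URL uses the http:// scheme even for
    HTTPS traffic; the client tunnels HTTPS through it with CONNECT.
    """
    proxy_url = f"http://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    print(charles_proxies())
```

Pass it as `requests.get(url, proxies=charles_proxies())`. HTTPS requests will only pass certificate verification once the client trusts the Charles root certificate (section 3).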

 

2. Configure the phone (iOS is used as the example).

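2.1 Make sure the phone is on the same Wi-Fi network as the computer. Tap the info (i) icon next to the connected Wi-Fi network, then tap the last item, HTTP Proxy -> Configure Proxy. Choose Manual and enter the computer's IP address and the Charles port set earlier (6666).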
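To find the IP address to type into the phone, a minimal sketch is the common UDP-socket trick: connecting a UDP socket sends no packets but makes the OS choose the outgoing interface, whose address can then be read back. The function name `get_lan_ip` is hypothetical, and on a Mac with several interfaces the result is only a best guess, so compare it with System Settings -> Network.

```python
import socket


def get_lan_ip() -> str:
    """Best-effort guess of this machine's LAN IPv4 address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Any routable address works here; connect() on UDP sends nothing.
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    except OSError:
        # No route available (e.g. offline); fall back to loopback.
        return "127.0.0.1"
    finally:
        s.close()


if __name__ == "__main__":
    ip = get_lan_ip()
    print(f"Phone HTTP proxy -> Server: {ip}  Port: 6666")
```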

 

 

3. Configure HTTPS capture.

3.1 Because the requests to capture use HTTPS, the Charles root certificate must also be installed. Click Help -> SSL Proxying -> Install Charles Root Certificate.

3.2 On the Mac, double-click the newly added Charles certificate in Keychain Access and set it to Always Trust.

3.3 Install the certificate on the phone. Click Help -> SSL Proxying -> Install Charles Root Certificate on a Mobile Device or Remote Browser and, as the dialog instructs, open chls.pro/ssl in the phone's browser.

3.4 Follow the pop-up prompts on the phone to install the certificate.

  

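3.5 Under Settings -> General -> About -> Certificate Trust Settings, enable full trust for the Charles certificate. (Charles decrypts HTTPS by acting as a man in the middle: it presents a certificate for each site signed by its own root certificate, so every device whose HTTPS traffic you want to decrypt must install and trust that root certificate. The certificate carries only the public key; the private key stays inside Charles.)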
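The same trust requirement applies to scripts on the Mac: `requests` verifies TLS against its own CA bundle rather than the macOS Keychain, so trusting the certificate in Keychain Access does not help a Python script that goes through Charles. A minimal sketch, assuming the root certificate was exported as a PEM file (Help -> SSL Proxying -> Save Charles Root Certificate), builds a session that uses the proxy and trusts that file. The function name and the file path are hypothetical.

```python
import requests


def make_charles_session(cert_path: str, host: str = "127.0.0.1", port: int = 6666) -> requests.Session:
    """Build a Session that goes through Charles and trusts its root certificate.

    cert_path: PEM file exported from Charles (location is an assumption).
    """
    session = requests.Session()
    # Ignore HTTP(S)_PROXY / REQUESTS_CA_BUNDLE environment variables so the
    # settings below are the ones actually used.
    session.trust_env = False
    proxy_url = f"http://{host}:{port}"
    session.proxies = {"http": proxy_url, "https": proxy_url}
    # Verify TLS against the Charles root certificate instead of the default bundle.
    session.verify = cert_path
    return session


if __name__ == "__main__":
    s = make_charles_session("charles-ssl-proxying-certificate.pem")  # hypothetical path
    print(s.proxies, s.verify)
```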

 

4. Capture and scrape the data.

4.1 Click the round Record button in Charles to start capturing the phone's traffic.

In this example, a Walmart supermarket was chosen, and the data was captured after opening that store's page.

 

4.2 Analysing the captured requests reveals how the URL that returns the store's product categories is built:

url1 = 'https://daojia.jd.com/client?lat=22.56705&lng=113.95371&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=station%2FgetStationDetail&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22skuId%22%3A%22%22%2C%22orgCode%22%3A%2281372%22%2C%22activityId%22%3A%22%22%2C%22promotionType%22%3A%22%22%2C%22lgt%22%3A113.95371%2C%22lat%22%3A22.56705%7D&afsImg=&business='

URL-decoded, the body parameter is:

body = {"storeId":"11653731","skuId":"","orgCode":"81372","activityId":"","promotionType":"","lgt":113.95371,"lat":22.56705}
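The decoding above can be reproduced with the standard library instead of an online decoder. A minimal sketch: split the query string of url1 with `parse_qs` (which also percent-decodes the values), then parse `body` as JSON. The helper name `decode_query` is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlsplit

# url1 as captured in Charles (copied from above)
URL1 = 'https://daojia.jd.com/client?lat=22.56705&lng=113.95371&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=station%2FgetStationDetail&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22skuId%22%3A%22%22%2C%22orgCode%22%3A%2281372%22%2C%22activityId%22%3A%22%22%2C%22promotionType%22%3A%22%22%2C%22lgt%22%3A113.95371%2C%22lat%22%3A22.56705%7D&afsImg=&business='


def decode_query(url: str) -> dict:
    """Return the query parameters of url, one percent-decoded value per key."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {key: values[0] for key, values in params.items()}


if __name__ == "__main__":
    params = decode_query(URL1)
    print(params["functionId"])  # station/getStationDetail
    body = json.loads(params["body"])  # body is URL-encoded JSON
    print(body)
```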

The values in body do not affect the category data that is returned, so a plain GET request to url1 is enough to fetch it.

import requests

# url1 from above: functionId=station/getStationDetail returns the store's categories
url = 'https://daojia.jd.com/client?lat=22.56705&lng=113.95371&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=station%2FgetStationDetail&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22skuId%22%3A%22%22%2C%22orgCode%22%3A%2281372%22%2C%22activityId%22%3A%22%22%2C%22promotionType%22%3A%22%22%2C%22lgt%22%3A113.95371%2C%22lat%22%3A22.56705%7D&afsImg=&business='
ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
headers = {'User-Agent': ua}
res = requests.get(url, headers=headers, timeout=10)
print(res.text)  # the returned data (JSON text)

(Screenshot: sample of the returned category data)

4.3 Further analysis reveals how the URL that returns the products of a given category is built:

url2 = 'https://daojia.jd.com/client?lat=22.51424&lng=113.93068&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=storeIndexSearch%2FsearchByCategory&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22orgCode%22%3A%2281372%22%2C%22skuId%22%3A%22%22%2C%22catIds%22%3A%5B%7B%22catId%22%3A%224644376%22%2C%22type%22%3A2%7D%5D%7D&afsImg=&business=undefined'

URL-decoded, the body parameter is:

body = {"storeId":"11653731","orgCode":"81372","skuId":"","catIds":[{"catId":"4644376","type":2}]}
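One way to see the URL pattern is to diff the query parameters of the two captured requests. A minimal sketch with hypothetical helper names `query_params` and `diff_params`: for url1 and url2 only lat, lng, functionId, body and business differ, while the device and app-version parameters are shared, which suggests the shared part can be reused as a template.

```python
from urllib.parse import parse_qs, urlsplit

URL1 = 'https://daojia.jd.com/client?lat=22.56705&lng=113.95371&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=station%2FgetStationDetail&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22skuId%22%3A%22%22%2C%22orgCode%22%3A%2281372%22%2C%22activityId%22%3A%22%22%2C%22promotionType%22%3A%22%22%2C%22lgt%22%3A113.95371%2C%22lat%22%3A22.56705%7D&afsImg=&business='
URL2 = 'https://daojia.jd.com/client?lat=22.51424&lng=113.93068&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=storeIndexSearch%2FsearchByCategory&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22orgCode%22%3A%2281372%22%2C%22skuId%22%3A%22%22%2C%22catIds%22%3A%5B%7B%22catId%22%3A%224644376%22%2C%22type%22%3A2%7D%5D%7D&afsImg=&business=undefined'


def query_params(url):
    """Percent-decoded query parameters of url, one value per key."""
    parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


def diff_params(url_a, url_b):
    """Map each parameter whose value differs to (value in a, value in b)."""
    a, b = query_params(url_a), query_params(url_b)
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for key, (va, vb) in diff_params(URL1, URL2).items():
        print(f"{key}:\n  url1: {va}\n  url2: {vb}")
```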

The catId values can be extracted from the data returned by url1. Passing a different catId returns the products in that category.
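The exact layout of url1's JSON response is not shown here, so a minimal sketch that avoids guessing it walks the whole parsed response and collects every value stored under a given key. It assumes the category entries in the response use the key "catId", the same name the url2 body uses; check the actual key in the Charles capture. The function names and the sample dictionary are made up for illustration and are not the real response.

```python
def find_key(data, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(data, list):
        for item in data:
            yield from find_key(item, key)


def unique_values(data, key):
    """Values found under `key`, de-duplicated, in first-seen order."""
    return list(dict.fromkeys(find_key(data, key)))


if __name__ == "__main__":
    # Made-up structure, only to show the traversal.
    sample = {"result": {"cats": [{"catId": "1", "child": [{"catId": "2"}]}, {"catId": "1"}]}}
    print(unique_values(sample, "catId"))  # ['1', '2']
```

With a real response this would be used as `unique_values(res.json(), "catId")`, feeding each id to `get_product`.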

import json
import requests
from urllib.parse import quote

def get_product(cateid2):  # cateid2: id of a second-level category
    body = {
        "storeId": "11653731",
        "orgCode": "81372",
        "skuId": "",
        "catIds": [{"catId": cateid2, "type": 2}]}
    body = quote(json.dumps(body))  # serialize the body to JSON, then URL-encode it
    base_url = 'https://daojia.jd.com/client?lat=22.51424&lng=113.93068&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=storeIndexSearch%2FsearchByCategory&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body={}&afsImg=&business=undefined'.format(body)
    print(base_url)  # URL assembled for this catId
    ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
    headers = {'User-Agent': ua}
    res = requests.get(base_url, headers=headers, timeout=10)
    return res.json()  # parsed JSON with the products of this category

(Screenshot: sample of the returned product data)
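Instead of formatting the encoded body into a fixed URL string, the URL can also be assembled from a parameter dictionary. A minimal sketch, with the hypothetical helper `build_search_url` and parameter values copied from the captured url2: `urlencode` percent-encodes every value (including the `/` in functionId), and `separators=(",", ":")` keeps the JSON compact like the captured body, whereas plain `json.dumps` inserts spaces that end up encoded as %20. Whether the server cares about that difference is not known; the original approach apparently works too.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://daojia.jd.com/client"

# Fixed query parameters copied from the captured url2 (location, device, app version).
COMMON_PARAMS = {
    "lat": "22.51424",
    "lng": "113.93068",
    "city_id": "1607",
    "deviceToken": "b2e951ed-e72e-4a9a-b9ca-cd69348c3337",
    "deviceId": "b2e951ed-e72e-4a9a-b9ca-cd69348c3337",
    "channel": "wx_xcx",
    "platform": "5.0.0",
    "platCode": "H5",
    "appVersion": "5.0.0",
    "xcxVersion": "3.6.2",
    "appName": "paidaojia",
    "deviceModel": "appmodel",
    "isForbiddenDialog": "false",
    "isNeedDealError": "false",
    "isNeedDealLogin": "false",
}


def build_search_url(cat_id, store_id="11653731", org_code="81372"):
    """Assemble the searchByCategory URL for one second-level category id."""
    body = {
        "storeId": store_id,
        "orgCode": org_code,
        "skuId": "",
        "catIds": [{"catId": cat_id, "type": 2}],
    }
    params = dict(COMMON_PARAMS)
    params["functionId"] = "storeIndexSearch/searchByCategory"
    # Compact separators: no spaces, matching the captured body.
    params["body"] = json.dumps(body, separators=(",", ":"))
    params["afsImg"] = ""
    params["business"] = "undefined"
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_search_url("4644376")
    print(url)
    # Round trip: decode the body again to check what was sent.
    print(json.loads(parse_qs(urlsplit(url).query)["body"][0]))
```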

4.4 Organize the data and write it out as a table (CSV):

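import csv
import os

# catename1 / catename2: names of the first- and second-level categories;
# searchResultVOList: product list taken from the get_product response.
# All three come from the earlier steps (see the complete code below).
filename = '{}.csv'.format(catename1)
write_header = not os.path.exists(filename)  # write the header only once per file
# newline='' avoids blank rows; utf-8-sig lets Excel show the Chinese text correctly
with open(filename, 'a', newline='', encoding='utf-8-sig') as csvfile:
    writer = csv.writer(csvfile)
    if write_header:
        writer.writerow(['Product name', 'Price (CNY)', 'Monthly sales', 'Image', 'Second-level category', 'First-level category'])

    for product in searchResultVOList:
        print(product)
        name = product['skuName']         # product name
        img = product['imgUrl']           # image URL
        price = product['realTimePrice']  # current price (CNY)
        sale = product['monthSales']      # monthly sales
        writer.writerow([name, price, sale, img, catename2, catename1])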

(Screenshot: sample of the CSV output)
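The row-writing step can also be separated from the file handling, which makes it easy to check with an in-memory buffer. A minimal sketch with the hypothetical function `write_products`: it takes any text file object, reads the same product keys used above, and uses `.get` with an empty default on the assumption that some products may lack a field. The sample product is made up.

```python
import csv
import io

HEADER = ["Product name", "Price (CNY)", "Monthly sales", "Image",
          "Second-level category", "First-level category"]


def write_products(fileobj, products, catename2, catename1, write_header=True):
    """Write one CSV row per product dict; return the number of rows written."""
    writer = csv.writer(fileobj)
    if write_header:
        writer.writerow(HEADER)
    count = 0
    for product in products:
        writer.writerow([
            product.get("skuName", ""),        # product name
            product.get("realTimePrice", ""),  # current price
            product.get("monthSales", ""),     # monthly sales
            product.get("imgUrl", ""),         # image URL
            catename2,
            catename1,
        ])
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    sample = [{"skuName": "Milk 1L", "realTimePrice": 9.9, "monthSales": 120,
               "imgUrl": "https://example.com/milk.jpg"}]
    write_products(buf, sample, "Dairy", "Food")
    print(buf.getvalue())
```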

 

4.5 The complete code is available at https://github.com/HongDanni/jd_daojia





 
Guangdong ICP No. 18138465   © 2018-2025 CODEPRJ.COM