Scraping JD.com (『京東』) Product Data with Python: It's Simpler Than You Think!

This article is a repost (2021-03-10 13:06).

The coding process for this article has also been recorded as a video walkthrough.

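Introduction

This article walks beginners, step by step, through scraping product data from JD.com, using laptops (search keyword 筆記本) as the example.

It covers:

How to scrape product information

How to scrape the next page

How to save the scraped data to Excel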

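Analyzing the page structure

1. Viewing the page

Enter 筆記本 (laptop) in the JD.com search box.

The resulting link looks like this:

https://search.jd.com/search?keyword=筆記本&wq=筆記本&ev=exbrand_聯想%5E&page=9&s=241&click=1

Press F12 in the browser and inspect the page's elements. We need to scrape three things: (1) the product name, (2) the product price, and (3) the product's review count.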

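2. Analyzing the page elements

Getting all products on the current page

Inside the element with id=J_goodsList, the ul -> li items correspond to the list of products.

Getting each product's attributes

Within each li (product) element, class=p-name p-name-type-2 holds the product title, class=p-price holds the price, and class=p-commit holds the product ID (needed later to fetch the review count).

Pitfall:

The review count cannot be read directly from the page HTML. It has to be fetched separately using the product ID.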
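To make the element structure above concrete, here is a minimal sketch that runs the same kind of XPath queries against a tiny hand-written snippet. The markup, the product name, the price, and the ID are all made up for illustration. The standard-library ElementTree parser stands in for lxml here so the snippet runs without extra packages; it only accepts well-formed XML, which real JD.com pages are not, so the article itself uses lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup that mimics the structure described above.
SAMPLE = """
<div id="J_goodsList">
  <ul>
    <li>
      <div class="p-price"><strong><i>4999.00</i></strong></div>
      <div class="p-name p-name-type-2"><a><em>Sample laptop</em></a></div>
      <div class="p-commit"><strong><a id="J_comment_100012345">0</a></strong></div>
    </li>
  </ul>
</div>
"""

root = ET.fromstring(SAMPLE.strip())
products = []
for li in root.findall("./ul/li"):
    # Each query is relative to the current <li>, just like './/div[...]' with lxml.
    title = li.find(".//div[@class='p-name p-name-type-2']/a/em").text
    price = li.find(".//div[@class='p-price']/strong/i").text
    anchor_id = li.find(".//div[@class='p-commit']/strong/a").get("id")
    # The product ID is the anchor id without its "J_comment_" prefix.
    products.append((title, price, anchor_id.replace("J_comment_", "")))

print(products)
```

Note that the review-count anchor only gives us the product ID; the count itself still has to come from a separate request.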

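Scraping the data

1. Implementation

import requests
from lxml import etree

# Assumed request headers: a browser-like User-Agent (adjust as needed)
headers = {"User-Agent": "Mozilla/5.0"}

url="https://search.jd.com/search?keyword=筆記本&wq=筆記本&ev=exbrand_聯想%5E&page=9&s=241&click=1"
res = requests.get(url,headers=headers)
res.encoding = 'utf-8'
text = res.text

# Parse the HTML and select every product <li>
selector = etree.HTML(text)
# `list` shadows the built-in, but the later snippets reuse this name
list = selector.xpath('//*[@id="J_goodsList"]/ul/li')

for i in list:
    title=i.xpath('.//div[@class="p-name p-name-type-2"]/a/em/text()')[0]
    price = i.xpath('.//div[@class="p-price"]/strong/i/text()')[0]
    # The anchor id looks like "J_comment_<product id>"; drop the prefix
    product_id = i.xpath('.//div[@class="p-commit"]/strong/a/@id')[0].replace("J_comment_","")
    print("title="+str(title))
    print("price="+str(price))
    print("product_id="+str(product_id))
    print("-----")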
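Indexing every XPath result with [0] raises an IndexError as soon as one list item lacks an expected node. Whether that actually happens on JD.com result pages (for example with ad slots) is an assumption, but if it does, a small helper like the hypothetical first below lets the loop keep going. This is a sketch, not part of the original code.

```python
def first(nodes, default=""):
    """Return the first XPath result (stripped if it is a string), or default when empty."""
    if not nodes:
        return default
    value = nodes[0]
    return value.strip() if isinstance(value, str) else value


# XPath queries return plain Python lists, so ordinary lists stand in for them here.
print(first(["  4999.00 "]))   # a query that matched
print(first([], default="n/a"))  # a query that matched nothing
```

With it, a line in the loop would read, for example, price = first(i.xpath('.//div[@class="p-price"]/strong/i/text()')).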

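Next, let's fetch the review count for each product.

2. Getting the review count

Open the browser's Network panel and find the request to productCommentSummaries.action.

Opening that URL directly in the browser returns the product's review count.

Analyzing the URL

The review count is looked up by product ID through the referenceIds parameter (several IDs can be requested at once).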
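Because the URL carries a callback parameter, the response comes back as a JSONP call rather than plain JSON. As a minimal sketch of unwrapping it, the snippet below uses a regular expression instead of hard-coding the callback name. The helper name unwrap_jsonp and the sample payload are made up for illustration; the payload only mirrors the CommentsCount and CommentCountStr fields used in this article.

```python
import json
import re


def unwrap_jsonp(body):
    """Return the JSON value wrapped in a JSONP call such as cb({...});"""
    # Capture everything between the first "(" and the last ")".
    match = re.search(r"^[^(]*\((.*)\)\s*;?\s*$", body, re.S)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))


# Hypothetical payload: the value is invented, the field names follow the article.
sample = 'jQuery8827474({"CommentsCount":[{"CommentCountStr":"2万+"}]});'
data = unwrap_jsonp(sample)
print(data["CommentsCount"][0]["CommentCountStr"])
```

Matching on the parentheses keeps the code working even if the callback name in the URL changes.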


Finally, we can wrap the review-count lookup in a function:

# Get the review count for a product ID
import json
import requests

def commentcount(product_id):
    url = "https://club.jd.com/comment/productCommentSummaries.action?referenceIds="+str(product_id)+"&callback=jQuery8827474&_=1615298058081"
    res = requests.get(url, headers=headers)
    res.encoding = 'gbk'
    # Strip the JSONP wrapper jQuery8827474(...); to leave plain JSON
    text = (res.text).replace("jQuery8827474(","").replace(");","")
    text = json.loads(text)
    comment_count = text['CommentsCount'][0]['CommentCountStr']

    comment_count = comment_count.replace("+", "")
    # Expand "万" / "萬" (ten thousand) into a plain number, e.g. "2万" -> "20000"
    for unit in ("万", "萬"):
        if unit in comment_count:
            comment_count = comment_count.replace(unit, "")
            comment_count = str(int(round(float(comment_count) * 10000)))

    return comment_count

Note that the returned review count contains symbols such as "萬" (ten thousand) and "+", which have to be handled accordingly.
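The symbol handling can be tried out on its own, without any network request. The sketch below is a standalone variant that returns an integer; the name parse_count is hypothetical, and accepting fractional values such as "1.5萬+" is an assumption about what the site may return.

```python
def parse_count(text):
    """Convert a display string such as '2万+' or '500+' to an integer."""
    text = text.strip().replace("+", "")
    multiplier = 1
    for unit in ("万", "萬"):  # simplified and traditional "ten thousand"
        if unit in text:
            text = text.replace(unit, "")
            multiplier = 10000
    # round() guards against floating-point noise such as 1.1 * 10000.
    return int(round(float(text) * multiplier))


for raw in ("2万+", "1.5萬+", "500+", "37"):
    print(raw, parse_count(raw))
```

Returning an integer rather than a string also makes it easier to sort or filter by review count later on.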

for i in list:
    title=i.xpath('.//div[@class="p-name p-name-type-2"]/a/em/text()')[0]
    price = i.xpath('.//div[@class="p-price"]/strong/i/text()')[0]
    product_id = i.xpath('.//div[@class="p-commit"]/strong/a/@id')[0].replace("J_comment_","")

    # Fetch the review count for this product
    comment_count = commentcount(product_id)
    print("title="+str(title))
    print("price="+str(price))
    print("comment_count="+str(comment_count))

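Saving to Excel

1. Defining the header row

import openpyxl
outwb = openpyxl.Workbook()
# Write to the workbook's default (active) sheet
outws = outwb.active

# Row 1 is the header row
outws.cell(row=1,column=1,value="index")
outws.cell(row=1,column=2,value="title")
outws.cell(row=1,column=3,value="price")
outws.cell(row=1,column=4,value="CommentCount")

We use the openpyxl library to save the data to Excel. The header row contains (1) the row number index, (2) the product name title, (3) the product price price, and (4) the review count CommentCount.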

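2. Writing the rows

Here the loop is wrapped in a function, getlist(url), so that it can be reused for every results page; count tracks the next row to write.

count = 2  # next row to write; row 1 holds the header

def getlist(url):
    global count
    res = requests.get(url, headers=headers)
    res.encoding = 'utf-8'
    selector = etree.HTML(res.text)
    list = selector.xpath('//*[@id="J_goodsList"]/ul/li')
    for i in list:
        title=i.xpath('.//div[@class="p-name p-name-type-2"]/a/em/text()')[0]
        price = i.xpath('.//div[@class="p-price"]/strong/i/text()')[0]
        product_id = i.xpath('.//div[@class="p-commit"]/strong/a/@id')[0].replace("J_comment_","")

        # Fetch the review count for this product
        comment_count = commentcount(product_id)
        print("title="+str(title))
        print("price="+str(price))
        print("comment_count="+str(comment_count))

        outws.cell(row=count, column=1, value=str(count-1))
        outws.cell(row=count, column=2, value=str(title))
        outws.cell(row=count, column=3, value=str(price))
        outws.cell(row=count, column=4, value=str(comment_count))
        count = count + 1  # move on to the next row

getlist(url)  # scrape the page requested earlier
outwb.save("京東商品-李運辰.xlsx")  # save once all rows are written

Finally, the workbook is saved as 京東商品-李運辰.xlsx (openpyxl writes the .xlsx format, so that extension is used).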

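Analyzing the next page

This part is very important!

1. How the next-page link changes

Pagination here works differently from what you usually see.

The page and s parameters follow this pattern:

page increases by 2 and s increases by 60 with each page.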
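The pattern can also be written as a closed formula. The sketch below assumes the first results page uses page=1 and s=1, as in the article's loop; the function name page_params is made up for illustration.

```python
def page_params(n):
    """Return the (page, s) query values for the n-th results page, with n starting at 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # page: 1, 3, 5, ...   s: 1, 61, 121, ...
    return 2 * n - 1, 60 * (n - 1) + 1


for n in range(1, 6):
    print(n, page_params(n))
```

Under that assumption, the sample link shown earlier (page=9, s=241) is the fifth results page.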

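2. Constructing the next-page link

# Iterate over each page
def getpage():
    page=1
    s = 1
    for i in range(1,6):  # the first five results pages
        print("page="+str(page)+",s="+str(s))
        url = "https://search.jd.com/search?keyword=筆記本&wq=筆記本&ev=exbrand_聯想%5E&page="+str(page)+"&s="+str(s)+"&click=1"
        # getlist(url) fetches this page and extracts and stores its products
        getlist(url)
        page = page+2
        s = s+60

This way every page gets scraped. Call getpage() before saving the workbook so that the rows from every page end up in the file.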
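The link above is assembled by string concatenation with the Chinese keyword pasted in as-is. As an alternative sketch, the query string can be built with the standard library's urlencode, which percent-encodes the keyword and the "^" character explicitly. The function name build_search_url and its default arguments are hypothetical; the parameter names are the ones from the article's link.

```python
from urllib.parse import urlencode


def build_search_url(page, s, keyword="筆記本", brand="聯想"):
    """Build a search link for the given page/s values with a percent-encoded query string."""
    params = {
        "keyword": keyword,
        "wq": keyword,
        "ev": "exbrand_" + brand + "^",  # "^" is encoded as %5E
        "page": page,
        "s": s,
        "click": 1,
    }
    return "https://search.jd.com/search?" + urlencode(params)


print(build_search_url(9, 241))
```

Keeping the parameters in a dictionary also makes it easy to swap in a different keyword or brand filter.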

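Summary

1. An introduction to web scraping, using JD.com product data as the example.

2. How to locate the relevant page elements.

3. How to get JD.com review counts.

4. How to save data to Excel with Python.

5. How to analyze and construct JD.com's next-page links.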

If you are interested in the full source code, follow the WeChat account 『Python爬蟲數據分析挖掘』 and reply 京東商品 to get it.
