Web Scraping Basics: Response and XPath

Reposted article, 2020-03-12 21:56

Response

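r.status_code        # HTTP status code of the response; 200 means the request succeeded
r.text               # response body decoded to a string using r.encoding
r.content            # response body as raw bytes
r.encoding           # encoding guessed from the HTTP response headers
r.apparent_encoding  # encoding inferred by analyzing the response content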
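
To see how these attributes relate to each other, here is a minimal sketch that serves a small UTF-8 page from a throwaway local server and inspects the response. The server, the `PAGE` constant, and the `inspect_response` helper are assumptions made for illustration so the example runs without an external site; they are not part of the original post.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Hypothetical page body used only for this demo.
PAGE = "<html><body><p>爬蟲入門：你好，世界</p></body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        # Deliberately omit the charset so requests falls back to its default.
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def inspect_response():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        session = requests.Session()
        session.trust_env = False  # ignore proxy environment variables
        r = session.get(f"http://127.0.0.1:{server.server_port}/", timeout=5)
        info = {
            "status_code": r.status_code,
            "content": r.content,                      # raw bytes
            "header_encoding": r.encoding,             # derived from the headers
            "apparent_encoding": r.apparent_encoding,  # detected from the bytes
        }
        r.encoding = "utf-8"   # the demo page is known to be UTF-8
        info["text"] = r.text  # decoded with the encoding set above
        return info
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for key, value in inspect_response().items():
        print(key, "=>", value)
```

When the headers carry no charset, r.text can come out garbled for non-Latin pages, which is why scrapers often set r.encoding explicitly (or to r.apparent_encoding) before reading r.text.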

XPath

Learning resource: https://zhuanlan.zhihu.com/p/25572729

Generating a path automatically

Press F12, select the content you want to scrape, then right-click and choose Copy --> Copy XPath.
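
A copied path can be tried offline before pointing it at a live site. The sketch below runs such a path against an inline HTML sample with lxml. `SAMPLE_HTML` is a hypothetical fragment shaped like the path used later in this post; it is not the real page source.

```python
from lxml import etree

# Hypothetical markup that mirrors the shape of the XPath used in this post.
SAMPLE_HTML = """
<html><body>
  <div id="comments">
    <ul>
      <li><div>avatar</div><div><p><span>first comment</span></p></div></li>
      <li><div>avatar</div><div><p><span>second comment</span></p></div></li>
    </ul>
  </div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# A path in the form the browser copies it. XPath positions start at 1,
# so li[2] selects the second list item.
second = tree.xpath('//*[@id="comments"]/ul[1]/li[2]/div[2]/p/span/text()')

# Dropping the index on li matches every list item in a single query.
all_comments = tree.xpath('//*[@id="comments"]/ul[1]/li/div[2]/p/span/text()')

print(second)
print(all_comments)
```

xpath() always returns a list, even when only one node matches, so the results are usually iterated or indexed rather than used directly.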

Simple scraper template

import requests
from lxml import etree


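def getHtmlText(url,header):
    # Collect the text of the first 10 comments on the page
    files=[]
    r=requests.get(url=url,headers=header)
    s=etree.HTML(r.text)
    for i in range(10):
        # XPath copied from the browser; li[i+1] selects the (i+1)-th comment.
        # extend() accumulates results instead of overwriting them on each pass.
        files.extend(s.xpath('//*[@id="comments"]/ul[1]/li['+str(i+1)+']/div[2]/p/span/text()'))
    return files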

def saveText(files):
    # Write one comment per line
    with open("discuss.text","w",encoding="utf-8") as f:
        for i in files:
            f.write(i+"\n")

if __name__ == '__main__':
    url="https://book.douban.com/subject/34876107/comments/"
    header={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"}
    # Fetch once, then print and save the same result
    files=getHtmlText(url,header)
    print(files)
    saveText(files)
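
The template above fetches, parses, and saves in one pass, which makes it hard to try without hitting the site. One possible variation, sketched here under assumptions, separates parsing and saving from the network call so both can be checked against a local sample. The helper names `extract_comments` and `save_lines`, the `SAMPLE` markup, and the temporary output path are all hypothetical and not from the original post.

```python
import os
import tempfile

from lxml import etree

# Same path as in the template, with the li index removed to match all items.
COMMENT_XPATH = '//*[@id="comments"]/ul[1]/li/div[2]/p/span/text()'


def extract_comments(html):
    """Return the stripped, non-empty comment strings found in html."""
    tree = etree.HTML(html)
    return [t.strip() for t in tree.xpath(COMMENT_XPATH) if t.strip()]


def save_lines(lines, path):
    """Write one item per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


# Hypothetical sample standing in for a downloaded page.
SAMPLE = (
    '<div id="comments"><ul>'
    '<li><div></div><div><p><span> good book </span></p></div></li>'
    '<li><div></div><div><p><span>值得一讀</span></p></div></li>'
    '</ul></div>'
)

comments = extract_comments(SAMPLE)
out_path = os.path.join(tempfile.mkdtemp(), "discuss.txt")
save_lines(comments, out_path)

# Read the file back to confirm what was written.
with open(out_path, encoding="utf-8") as f:
    saved = f.read().splitlines()
print(saved)
```

With this split, the live version would only need to pass r.text from requests.get into extract_comments.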
