Python Web Scraping in Practice: urllib.request and requests

Reposted from the original article, 2019-12-16 19:12

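The previous two demos used the request module inside urllib. Along the way we could not help noticing how much manual work it takes: the response body has to be decoded before it yields anything useful, request data has to be encoded before it is attached, a GET or POST request has to be constructed before it can be sent, and proxies and request headers have to be set up in advance. The requests library wraps these steps for us (it is built on urllib3 rather than on urllib.request itself): we simply call the function named after the HTTP method we want, and the request is built for us. Even so, in cases such as simulated logins, being able to hand-craft a custom HTTP request with urllib is still a necessary skill.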
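To make the "encode first, then construct the request" steps concrete, here is a minimal sketch that only builds the request objects and sends nothing over the network. The login URL and the form fields are hypothetical placeholders, not taken from the earlier demos.

```python
from urllib import parse, request

# GET: the query string is URL-encoded and appended to the URL by hand.
query = parse.urlencode({"wd": "中國"})
get_req = request.Request("http://www.baidu.com/s?" + query)

# POST: the form is URL-encoded and then encoded to bytes before it is
# attached. The URL and field names below are hypothetical.
form = parse.urlencode({"user": "demo", "password": "secret"}).encode("utf-8")
post_req = request.Request(
    "http://example.com/login",
    data=form,
    headers={"User-Agent": "demo-agent"},
)

# Passing data is what turns the request into a POST.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Either object could then be handed to request.urlopen; that call is left out here so the sketch runs without network access.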

Let's compare urllib.request and requests.

First, urllib.request

# demo_urllib
from urllib import parse, request

headers = {
    "User-Agent": "Mozilla/5.0 (Linux; U; Android 8.1.0; zh-cn; BLA-AL00 Build/HUAWEIBLA-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQQBrowser/8.9 Mobile Safari/537.36"
}
wd = {"wd": "中國"}
# URL-encode the query parameters and append them to the URL by hand
url = "http://www.baidu.com/s?" + parse.urlencode(wd)
# Build the Request object first, then pass it to urlopen
req = request.Request(url, headers=headers)
response = request.urlopen(req)
print(type(response))
print(response)
# read() returns bytes; decode() (UTF-8 by default) converts them to str
res = response.read().decode()
print(type(res))
print(res)

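Result:

With urllib, we first create a Request object and pass it to request.urlopen, which performs the HTTP request and returns an HTTPResponse object. Its body is the page's HTML as raw bytes; calling .read().decode() converts it to a str, and after decoding the Chinese characters display correctly.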
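Calling decode() with no argument assumes UTF-8, which is not true of every page. The sketch below shows one way to pick the charset from the response headers instead. The helper name decode_body is made up for illustration, and the headers are simulated with email.message.Message so the example runs offline.

```python
from email.message import Message


def decode_body(raw: bytes, headers: Message, fallback: str = "utf-8") -> str:
    """Decode a response body using the charset from Content-Type, if any."""
    # get_content_charset() returns None when no charset is declared.
    charset = headers.get_content_charset() or fallback
    return raw.decode(charset, errors="replace")


# Simulated response: a GBK-encoded body with a matching Content-Type header.
headers = Message()
headers["Content-Type"] = "text/html; charset=gbk"
raw = "中國".encode("gbk")

text = decode_body(raw, headers)
print(type(raw), type(text), text)
```

With a real urllib response the call would look like decode_body(response.read(), response.headers), since response.headers is a subclass of the same Message class.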

Next, requests

# demo_requests
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Linux; U; Android 8.1.0; zh-cn; BLA-AL00 Build/HUAWEIBLA-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQQBrowser/8.9 Mobile Safari/537.36"
}
wd = {"wd": "中國"}
url = "http://www.baidu.com/s?"
# params is URL-encoded and appended to the URL automatically
response = requests.get(url, params=wd, headers=headers)
data = response.text      # body as str, already decoded
data2 = response.content  # body as raw bytes
print(response)
print(type(response))
print(data)
print(type(data))
print(data2)
print(type(data2))
# Decoding the bytes by hand (UTF-8 by default) also yields a str
print(data2.decode())
print(type(data2.decode()))

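Result:

With requests, we call requests.get with the URL and the parameters and get back a Response object; printing it shows the response status code. The .text attribute returns the body as a str (Unicode), decoded with the encoding declared in the HTTP response headers, or one that requests guesses when none is declared. The .content attribute returns bytes, that is, the raw binary data, and the .json() method parses a JSON body into Python objects rather than returning a JSON string. Use text when you want text, and content when you want images, files, or other binary data. And of course, once content is decoded, the Chinese characters display correctly too >_<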
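The relationship between the three views of the body can be sketched with the standard library alone. This is only an analogy for what .content, .text, and .json() give you, not how requests is implemented, and the JSON payload is made up for illustration.

```python
import json

# Made-up payload standing in for a response body.
content = '{"city": "中國", "code": 200}'.encode("utf-8")  # like response.content
text = content.decode("utf-8")                             # like response.text
parsed = json.loads(text)                                  # like response.json()

print(type(content), type(text), type(parsed))
print(parsed["city"])
```

The bytes are what you would write to disk for an image or file, the str is what you would search or parse as HTML, and the parsed object is what you would use for an API that returns JSON.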
