Disguising a Crawler's Request Headers

This article is a repost, 2020-01-17 14:09.

Disguising the request headers is the most basic way to get past anti-crawler checks. Suppose we have the following website:

from flask import Flask

app = Flask(__name__)


@app.route('/getInfo')
def hello_world():
    return "Pretend there is a lot of data here"

@app.route('/')
def index():
    return "Personal homepage"

if __name__ == "__main__":
    app.run(debug=True)

Run it, and the site is available at http://127.0.0.1:5000/.

Next, we want to see the headers of incoming requests, so we print them in the /getInfo handler:

from flask import request  # remember to import request

@app.route('/getInfo')
def hello_world():
    print(request.headers)  # log the incoming request headers to the console
    return "Pretend there is a lot of data here"

Requesting /getInfo with the Python requests library, the server prints these headers:

Host: 127.0.0.1:5000
User-Agent: python-requests/2.22.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
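The header block above is simply the default set that requests attaches to every request. As a quick check, assuming requests is installed, you can inspect those defaults locally without starting the Flask server; `requests.utils.default_headers()` is part of the requests package, and the exact version number (and possibly the Accept-Encoding value) will depend on your installation.

```python
import requests

# The headers requests sends by default when the caller does not override them.
defaults = requests.utils.default_headers()
for name, value in defaults.items():
    print(f"{name}: {value}")

# The default User-Agent embeds the library name and the installed version,
# which is exactly what gives the crawler away.
print(defaults["User-Agent"] == f"python-requests/{requests.__version__}")
```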

"User-Agent: python-requests/2.22.0" shows that the request was made with a Python library, so a simple check on the server is enough to block you:

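@app.route('/getInfo')
def hello_world():
    # Treat any User-Agent containing "python" as a crawler
    user_agent = request.headers.get('User-Agent', '')
    if 'python' in user_agent:
        return "Kid, you're using a crawler, aren't you?"
    else:
        return "Pretend there is a lot of data here"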
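The server-side check above is nothing more than a case-sensitive substring test on a string that the client chooses to send. To make that visible without running Flask, here is the same logic pulled out into a plain function; the name `is_flagged_as_crawler` is hypothetical and only mirrors the condition in the route.

```python
from typing import Optional

def is_flagged_as_crawler(user_agent: Optional[str]) -> bool:
    """Hypothetical helper mirroring the route's check: flag any User-Agent
    that contains "python". A missing header is not flagged."""
    return "python" in (user_agent or "")

REQUESTS_UA = "python-requests/2.22.0"
CHROME_UA = ("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36")

for ua in (REQUESTS_UA, CHROME_UA, None):
    print(repr(ua), "->", "blocked" if is_flagged_as_crawler(ua) else "allowed")
```

Because the server has no way to verify the header, a client that sends a browser's User-Agent passes the check unchanged.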

What can you do? Pretend to be a browser by sending a browser's User-Agent:

import requests

if __name__ == '__main__':
    # Send a real browser's User-Agent instead of requests' default one
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
    }
    url = 'http://127.0.0.1:5000/getInfo'
    response = requests.get(url, headers=headers)
    print(response.text)  # the server now returns the data instead of the warning

With that, you can happily fetch the data again.
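The client above passes the headers on every call. A variant, sketched here under the assumption that you make several requests to the same site, is to set the User-Agent once on a `requests.Session`. `Session.prepare_request` builds the final request without any network access, so it shows exactly which headers would be sent: only the User-Agent is replaced, while the other requests defaults (Accept, Accept-Encoding, Connection) are still present, and a stricter server could inspect those as well.

```python
import requests

BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36")
url = "http://127.0.0.1:5000/getInfo"

session = requests.Session()
# Headers set on the session are merged into every request it sends.
session.headers.update({"User-Agent": BROWSER_UA})

# Build the outgoing request locally to inspect its final headers.
prepared = session.prepare_request(requests.Request("GET", url))
for name, value in prepared.headers.items():
    print(f"{name}: {value}")

# With the Flask server running, session.get(url) would send these headers.
```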

You can also keep a list of user agents and pick one at random for each request.
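A minimal sketch of that idea using only the standard library: `random_headers` is a hypothetical helper, and apart from the Chrome string used earlier in this article, the User-Agent strings are illustrative examples; for real use, copy current strings from actual browsers.

```python
import random

# Illustrative User-Agent pool; the first entry is the one used earlier.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/13.0.4 Safari/605.1.15",
]

def random_headers(rng=random):
    """Hypothetical helper: build a headers dict whose User-Agent is picked
    at random from USER_AGENTS. `rng` may be the random module or a Random."""
    return {"User-Agent": rng.choice(USER_AGENTS)}

rng = random.Random(0)  # seeded only so this demo is reproducible
for _ in range(3):
    print(random_headers(rng)["User-Agent"])

# In the crawler: requests.get(url, headers=random_headers())
```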

Reference: https://zhuanlan.zhihu.com/p/59745385
