The difference between scrapy.FormRequest and FormRequest.from_response

This article is a repost. 2020-04-17 11:07, tag: web crawlers

This write-up draws on material from GitHub, on my own notes and tests, and on https://blog.csdn.net/qq_43546676/article/details/89043445.

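1. scrapy.FormRequest: suited to the following three cases

(1) No POST or login is needed and the content is fetched with GET. FormRequest can be used directly, since it only defaults to POST when formdata is passed, although a plain scrapy.Request does the same job.

(2) A login is required, but there is no login form (no page with username and password inputs).

(3) A POST is required, but there is no form; the POST is submitted through Ajax instead.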
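For case (3), it may help to see what a form-style POST body looks like when no HTML form is involved. The sketch below uses only the standard library and is an illustration of the encoding, not Scrapy's own code; the helper name `encode_form` is made up for this example. Passing the same dictionary as `formdata` to scrapy.FormRequest yields an equivalent URL-encoded body.

```python
from urllib.parse import urlencode


def encode_form(fields):
    """Encode form fields as an application/x-www-form-urlencoded body.

    Hypothetical helper for illustration only.
    """
    return urlencode(fields).encode("utf-8")


# Field names mirror the login example used later in this article.
body = encode_form({"username": "eoin", "password": "eoin"})
print(body)  # b'username=eoin&password=eoin'
```

With an Ajax endpoint, the field names have to be read from the browser's network panel, because there is no form in the page to copy them from.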

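2. FormRequest.from_response

Suited to the following cases:

(1) Submitting a form that has visible input fields and is used to POST data.

(2) A scenario the official documentation singles out: a login page where loading the login screen (GET) and submitting the username and password (POST) use the same URL.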

    <form action="/login" method="post" accept-charset="utf-8" >
        <input type="hidden" name="csrf_token" value="zrfpdvAFSoVQGYHsLRtBgXKZuDENhbqwOkCmMnTeIWJUlxaijycP"/>
        <div class="row">
            <div class="form-group col-xs-3">
                <label for="username">Username</label>
                <input type="text" class="form-control" id="username" name="username" />
            </div>
        </div>
        <div class="row">
            <div class="form-group col-xs-3">
                <label for="username">Password</label>
                <input type="password" class="form-control" id="password" name="password" />
            </div>
        </div>
        <input type="submit" value="Login" class="btn btn-primary" />        
    </form>

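As the example above shows, some login pages carry hidden fields in addition to the visible username and password inputs, as a defence against CSRF attacks. When a crawler simulates a login, it has to submit the hidden input data together with the credentials.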

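(1) With scrapy.FormRequest, the value of csrf_token has to be scraped first, and then csrf_token, username and password are submitted together. This is rather tedious.

(2) With FormRequest.from_response, csrf_token can be ignored: from_response picks up csrf_token automatically and submits it together with the username and password.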
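The difference between the two approaches can be sketched with the standard library alone. The code below is a simplified illustration of the idea behind from_response (collect the form's existing fields, then override a few of them); it is not Scrapy's implementation, and `FormFieldParser`, `merge_form_data`, and the shortened token value `abc123` are assumptions made for this example.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of <input> elements, skipping buttons."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        # Unnamed inputs and submit/image/reset buttons carry no form data.
        if name and attrs.get("type") not in ("submit", "image", "reset"):
            self.fields[name] = attrs.get("value") or ""


def merge_form_data(html, overrides):
    """Return the form's pre-filled fields with `overrides` applied on top."""
    parser = FormFieldParser()
    parser.feed(html)
    merged = dict(parser.fields)
    merged.update(overrides)
    return merged


# A cut-down version of the login form shown above (token value shortened).
LOGIN_HTML = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123"/>
  <input type="text" name="username" />
  <input type="password" name="password" />
  <input type="submit" value="Login" />
</form>
"""

data = merge_form_data(LOGIN_HTML, {"username": "eoin", "password": "eoin"})
print(data)
# {'csrf_token': 'abc123', 'username': 'eoin', 'password': 'eoin'}
```

The hidden csrf_token survives without being named anywhere in the calling code, which is the convenience from_response offers.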

See the explanation in the official documentation:

https://docs.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-userlogin

It is usual for web sites to provide pre-populated form fields through <input type="hidden"> elements,
such as session related data or authentication tokens (for login pages). When scraping,
you’ll want these fields to be automatically pre-populated and only override a couple of them, such as the user name and password.
You can use the FormRequest.from_response() method for this job.

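In short: with FormRequest.from_response() the hidden fields are filled in automatically, so you only need to supply the username and password before submitting.
Compare the following two ways of submitting the login: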

import scrapy
from scrapy.http import FormRequest


class LoginSpider(scrapy.Spider):
    name = 'login2'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/login']

    def parse(self, response):
        # Approach 1: from_response copies the hidden csrf_token from the
        # form; only the fields listed in formdata are overridden.
        return FormRequest.from_response(
            response,
            formdata={'username': 'eoin', 'password': 'eoin'},
            callback=self.parse_after_login,
        )

    def parse_bak(self, response):
        # Approach 2 (not called by Scrapy; rename it to parse to try it):
        # extract csrf_token by hand and post it with the credentials.
        token = response.xpath('//*[@name="csrf_token"]/@value').get()
        yield FormRequest(
            'http://quotes.toscrape.com/login',
            formdata={'csrf_token': token, 'username': 'eoin', 'password': 'eoin'},
            callback=self.parse_after_login,
        )

    def parse_after_login(self, response):
        print('Finished!')
        # A logout link is only present once the login has succeeded.
        if response.xpath('//a[@href="/logout"]'):
            self.log(response.xpath('//a[@href="/logout"]/text()').get())
            self.log("you managed to login yipee!!")
            print('Login succeeded!')



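Of course, on some sites, GitHub for example, the page that shows the login screen (GET) and the URL that receives the submission (POST) are different. from_response takes the POST target from the form's action attribute, so it only finds the right address when that attribute is present and correct in the downloaded HTML. When it is not (for instance, when the form is built or rewritten by JavaScript), the POST address cannot be detected automatically and FormRequest with an explicit URL is the only option.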

GitHub:
self.login_url = 'https://github.com/login'
self.post_url = 'https://github.com/session'
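How a POST target is derived from a login page can be sketched as follows. A form's action attribute is normally resolved against the URL of the page that contains it; the helper `resolve_post_url` is hypothetical, and the action value "/session" is an assumption inferred from the two URLs above rather than something checked against GitHub's current markup.

```python
from urllib.parse import urljoin


def resolve_post_url(page_url, form_action):
    """Resolve a form's action against the page URL (hypothetical helper).

    A form without an action attribute submits to the page's own URL.
    """
    return urljoin(page_url, form_action) if form_action else page_url


# Same URL for GET and POST, as on quotes.toscrape.com.
print(resolve_post_url('http://quotes.toscrape.com/login', '/login'))
# http://quotes.toscrape.com/login

# Different URLs, assuming the form declares action="/session".
print(resolve_post_url('https://github.com/login', '/session'))
# https://github.com/session
```

If the downloaded HTML does not contain a usable action, no amount of resolution helps, and the target has to be hard-coded as in post_url above.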

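To sum up:

(1) FormRequest.from_response is the simpler option. formdata can still be set to fill in and submit the form, which is enough to simulate a login. In effect, it works out the POST details automatically.

(2) scrapy.FormRequest is the more flexible option. If FormRequest.from_response cannot handle a case, fall back to scrapy.FormRequest for the simulated login: the POST target URL is set by hand, which is more precise than automatic detection.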
