scrapy.FormRequest
- `FormRequest` sends a POST request to the server; the request parameters must include whatever special fields the site's form design expects.
```python
import re

import scrapy
from scrapy.spiders import CrawlSpider


class FormrequestSpider(CrawlSpider):
    name = 'github'
    allowed_domains = ['github.com']
    start_urls = ['https://github.com/login']

    def parse(self, response):
        # Extract the hidden fields GitHub's login form requires
        authenticity_token = response.xpath(
            "//input[@name='authenticity_token']/@value").extract_first()
        utf8 = response.xpath("//input[@name='utf8']/@value").extract_first()
        commit = response.xpath("//input[@name='commit']/@value").extract_first()
        post_data = dict(
            login="***********",
            password="**********",
            authenticity_token=authenticity_token,
            utf8=utf8,
            commit=commit,
        )
        # Form POST request
        yield scrapy.FormRequest(
            "https://github.com/session",
            formdata=post_data,
            callback=self.after_login
        )

    def after_login(self, response):
        # with open("a.html", "w", encoding="utf-8") as f:
        #     f.write(response.body.decode())
        print(re.findall("********", response.body.decode()))
```
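Under the hood, `scrapy.FormRequest` url-encodes the `formdata` dictionary into the POST body. A minimal stdlib sketch of that encoding step, with hypothetical placeholder values standing in for what the XPath extraction above would return:

```python
from urllib.parse import urlencode

# Placeholder values standing in for the real extracted tokens
post_data = dict(
    login="user",
    password="secret",
    authenticity_token="tok123",
    utf8="\u2713",
    commit="Sign in",
)

# FormRequest encodes formdata into an application/x-www-form-urlencoded body
body = urlencode(post_data)
print(body)
# login=user&password=secret&authenticity_token=tok123&utf8=%E2%9C%93&commit=Sign+in
```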
scrapy.FormRequest.from_response
- `FormRequest.from_response` simulates a browser's form submission to send a POST request to the server.
- It only works on login pages built with an HTML `<form>` tag.
- You only supply the account name and password; the dictionary keys must match the `name` attributes the page's form inputs were designed with.
```python
import scrapy
from scrapy.spiders import CrawlSpider


class GithubSpider(CrawlSpider):
    name = 'github2'
    allowed_domains = ['github.com']
    start_urls = ['https://github.com/login']

    def parse(self, response):
        yield scrapy.FormRequest.from_response(
            response,  # automatically locates the <form> in the response
            # formdata only needs the login name and password; the dict keys
            # are the name attributes of the form's <input> tags
            formdata={"login": "***********", "password": "**********"},
            callback=self.after_login
        )

    def after_login(self, response):
        print(response.text)
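The key convenience of `from_response` is that it collects the form's hidden inputs for you and merges in the fields you pass via `formdata`. A stdlib-only sketch of that merging idea (a hypothetical helper, not Scrapy's actual implementation):

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from <input> tags, hidden ones included."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if "name" in a:
                self.fields[a["name"]] = a.get("value") or ""


def build_post_body(html, formdata):
    # Fields pre-filled by the form, overridden by the caller's formdata
    collector = FormFieldCollector()
    collector.feed(html)
    merged = {**collector.fields, **formdata}
    return urlencode(merged)


html = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="tok123">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
"""
body = build_post_body(html, {"login": "user", "password": "secret"})
print(body)
# authenticity_token=tok123&login=user&password=secret
```

This is why you only pass `login` and `password` in the spider above: the hidden `authenticity_token` is picked up from the form automatically.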
Further reading: