Scrapy sends GET requests by default. To send a POST request, you have to say so through the method argument. The usual form is:
scrapy.Request(url=url,method="POST",headers=self.headers,callback=self.get_goods_list)
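POST requests usually carry form parameters, however, and there are two ways to supply them. Both are described here.
1. FormRequest
Ordinary requests can be made with the scrapy.Request class. When you need to simulate a form submission or an Ajax POST, you can use FormRequest, a subclass of Request. It provides a formdata argument dedicated to setting form field data, and its method defaults to POST when formdata is supplied.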
scrapy.FormRequest(url=url,formdata=formdata,cookies=self.cookie,headers=self.headers,callback=self.get_goods_list)
Note that formdata here is a dict and must not contain numeric values; any number has to be quoted so that it is passed as a string.
For example:

formdata = {"mode": "list", "year": "default", "prev": "false", "side_year": ""}
yield FormRequest(url=new_url, formdata = formdata, callback=self.parse_category, meta=meta)
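One way to satisfy the strings-only constraint is to convert the values before building the request. The following is only a minimal sketch: `stringify_formdata` is a hypothetical helper written for illustration, not part of Scrapy, and the sample keys are made up.

```python
def stringify_formdata(params):
    """Return a copy of params with every value converted to str.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    return {key: str(value) for key, value in params.items()}


# Assumed sample parameters; "page" and "year" arrive as ints.
raw = {"mode": "list", "page": 2, "year": 2020}
formdata = stringify_formdata(raw)
# formdata now holds only strings and can be passed as
# FormRequest(url=..., formdata=formdata, ...)
```

The conversion uses plain `str()`, so values such as booleans would become "True"/"False"; if the target site expects a different spelling, those values need to be mapped by hand.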
The FormRequest documentation describes it as follows:
The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.
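In other words, FormRequest adds one new argument, formdata, which accepts a dict or an iterable of (key, value) tuples containing the form data and converts it into the body of the request. FormRequest itself inherits from Request.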
class FormRequest(Request):

    def __init__(self, *args, **kwargs):
        formdata = kwargs.pop('formdata', None)
        if formdata and kwargs.get('method') is None:
            kwargs['method'] = 'POST'

        super(FormRequest, self).__init__(*args, **kwargs)

        if formdata:
            items = formdata.items() if isinstance(formdata, dict) else formdata
            querystr = _urlencode(items, self.encoding)
            if self.method == 'POST':
                self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                self._set_body(querystr)
            else:
                self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)

    ###


def _urlencode(seq, enc):
    values = [(to_bytes(k, enc), to_bytes(v, enc))
              for k, vs in seq
              for v in (vs if is_listlike(vs) else [vs])]
    return urlencode(values, doseq=1)
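In the end, the {'key': 'value', 'k': 'v'} we pass in is converted to 'key=value&k=v', and the method defaults to POST when none is given. For any other method, the same query string is appended to the URL instead, which also hints that when there are only a few form parameters, putting them directly in the URL can be more convenient.
Now let's look at Request.
2. Request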
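The encoding step can be reproduced with the standard library alone, since the source above ends in a call to `urlencode` with `doseq=1`. This is a rough sketch rather than Scrapy's own code path: it skips the `to_bytes` conversion, and the URL is a placeholder.

```python
from urllib.parse import urlencode

formdata = {"key": "value", "k": "v"}

# POST branch: the url-encoded string becomes the request body.
body = urlencode(formdata, doseq=True)

# A list value expands into one key=value pair per element.
multi = urlencode({"tag": ["a", "b"]}, doseq=True)

# Non-POST branch: the same string is appended to the URL.
url = "http://example.com/list"  # placeholder URL, not from the original
get_url = url + ("&" if "?" in url else "?") + body
```

Here `body` is the string that would be sent with the `application/x-www-form-urlencoded` content type, and `get_url` shows what appending the parameters to the URL looks like.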
scrapy.Request(url=url,method="POST",body=formdata,cookies=self.cookie,headers=self.headers,callback=self.get_goods_list)
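Here, however, body must be a string (or bytes), not a dict. When the server expects a JSON payload, formdata has to be a serialized JSON string, so if it is still a dict in form format, convert it to a string with json.dumps() first.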
formdata = {"mode": "list", "year": "default", "prev": "false", "side_year": ""}
yield Request(url=new_url, body = json.dumps(formdata), callback=self.parse_category, meta=meta)
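The two approaches put different bytes on the wire, which the following sketch makes visible using only the standard library. The header values are assumptions about what a typical server expects; whether a given endpoint wants JSON or a url-encoded form has to be checked against the real request in the browser's network panel.

```python
import json
from urllib.parse import urlencode

formdata = {"mode": "list", "year": "default", "prev": "false", "side_year": ""}

# Body for scrapy.Request(..., method="POST", body=json_body):
# a JSON document, normally paired with a JSON content type.
json_body = json.dumps(formdata)
json_headers = {"Content-Type": "application/json"}  # assumed header

# Body that FormRequest(formdata=formdata) would build instead:
# url-encoded key=value pairs.
form_body = urlencode(formdata, doseq=True)
form_headers = {"Content-Type": "application/x-www-form-urlencoded"}
```

Note that scrapy.Request does not set a Content-Type for a raw body, so when sending JSON it is usually worth passing the header explicitly through the headers argument.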
