Python crawler: basic operations (sending GET and POST requests, simulating a browser, adding cookie information)


Send a GET request to a specified URL (the examples use Python 2's urllib2):

# -*- coding: utf-8 -*-
import urllib2
url = "http://localhost:80/webtest/test?name=xuejianbest"
req = urllib2.Request(url)
response = urllib2.urlopen(req)
page_html = response.read()
print page_html
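
For readers on Python 3, where urllib2 was merged into urllib.request, the same GET request might be sketched as below. The helper names build_get_request and fetch are hypothetical, and decoding the body as UTF-8 is an assumption about the server. Only the request object is built here; fetch would send it.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_get_request(base_url, params):
    """Return a Request whose query string is built from params (hypothetical helper)."""
    return Request(base_url + "?" + urlencode(params))


def fetch(req, timeout=10):
    """Send the request and return the body, assuming it is UTF-8 text."""
    with urlopen(req, timeout=timeout) as response:
        return response.read().decode("utf-8")


req = build_get_request("http://localhost:80/webtest/test", {"name": "xuejianbest"})
print(req.get_method())  # GET, because no data is attached
print(req.full_url)      # http://localhost:80/webtest/test?name=xuejianbest
```

Calling fetch(req) returns the page text when the test server from the article is running.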

If the data argument of urlopen is supplied (not None), a POST request is sent instead:

# -*- coding: utf-8 -*-
import urllib2
import urllib
url = "http://localhost:80/webtest/test?name=xuejianbest"
req = urllib2.Request(url)
values = {}
values["age"] = "23"
values["sex"] = "男"
data = urllib.urlencode(values)
print data   # age=23&sex=%E7%94%B7
response = urllib2.urlopen(req, data)
page_html = response.read()
print page_html
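
The switch from GET to POST can be checked without contacting a server. The following is a minimal Python 3 sketch (urllib.request in place of urllib2); note that Python 3 requires the body to be bytes, so the encoded form data is converted with encode("ascii").

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "http://localhost:80/webtest/test?name=xuejianbest"
values = {"age": "23", "sex": "男"}

# urlencode percent-encodes non-ASCII text as UTF-8 by default.
data = urlencode(values).encode("ascii")

get_req = Request(url)              # no body: GET
post_req = Request(url, data=data)  # body attached: POST

print(data)                                         # b'age=23&sex=%E7%94%B7'
print(get_req.get_method(), post_req.get_method())  # GET POST
```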

If the backend then reads the sex parameter as garbled text, it can be converted as follows (Java):

System.out.println(new String(req.getParameter("sex").getBytes("iso8859-1"), "UTF-8"));
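
Why this conversion works can be reproduced in a few lines of Python 3. The sketch assumes the garbling comes from the server decoding the UTF-8 bytes as ISO-8859-1, which is what the Java line above undoes. Because ISO-8859-1 maps every byte to exactly one character, no information is lost and the original bytes can be recovered.

```python
original = "男"

# The client sends the UTF-8 bytes of the value (after percent-decoding).
wire_bytes = original.encode("utf-8")

# Assumed server behavior: the bytes are decoded with the wrong charset.
garbled = wire_bytes.decode("iso-8859-1")

# The fix: recover the raw bytes, then decode them with the right charset.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(len(wire_bytes), len(garbled))  # 3 3
print(repaired == original)           # True
```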

A browser identifier (User-Agent) can be added to the request headers to simulate access from a browser:

# -*- coding: utf-8 -*-
import urllib2
user_agent = r'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.2669.400 QQBrowser/9.6.10990.400'
headers = {r'User-Agent': user_agent}
url = "http://localhost:80/webtest/test"
req = urllib2.Request(url, headers = headers)
response = urllib2.urlopen(req)
page_html = response.read()
print page_html
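
Without this header the library identifies itself as a Python client, which some sites reject. A minimal Python 3 sketch (urllib.request in place of urllib2) shows the default value and the override; the shortened user_agent string is a placeholder for illustration, not the full one used above.

```python
from urllib.request import Request, build_opener

# The default opener announces itself as "Python-urllib/<version>".
default_agent = dict(build_opener().addheaders)["User-agent"]
print(default_agent)

# Placeholder browser string (shortened for illustration).
user_agent = "Mozilla/5.0 (Windows NT 10.0; WOW64)"
req = Request("http://localhost:80/webtest/test", headers={"User-Agent": user_agent})

# Request stores header names capitalized, so the key becomes "User-agent".
print(req.get_header("User-agent"))
```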

To make several requests share one session, add the cookie information to the request headers:

# -*- coding: utf-8 -*-
import urllib2
user_agent = r'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.2669.400 QQBrowser/9.6.10990.400'
headers = {r'User-Agent': user_agent}
url = "http://localhost:80/webtest/test"
req = urllib2.Request(url, headers = headers)
response = urllib2.urlopen(req)
cookie = response.headers.get('Set-Cookie')    # get the cookie from the response to the first request
print cookie        # a str such as: JSESSIONID=B66F6A96B2FBC7D9A7591293E28DEEE3; Path=/webtest/; HttpOnly
page_html = response.read()
print page_html

req.add_header('Cookie', cookie.split(';')[0])    # send only the name=value pair in later requests so that they belong to one session
response = urllib2.urlopen(req)
page_html = response.read()
print page_html
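
A Set-Cookie value carries attributes such as Path and HttpOnly that are instructions for the client, while a Cookie request header normally carries only name=value pairs. One way to extract those pairs is sketched below in Python 3 with the standard http.cookies module; the helper name cookie_header_from_set_cookie is hypothetical.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie):
    """Build a Cookie header value from one Set-Cookie string (hypothetical helper)."""
    parsed = SimpleCookie()
    parsed.load(set_cookie)
    # Keep each cookie's name and value; drop attributes like Path and HttpOnly.
    return "; ".join("%s=%s" % (name, morsel.value) for name, morsel in parsed.items())


set_cookie = "JSESSIONID=B66F6A96B2FBC7D9A7591293E28DEEE3; Path=/webtest/; HttpOnly"
print(cookie_header_from_set_cookie(set_cookie))
# JSESSIONID=B66F6A96B2FBC7D9A7591293E28DEEE3
```

The returned string can be passed to add_header('Cookie', ...) on later requests.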

