mechanize replaces part of urllib2's functionality. It simulates browser behavior more closely and gives more complete control over web access.
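Features of mechanize:
1 Support for HTTP, HTTPS and other protocols
2 Simple HTML form filling
3 Browser history and reload
4 Correct addition of the Referer HTTP header
5 Automatic observance of robots.txt
6 Automatic handling of HTTP-EQUIV and Refresh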
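To make feature 5 concrete, here is a minimal sketch of the kind of check that observing robots.txt implies. It uses the Python standard library's robot parser rather than mechanize's own internals, and the robots.txt content, the example.com URLs and the helper name `allowed` are all made up for illustration.

```python
try:
    from urllib.robotparser import RobotFileParser  # Python 3
except ImportError:
    from robotparser import RobotFileParser  # Python 2

# Hypothetical robots.txt content, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url, user_agent="*"):
    """Return True if ROBOTS_TXT permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; nothing is downloaded.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/private/data.html"))
```

On a current Python 3 the first call reports that the URL may be fetched and the second that it may not, because only paths under /private/ are disallowed.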
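Common functions
.CookieJar(): creates a cookie jar for storing cookies
.Browser(): creates a browser object
.addheaders: a list of request headers (an attribute, not a method); typically used to set a User-Agent so that the server treats the client as an ordinary browser
.open(): opens a page; according to the official documentation it can open any URL, not only HTTP ones
.select_form(): selects a form; take care when selecting a form by its ID
.form[]: fills in form fields
.submit(): submits the form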
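As a rough sketch of how the first three entries fit together, the helper below attaches a cookie jar and a User-Agent header to a browser object. The helper name `configure_browser` is a placeholder of mine; `set_cookiejar()` is the mechanize.Browser method that installs the jar. The sketch is written so that it runs on both Python 2 and Python 3.

```python
def configure_browser(br, cookie_jar, user_agent):
    """Prepare a browser object for a session.

    br:         expected to be a mechanize.Browser instance
    cookie_jar: expected to be a mechanize.CookieJar instance
    user_agent: the User-Agent string to send with every request
    """
    # Cookies set by responses are stored in the jar and sent back later.
    br.set_cookiejar(cookie_jar)
    # addheaders is a plain list of (name, value) tuples, assigned as an attribute.
    br.addheaders = [("User-agent", user_agent)]
    return br
```

With mechanize installed, a call would look like `configure_browser(mechanize.Browser(), mechanize.CookieJar(), "Mozilla/5.0 ...")`, after which `open()` can be used as usual.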
1. Installation:
pip install mechanize
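Note:
The mechanize releases available when this was written ran only on Python 2.x; later mechanize releases added Python 3 support.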
2. Basic usage
import mechanize

# Open a page with a Browser object and print its title
br = mechanize.Browser()
br.open("http://www.cnblogs.com/baby123/p/8078508.html")
print(br.title())

import mechanize

# Build a Request explicitly and open it with the module-level urlopen()
request2 = mechanize.Request("https://news.cnblogs.com/")
response2 = mechanize.urlopen(request2)
print(response2.geturl())
print(response2.info())
Note:
response2.info() # headers
response2.read() # body
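A small sketch of how those two calls might be combined with `geturl()`. The helper name `summarize_response` and the dictionary keys are illustrative choices, not part of mechanize; the sketch assumes the object returned by `info()` offers a dictionary-style `get()` for looking up a header.

```python
def summarize_response(response):
    """Collect the URL, Content-Type header and body size of a response.

    response is expected to be what mechanize.urlopen() or br.open()
    returns: an object with geturl(), info() and read().
    """
    headers = response.info()  # the response headers
    body = response.read()     # the response body
    return {
        "url": response.geturl(),
        "content_type": headers.get("Content-Type"),
        "body_length": len(body),
    }
```

Called as `summarize_response(response2)`, it returns the three values in one dictionary, which is handy for logging.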
3. Searching with Baidu
# coding=UTF-8
import mechanize

br = mechanize.Browser()

# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)  # do not check robots.txt
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# Debug output
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(True)

# Send a browser-like User-Agent header
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# Open Baidu, fill in the search field of the first form and submit it
br.open("https://www.baidu.com/")
br.select_form(nr=0)
br.form['wd'] = 'python mechanize'
br.submit()
brr = br.response().read()
print(brr)
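The `nr = 0` index and the `wd` field name depend on the page's markup, so it helps to inspect the forms before filling one in. A minimal sketch, assuming `br` is a mechanize.Browser that has already opened a page; the helper name `describe_forms` is hypothetical, while `forms()`, `controls`, `name` and `type` are the browser, form and control attributes it relies on.

```python
def describe_forms(br):
    """Return one (index, form name, controls) tuple per form on the page.

    controls is a list of (control type, control name) pairs.
    """
    summary = []
    for index, form in enumerate(br.forms()):
        controls = [(control.type, control.name) for control in form.controls]
        summary.append((index, form.name, controls))
    return summary
```

The index in each tuple is the value to pass as `nr` to `select_form()`, and a control name is the key to use with `br.form[...]`.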
4. Logging in
# coding=UTF-8
import mechanize

br = mechanize.Browser()

# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)  # do not check robots.txt
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# Debug output
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(True)

# Send a browser-like User-Agent header
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# Open the login page, fill in the credentials of the first form and submit it
br.open("https://passport.csdn.net/account/login?service=http://www.csdn.net")
br.select_form(nr=0)
br.form['username'] = 'XXXXXXX'
br.form['password'] = '123456'
br.submit()
brr = br.response().read()

# read() returns the raw body, so write the file in binary mode
with open("logininfo.txt", "wb") as f:
    f.write(brr)
The HTML page returned after logging in is written to logininfo.txt. Judging from the file's contents, the login succeeded.
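Instead of inspecting the file by eye, the check can be scripted. A rough sketch, assuming the logged-in page contains some marker text that the login page lacks; the function name and both marker arguments are placeholders, to be replaced with strings taken from the real pages.

```python
def looks_logged_in(html, success_marker, failure_marker):
    """Guess from the returned HTML whether a login worked.

    html may be bytes (as returned by read()) or text.
    """
    if isinstance(html, bytes):
        html = html.decode("utf-8", "replace")
    # Logged in if the success marker is present and the failure marker is absent.
    return success_marker in html and failure_marker not in html
```

For example, `looks_logged_in(brr, "logout", 'name="password"')` treats a page that shows a logout link and no password field as a successful login; both markers here are guesses and not taken from the actual site.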