Simulating login in Python with cookie / cookiejar to bypass verification


0. Approach

If you don't want to simulate the login, or simulating it is too complex (multi-step interaction or a hard CAPTCHA), log in manually and copy the cookie by hand (or have code read the browser's cookies). The drawback is that such cookies expire easily.

If the login is a simple form submission, the code can first simulate the login and then access the target URL through a cookiejar.
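The two-step approach can be sketched with the standard library (Python 3 module names; `login_url`, `form`, and `target_url` below are placeholders, not from the original article):

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

def make_login_opener():
    """Build an opener whose CookieJar captures Set-Cookie headers from the
    login step and replays them automatically on later requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

# Step 1 (login):  opener.open(login_url, urllib.parse.urlencode(form).encode())
# Step 2 (target): opener.open(target_url)  # cookies from step 1 are sent automatically
```

The same pattern with `requests` is simply a `requests.Session()`, which keeps its own cookiejar between calls.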

1. References

Cookie handling in Python, explained in detail

李劼傑's blog

Several ways to make an HTTP request in Python from a cookie string (1)

Several ways to make an HTTP request in Python from a cookie string (2)

Making HTTP requests in Python with Chrome's cookies

 

fuck-login/001 zhihu/zhihu.py    logins for a whole series of sites!

Python crawlers: simulating login to Zhihu

try:
    import cookielib
except ImportError:
    import http.cookiejar as cookielib  # Python 3 compatibility

 

requests.session

# Reuse saved login cookie information
session = requests.session()
session.cookies = cookielib.LWPCookieJar(filename='cookies')
try:
    session.cookies.load(ignore_discard=True)
except IOError:
    print("Failed to load cookies")


# Save cookies to a file so that next time we can log in
# from the cookie directly, without entering username and password
session.cookies.save()
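The save/load round trip above can be exercised end to end with the Python 3 module names (a temporary file stands in for the `'cookies'` file; the cookie contents are made up):

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar

path = os.path.join(tempfile.mkdtemp(), 'cookies')

# Save: put one session cookie in the jar and write it out.
# ignore_discard=True is what lets session cookies survive the round trip.
jar = LWPCookieJar(filename=path)
jar.set_cookie(Cookie(
    version=0, name='sid', value='abc', port=None, port_specified=False,
    domain='example.com', domain_specified=True, domain_initial_dot=False,
    path='/', path_specified=True, secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={}, rfc2109=False))
jar.save(ignore_discard=True)

# Load: a fresh jar reads the same file back on the next run.
jar2 = LWPCookieJar(filename=path)
jar2.load(ignore_discard=True)
```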

Where IE/Firefox/Chrome and other browsers store their cookies

中大黑熊: simulating site login with cookielib combined with urllib2

現代魔法學院: simulating site login with Python

 

Listing all tables in a database with Python's sqlite3

An alternative way for Python crawlers to solve the login problem

  Grab the browser's cookies, then let the requests library use those already-logged-in cookies directly

Decrypting the Cookies encrypted_value of Chrome 33+ (a Python script)

Reading Chrome's cookies in Python 3

 

Scraping WeChat official-account article comments in 10 lines of code

Open the Chrome browser and you will see that the Cookie header is sent to WeChat automatically with every request. Copy that cookie string out, build a cookie object from it in Python, and hand it to requests.

import requests
from http.cookies import SimpleCookie

raw_cookie = "gsScrollPos-5517=; ...much omitted here... bizuin=2393828"

cookie = SimpleCookie(raw_cookie)
requests_cookies = dict([(c, cookie[c].value) for c in cookie])

r = requests.get(url, cookies=requests_cookies)

 

With Fiddler capturing at the same time, HTTPS requests fail with urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)>

https://imaojia.com/blog/questions/urlerror-urlopen-error-ssl-certificate-verify-failed-certificate-verify-failed-ssl-c-590/ 

https://www.python.org/dev/peps/pep-0476/

2. Quickest usage: Fiddler request Raw + requests

In Fiddler, select all and copy; avoid extra blank lines at the end.

import requests

# Fiddler request Raw: Ctrl+A, select all, copy
# For GET, lines[-1] is ''; for POST, lines[-2] is '' and lines[-1] is the body
# (only a form body uses the same key=value format as a URL query string)
with open('headers.txt') as f:
    lines = [i.strip() for i in f.readlines()]

# The first line of Fiddler's request Raw contains the full URL
(method, url, _) = lines[0].split()

if method == 'POST':
    body = lines[-1]
    lines = lines[1:-2]
else:
    lines = lines[1:-1]

headers = {}
for line in lines:
    k, v = line.split(': ', 1)  # note the space after the colon
    headers[k] = v

# requests follows 3xx redirects automatically,
# e.g. xueqiu.com redirects to the personal home page
if method == 'POST':
    # only form POSTs are handled here; otherwise pass data=<string> directly
    data = dict([i.split('=', 1) for i in body.split('&')])
    r = requests.post(url, headers=headers, data=data, verify=False)
else:
    r = requests.get(url, headers=headers, verify=False)
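The same parsing logic can be tried on an inline string instead of headers.txt (the capture below is made up for illustration):

```python
# A hypothetical Fiddler "request Raw" capture for a form POST.
raw = (
    "POST https://example.com/login HTTP/1.1\n"
    "Host: example.com\n"
    "Content-Type: application/x-www-form-urlencoded\n"
    "\n"
    "user=alice&pw=secret"
)

lines = [line.strip() for line in raw.splitlines()]
method, url, _ = lines[0].split()  # request line: METHOD URL HTTP/1.1

# POST: last line is the body, second-to-last is the blank separator line
body = lines[-1] if method == 'POST' else None
header_lines = lines[1:-2] if method == 'POST' else [l for l in lines[1:] if l]

headers = dict(line.split(': ', 1) for line in header_lines)
data = dict(pair.split('=', 1) for pair in body.split('&')) if body else None
```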

3. Oldest usage: urllib2 + cookiejar

# -*- coding: utf-8 -*-
import os
import urllib, urllib2
try:
    import cookielib
except ImportError:
    import http.cookiejar as cookielib  # Python 3 compatibility


# https://imaojia.com/blog/questions/urlerror-urlopen-error-ssl-certificate-verify-failed-certificate-verify-failed-ssl-c-590/
# https://www.python.org/dev/peps/pep-0476/
import ssl
# Disable certificate verification globally -- not recommended
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # Legacy Python that doesn't verify HTTPS certificates by default
    pass
else:
    # Handle target environment that doesn't support HTTPS verification
    ssl._create_default_https_context = _create_unverified_https_context
# Or create an unverified context per request:
# context = ssl._create_unverified_context()
# print urllib2.urlopen("https://imaojia.com/", context=context).read()


def login_xueqiu():
    # Chrome incognito: click the padlock at the top left -> cookies in use -> delete
    # fiddler request Raw:
    """
    POST https://xueqiu.com/snowman/login HTTP/1.1
    Host: xueqiu.com
    Connection: keep-alive
    Content-Length: 72
    Pragma: no-cache
    Cache-Control: no-cache
    Accept: */*
    Origin: https://xueqiu.com
    X-Requested-With: XMLHttpRequest
    User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36
    Content-Type: application/x-www-form-urlencoded; charset=UTF-8
    Referer: https://xueqiu.com/
    Accept-Encoding: gzip, deflate, br
    Accept-Language: zh-CN,zh;q=0.9
    Cookie: aliyungf_tc=...

    remember_me=true&username=xxx%40139.com&password=xxx&captcha=
    """
    
    url_login = 'https://xueqiu.com/snowman/login'
    url_somebody = 'https://xueqiu.com/u/6146070786'
    
    data_dict = {
        'remember_me': 'true',  # 'true' or 'false'
        'username': os.getenv('xueqiu_username'),
        'password': os.getenv('xueqiu_password'),
    }
    # note: must be encoded into a URL query string
    data = urllib.urlencode(data_dict)

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
        # marks the request as Ajax rather than a traditional synchronous one;
        # commenting it out gives urllib2.HTTPError: HTTP Error 404: Not Found
        'X-Requested-With': 'XMLHttpRequest',
    }

    # urllib2.Request(self, url, data=None, headers={}, origin_req_host=None, unverifiable=False)
    req = urllib2.Request(url_login, data, headers)
    # reference implementation: C:\Program Files\Anaconda2\Lib\urllib2.py
    cookiejar = cookielib.CookieJar()
    handler = urllib2.HTTPCookieProcessor(cookiejar)  # the argument could in theory be omitted, but then the cookiejar cannot be used later
    ck_opener = urllib2.build_opener(handler)
    resp = ck_opener.open(req)
    
    # print(resp.headers)
    # for i in cookiejar:
        # print(i)    
    
    req = urllib2.Request(url_somebody)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0')
    # print(req.headers)  # {'User-agent': 'xxx'}
    # as captured by Fiddler:
    """
    GET https://xueqiu.com/u/6146070786 HTTP/1.1
    Host: xueqiu.com
    Accept-Encoding: identity
    User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0
    Cookie: remember=1; xq_is_login=1; xq_a_token.sig=xxx...
    Connection: close
    """
    
    
    # resp = ck_opener.open(req)
    # install the opener: every subsequent urlopen() call will use it
    urllib2.install_opener(ck_opener)
    resp = urllib2.urlopen(req)
    
    html = resp.read()
    assert os.getenv('xueqiu_nickname') in html
    # with open('login_xueqiu.html','wb') as f:
        # f.write(html)
    # assert u'登錄' not in html.decode('utf-8')

if __name__ == '__main__':
    login_xueqiu()

4. Building a cookiejar

4.1 From a cookie string

def get_cookjar_from_cookie_str(cookie, domain, path='/'):

    cookiejar = cookielib.CookieJar()
    simple_cookie = Cookie.SimpleCookie(cookie)

    # The SimpleCookie above cannot be used directly: a complete cookie also
    # needs extra fields such as domain, path and expires.
    # The second step is to create cookielib.Cookie objects: passing each key
    # and value to the cookielib.Cookie constructor yields a series of
    # cookielib.Cookie objects, which can then be used one by one to update
    # the CookieJar.

    for c in simple_cookie:
        cookie_item = cookielib.Cookie(
            version=0, name=c, value=str(simple_cookie[c].value),
            port=None, port_specified=None,
            domain=domain, domain_specified=None, domain_initial_dot=None,
            path=path, path_specified=None,
            secure=None,
            expires=None,
            discard=None,
            comment=None,
            comment_url=None,
            rest=None,
            rfc2109=False,
        )
        cookiejar.set_cookie(cookie_item)
    return cookiejar
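A Python 3 port of the same idea (http.cookies / http.cookiejar instead of Cookie / cookielib), with the boolean-ish constructor fields filled in explicitly:

```python
from http.cookiejar import Cookie, CookieJar
from http.cookies import SimpleCookie

def cookiejar_from_string(cookie_str, domain, path='/'):
    """Build a CookieJar from a raw 'k1=v1; k2=v2' cookie header string."""
    jar = CookieJar()
    simple = SimpleCookie(cookie_str)
    for name in simple:
        jar.set_cookie(Cookie(
            version=0, name=name, value=simple[name].value,
            port=None, port_specified=False,
            domain=domain, domain_specified=True,
            domain_initial_dot=domain.startswith('.'),
            path=path, path_specified=True,
            secure=False, expires=None, discard=True,
            comment=None, comment_url=None, rest={}, rfc2109=False))
    return jar
```

The resulting jar can be passed to `requests.get(url, cookies=jar)` or wrapped in an `HTTPCookieProcessor` for `urllib.request`.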

4.2 Parsing the browser's cookie file into a cookiejar (via sqlite3 and win32crypt.CryptUnprotectData)

import shutil
import sqlite3
import win32crypt  # pywin32

def parse_browser_cookie_file(browser='chrome', domain=None):
    cookie_file_path_temp = 'cookies_temp'
    if browser == 'chrome':
        # e.g. 'C:\\Users\\win7\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Cookies'
        cookie_file_path = os.path.join(os.environ['LOCALAPPDATA'], r'Google\Chrome\User Data\Default\Cookies')
    elif browser == 'firefox':
        # e.g. r'C:\Users\win7\AppData\Roaming\Mozilla\Firefox\Profiles\owmkid1w.default\cookies.sqlite'
        firefox_dir_path = os.path.join(os.environ['APPDATA'], r'Mozilla\Firefox\Profiles')
        result = []
        for path in os.listdir(firefox_dir_path):
            path = os.path.join(firefox_dir_path, path, 'cookies.sqlite')
            if os.path.exists(path):
                result.append(path)
        # several xxx.default folders may exist; pick the one with the largest file
        cookie_file_path = sorted(result, key=lambda x: os.stat(x).st_size, reverse=True)[0]

    if not os.path.exists(cookie_file_path):
        raise Exception('Cookies file not exist!')

    # Copy the file first: querying the live database raises
    # sqlite3.OperationalError: database is locked
    # (if using os.system('copy ...'), quote paths containing spaces
    # and use backslashes, not D:/)
    shutil.copy(cookie_file_path, cookie_file_path_temp)
    conn = sqlite3.connect(cookie_file_path_temp)
    c = conn.cursor()

    # Exploring the schema (or open the file with SQLiteSpy.exe):
    #   c.execute("select name from sqlite_master where type='table' order by name")
    #   c.fetchall()  ->  [(u'cookies',), (u'meta',)]
    #   c.execute("PRAGMA table_info('cookies')")  ->  columns include creation_utc,
    #   host_key, name, value, path, expires_utc, secure, httponly, ...
    #   and encrypted_value (BLOB).
    #
    # Chrome 33+ encrypts cookie values: open the Cookies file with SQLite
    # Developer and the old value column is empty, replaced by the encrypted
    # encrypted_value BLOB.
    # https://jecvay.com/2015/03/python-chrome-cookies.html
    # http://www.ftium4.com/chrome-cookies-encrypted-value-python.html

    c.execute("select name from sqlite_master where type='table' order by name")
    print c.fetchall()

    if browser == 'chrome':
        sql = 'select host_key, name, encrypted_value, path from cookies'
        if domain:
            sql += ' where host_key like "%{}%"'.format(domain)
    elif browser == 'firefox':
        sql = 'select host, name, value, path from moz_cookies'
        if domain:
            sql += ' where host like "%{}%"'.format(domain)

    cookie_dict = {}
    cookiejar = cookielib.CookieJar()

    for row in c.execute(sql):    # each row is a tuple
        if browser == 'chrome':
            # decrypt encrypted_value with the Windows DPAPI
            ret = win32crypt.CryptUnprotectData(row[2], None, None, None, 0)
            value = ret[1].decode()
        elif browser == 'firefox':
            value = row[2]
        cookie_dict[row[1]] = value

        cookie_item = cookielib.Cookie(
            version=0, name=row[1], value=value,
            port=None, port_specified=None,
            domain=row[0], domain_specified=None, domain_initial_dot=None,
            path=row[3], path_specified=None,
            secure=None,
            expires=None,
            discard=None,
            comment=None,
            comment_url=None,
            rest=None,
            rfc2109=False,
        )
        cookiejar.set_cookie(cookie_item)    # apply each cookie_item to the cookiejar
    conn.close()
    os.remove(cookie_file_path_temp)

    cookie_str = ';'.join(['%s=%s'%(k,v) for k,v in cookie_dict.items()])
    return (cookiejar, cookie_dict, cookie_str)
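The sqlite3 queries used above can be tried against a throwaway in-memory database that mimics the Cookies schema (columns abbreviated; in the real Chrome file the interesting column is the encrypted_value BLOB, which needs decrypting first):

```python
import sqlite3

# An in-memory database stands in for the copied Cookies file.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE cookies (host_key TEXT, name TEXT, value TEXT, path TEXT)")
conn.execute("INSERT INTO cookies VALUES ('.example.com', 'sid', 'abc', '/')")

# List every table, as done against the real Cookies file.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]

# Filter rows by domain, mirroring the host_key LIKE query
# (parameterized here rather than string-formatted).
rows = list(conn.execute(
    "SELECT host_key, name, value, path FROM cookies WHERE host_key LIKE ?",
    ('%example.com%',)))
conn.close()
```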

5. Using a cookie string or cookiejar

5.1 urllib2.urlopen(req) with req.add_header('Cookie', <copied cookie string>)

import re
import ssl
context = ssl._create_unverified_context()

def urllib2_Request_with_cookie_str(url, cookie, verify):
    cookie = re.sub('\n', '', cookie)
    # urllib2.Request(self, url, data=None, headers={}, origin_req_host=None, unverifiable=False)
    req = urllib2.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0')
    req.add_header('Cookie', cookie)

    # urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)>
    try:
        resp = urllib2.urlopen(req)
    except urllib2.URLError as err:
        print err
        resp = urllib2.urlopen(req, context=context)  # happens when Fiddler is capturing at the same time

    html_doc = resp.read()
    with open('urllib2_Request_with_cookie_str.html','wb') as f:
        f.write(html_doc)
    print 'urllib2_Request_with_cookie_str', url, verify, verify in html_doc

 

5.2 requests.get with cookies=<cookie string / dict / cookiejar>

import re
import Cookie  # Python 3: from http.cookies import SimpleCookie

# cookie may be a str, dict, or cookiejar
def requests_with_cookie(url, cookie, verify):

    # requests accepts a dict or cookiejar for cookies, so convert a string to a dict
    if isinstance(cookie, basestring):
        if isinstance(cookie, unicode):
            cookie = cookie.encode('utf-8')
        cookie = re.sub('\n', '', cookie)
        # SimpleCookie supports strings as cookie values.
        simple_cookie = Cookie.SimpleCookie(cookie)
        cookie = dict([(c, simple_cookie[c].value) for c in simple_cookie])
        # See "Scraping WeChat official-account article comments in 10 lines of code"
        # https://mp.weixin.qq.com/s/Qbeyk2hncKDaz1iT54iwTA

    # (a trailing comma after the last dict item is harmless)
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0",
    }

    # requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)
    # https://stackoverflow.com/questions/10667960/python-requests-throwing-up-sslerror
    # The simple, direct fix is verify=False
    try:
        r = requests.get(url, headers=headers, cookies=cookie)
    except requests.exceptions.SSLError as err:
        print err
        r = requests.get(url, headers=headers, cookies=cookie, verify=False)

    print 'requests_with_cookie', url, verify, verify in r.content
    with open('requests_with_cookie.html','wb') as f:
        f.write(r.content)

5.3 urllib2.build_opener with urllib2.HTTPCookieProcessor(cookiejar)

import ssl
# ssl._create_default_https_context = ssl._create_unverified_context
context = ssl._create_unverified_context()

def opener_with_cookiejar(url, cookiejar, verify):
    req = urllib2.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0')
    # handler = urllib2.HTTPCookieProcessor(cookiejar)
    # opener = urllib2.build_opener(handler)
    # For the context argument (otherwise the global ssl setting is needed),
    # see C:\Program Files\Anaconda2\Lib\urllib2.py, def urlopen:
    #     elif context:
    #         https_handler = HTTPSHandler(context=context)
    #         opener = build_opener(https_handler)
    # urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)>
    try:
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
        resp = opener.open(req)
    except urllib2.URLError as err:
        print err
        # multiple handlers can be stacked
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar),
                                      urllib2.HTTPSHandler(context=context))
        resp = opener.open(req)
    html_doc = resp.read()
    with open('opener_with_cookiejar.html','wb') as f:
        f.write(html_doc)
    print 'opener_with_cookiejar', url, verify, verify in html_doc

5.4 Lower level: httplib.HTTPConnection with a cookie string

import re
import httplib
import urlparse

# No import ssl setup needed here -- and it's fast!
def httplib_conn_with_cookie_str(url, cookie, verify):
    # url = 'https://xueqiu.com'
    url_ori = url
    cookie = re.sub('\n', '', cookie)
    ret = urlparse.urlparse(url)    # parse the input URL
    if ret.scheme == 'http':
        conn = httplib.HTTPConnection(ret.netloc)
    elif ret.scheme == 'https':
        conn = httplib.HTTPSConnection(ret.netloc)

    url = ret.path
    if ret.query: url += '?' + ret.query
    if ret.fragment: url += '#' + ret.fragment
    if not url: url = '/'
    print url
    conn.request(method='GET', url=url, headers={'Cookie': cookie})

    # If url = 'https://xueqiu.com' is passed in, the response body is only:
    #   Redirecting to <a href="/4xxxxxxxxx/">/4xxxxxxxxx/</a>.
    # httplib does not follow redirects!
    resp = conn.getresponse()
    html_doc = resp.read()
    with open('httplib_conn_with_cookie_str.html','wb') as f:
        f.write(html_doc)
    print 'httplib_conn_with_cookie_str', url_ori, verify, verify in html_doc
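The URL-to-request-target reassembly is the fiddly part; a Python 3 helper for the same job (http.client instead of httplib) might look like this. Fragments are client-side only, so unlike the snippet above this version does not forward them:

```python
from urllib.parse import urlparse

def split_request_target(url):
    """Split a full URL into (netloc, path-with-query) as needed by
    http.client.HTTPSConnection(netloc).request('GET', path, headers=...)."""
    parts = urlparse(url)
    path = parts.path or '/'   # bare domain -> '/'
    if parts.query:
        path += '?' + parts.query
    return parts.netloc, path
```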

6. Third-party libraries

https://pypi.python.org/pypi/browser-cookie3/0.6.1

https://pypi.python.org/pypi/browsercookie/0.7.2

