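Why get cookies?
Some pages can only be scraped after logging in, Zhihu for example. A site usually recognizes a logged-in user by the cookies sent with each request, so once we have the cookies from a successful login we can send them along to access pages that require login.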
Method 1: use a session
The session here is not the session in django, but the session in requests.
    import requests

    url = 'https://www.processon.com/login'
    login_email = '283867@qq.com'
    login_password = 'ZZZ0'

    # Create a session; it stores cookies automatically
    session = requests.Session()

    data = {
        'login_email': login_email,
        'login_password': login_password
    }

    # Log in with a POST request through the session; the cookies set by
    # the login response are now stored in the session
    response = session.post(url=url, data=data)

    # Request the personal home page with the same session,
    # which already holds the cookies
    index_url = 'https://www.processon.com/diagrams'
    index_page = session.get(url=index_url).text
    print(index_page)
Save the cookies locally and check whether the user is already logged in
    import requests
    from http import cookiejar

    # Create a session; it stores cookies automatically
    session = requests.Session()
    # Back the session with a cookie jar that is stored in a local file
    session.cookies = cookiejar.LWPCookieJar(filename="cookies.txt")
    try:
        # ignore_discard=True also loads session cookies, i.e. cookies
        # marked to be discarded when the browser session ends
        session.cookies.load(ignore_discard=True)
    except (OSError, cookiejar.LoadError):
        print('Could not load cookies')


    def login_save_cookie():
        """Log in and save the cookies to a local file."""
        url = 'https://www.processon.com/login'
        login_email = '*****@qq.com'
        login_password = '****1391'
        data = {
            'login_email': login_email,
            'login_password': login_password
        }
        # Log in with a POST request; the cookies are now stored in the session
        response = session.post(url=url, data=data)
        # Write the cookies to the file, session cookies included
        session.cookies.save(ignore_discard=True)


    def read_cookie():
        """Use the loaded cookies to open a page that requires login."""
        index_url = 'https://www.processon.com/diagrams'
        index_page = session.get(url=index_url).text
        print(index_page)


    def login_y_n():
        """Check whether the user is already logged in.

        Request any page that requires login. If the response is not a
        redirect, the user is logged in.
        """
        url = 'https://www.processon.com/diagrams/new#template'
        # allow_redirects=False: do not follow the redirect to the login page
        response = session.get(url=url, allow_redirects=False)
        # Compare the status code, not the response object itself
        return response.status_code == 200


    read_cookie()
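To see what ignore_discard changes, here is a small offline sketch that uses only http.cookiejar from the standard library. The helper names make_cookie and saved_names, the domain .example.com, and the cookie values are made up for illustration. It saves one session cookie (no expiry) and one persistent cookie, reloads the file, and reports which names survived.

```python
import os
import tempfile
from http import cookiejar


def make_cookie(name, value, expires=None):
    """Build a cookie for .example.com (hypothetical domain).

    Without an expiry the cookie is a session cookie (discard=True).
    """
    return cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=".example.com", domain_specified=True, domain_initial_dot=True,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )


def saved_names(ignore_discard):
    """Save two cookies, reload the file, and return the names found."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cookies.txt")
        jar = cookiejar.LWPCookieJar(filename=path)
        jar.set_cookie(make_cookie("JSESSIONID", "abc"))                # session cookie
        jar.set_cookie(make_cookie("_sid", "xyz", expires=4102444800))  # expires in 2100
        jar.save(ignore_discard=ignore_discard)

        loaded = cookiejar.LWPCookieJar(filename=path)
        loaded.load(ignore_discard=ignore_discard)
        return sorted(cookie.name for cookie in loaded)


print(saved_names(ignore_discard=False))
print(saved_names(ignore_discard=True))
```

With ignore_discard=False the session cookie is dropped, which matters because login cookies such as JSESSIONID often carry no expiry.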
Method 2: use selenium to get the cookies
    import json

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    # Selenium 4 style: the driver path is passed through a Service object
    service = Service(executable_path=r"E:\爬蟲視頻\day04\chromedriver_win32_2.46\chromedriver.exe")
    browser = webdriver.Chrome(service=service)


    def get_cookies():
        """Log in with selenium and save the cookies to a file."""
        url = 'https://www.processon.com/login'
        browser.get(url=url)
        browser.find_element(By.ID, 'login_email').send_keys('286867@qq.com')
        browser.find_element(By.ID, 'login_password').send_keys('ZZZ0391')
        browser.find_element(By.ID, 'signin_btn').click()
        # get_cookies() returns a list of dicts
        cookies_list = browser.get_cookies()
        browser.close()
        with open("cookies.txt", "w") as fp:
            json.dump(cookies_list, fp)
        # Important: to read the data back with json.load, it must be written
        # with json.dump. Do not save str(cookies) directly, because the
        # storage format is different.


    def read_cookie():
        """Read the cookies from the file and add them to the browser."""
        url = 'https://www.processon.com/diagrams'
        # Visit the site once first: cookies can only be added for the
        # domain that is currently loaded
        browser.get(url=url)
        with open('./cookies.txt', 'r') as fp:
            cookies_list = json.load(fp)
        for cookie in cookies_list:
            browser.add_cookie(cookie)
        # Load the page again, this time with the cookies attached
        browser.get(url)


    read_cookie()
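The warning about json.dump versus str(cookies) can be checked without a browser. The sketch below uses a single made-up cookie in the shape that get_cookies() returns. The variable names are illustrative only.

```python
import json

# One cookie in the shape returned by selenium's get_cookies() (sample data)
cookies_list = [{"name": "_gat", "value": "1", "httpOnly": False, "secure": False}]

as_json = json.dumps(cookies_list)  # the text json.dump would write to the file
as_repr = str(cookies_list)         # Python repr: single quotes, False instead of false

# The JSON text round-trips back to the same list
restored = json.loads(as_json)

# The repr text is not valid JSON, so json.loads rejects it
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

print(as_json)
print(as_repr)
print(repr_is_valid_json)
```

The two strings look similar, but only the one produced by the json module can be read back with json.load.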
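Note that the cookies obtained with selenium are a list of dictionaries, and each dictionary has keys such as domain, expiry, name, value and path. A real browser, however, sends a single Cookie header that contains only name=value pairs, so the list has to be converted before it can be used that way. The first block below is the list saved by selenium, the second is the Cookie header sent by a real browser: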

[{"domain": ".processon.com", "expiry": 1560351255.689168, "httpOnly": false, "name": "_sid", "path": "/", "secure": false, "value": "796afe66ce2a6002828ab3ca281f96fb"}, {"domain": ".processon.com", "httpOnly": true, "name": "JSESSIONID", "path": "/", "secure": false, "value": "EBDACE1BDAB1464A2CCBBFFB7048A238.jvm1"}, {"domain": ".processon.com", "expiry": 1586703257, "httpOnly": false, "name": "zg_did", "path": "/", "secure": false, "value": "%7B%22did%22%3A%20%2216a173113351bb-062c441b2e33b7-7a1437-144000-16a173113362e%22%7D"}, {"domain": ".processon.com", "expiry": 1560351255.689117, "httpOnly": false, "name": "processon_userKey", "path": "/", "secure": false, "value": "59f7fba9e4b0edf0e25cd413"}, {"domain": ".processon.com", "expiry": 1555167313, "httpOnly": false, "name": "_gat", "path": "/", "secure": false, "value": "1"}, {"domain": ".processon.com", "expiry": 1555253657, "httpOnly": false, "name": "_gid", "path": "/", "secure": false, "value": "GA1.2.1345294219.1555167253"}, {"domain": ".processon.com", "expiry": 1618239257, "httpOnly": false, "name": "_ga", "path": "/", "secure": false, "value": "GA1.2.555498451.1555167253"}, {"domain": ".processon.com", "expiry": 1586703257, "httpOnly": false, "name": "zg_3f37ba50e54f4374b9af5be6d12b208f", "path": "/", "secure": false, "value": "%7B%22sid%22%3A%201555167253312%2C%22updated%22%3A%201555167257424%2C%22info%22%3A%201555167253326%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform %22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22%22%2C%22cuid%22%3A%20%2259f7fba9e4b0edf0e25cd413%22%7D"}]

Cookie: zg_did=%7B%22did%22%3A%20%2216a16fc24ab1e7-08f589794c6e4d-7a1437-144000-16a16fc24ac76a%22%7D; _ga=GA1.2.1095343087.1555163784; _gid=GA1.2.545489346.1555163784; processon_userKey=59f7fba9e4b0edf0e25cd413; _sid=796afe66ce2a6002828ab3ca281f96fb; zg_3f37ba50e54f4374b9af5be6d12b208f=%7B%22sid%22%3A%201555163784372%2C%22updated%22%3A%201555165807015%2C%22info%22%3A%201555163784376%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22%22%2C%22cuid%22%3A%20%2259f7fba9e4b0edf0e25cd413%22%7D; JSESSIONID=685AABAF6B8D70AF25E501C7E9E67A32.jvm1
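The conversion mentioned above can be sketched as follows. This is a minimal example, not the only way to do it: the function names to_cookie_dict and to_cookie_header are hypothetical, and the sample holds just two entries copied from the saved list.

```python
def to_cookie_dict(cookies_list):
    """Reduce selenium-style cookie dicts to a plain {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in cookies_list}


def to_cookie_header(cookies_list):
    """Join the cookies into the value of a Cookie request header."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies_list)


# Two entries taken from the list saved by selenium
sample = [
    {"domain": ".processon.com", "expiry": 1560351255.689168, "httpOnly": False,
     "name": "_sid", "path": "/", "secure": False,
     "value": "796afe66ce2a6002828ab3ca281f96fb"},
    {"domain": ".processon.com", "httpOnly": True, "name": "JSESSIONID",
     "path": "/", "secure": False,
     "value": "EBDACE1BDAB1464A2CCBBFFB7048A238.jvm1"},
]

cookie_dict = to_cookie_dict(sample)
cookie_header = to_cookie_header(sample)
print(cookie_dict)
print(cookie_header)
```

The resulting dictionary can be passed to requests through the cookies= argument, for example requests.get(url, cookies=cookie_dict), while the header string matches the Cookie line format shown above.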