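When scraping with requests, some sites require a login that is protected by a captcha we cannot solve programmatically. In that case we can log in by hand through Selenium, then read the browser's cookies and share them with requests.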
1. Crawl the target data with requests without logging in
# -*- coding: utf-8 -*-
import requests


def crawler():
    sess = requests.Session()
    url = 'http://210.74.4.127:8888/jes/jptInputInvoiceQuery.ajax?ssId=JPT&_dc=1564026411786'
    data = {'page': 1, 'start': 0, 'limit': 10}
    # POST the query without any login cookies
    html = sess.post(url, data=data).text
    print(html)


if __name__ == "__main__":
    crawler()
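Running this shows that the server requires a login.
Because the login page has a captcha, the cookies have to be obtained through Selenium and then shared with requests: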
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By


def getCookies():
    # Configure Chrome options
    options = webdriver.ChromeOptions()
    # options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    driver.maximize_window()
    driver.get("http://210.74.4.127:8888/jes/login.html")
    # Enter the username
    driver.find_element(By.ID, "userId").send_keys("cheng")
    # Enter the password
    driver.find_element(By.ID, "password").send_keys("admin123")
    # Wait while the slider captcha is dragged by hand
    time.sleep(10)
    # Click the submit button
    driver.find_element(By.CSS_SELECTOR, "input.log_button").click()
    # Give the login request a moment to finish (the 2 seconds is a guess)
    time.sleep(2)
    # Read the cookies of the logged-in browser session
    cookies = driver.get_cookies()
    driver.quit()
    return cookies


def crawler():
    sess = requests.Session()
    # Remove the default requests headers
    sess.headers.clear()
    # Copy the Selenium cookies into the requests session
    for cookie in getCookies():
        sess.cookies.set(cookie['name'], cookie['value'])
    url = 'http://210.74.4.127:8888/jes/jptInputInvoiceQuery.ajax?ssId=JPT&_dc=1564026411786'
    data = {'page': 1, 'start': 0, 'limit': 10}
    html = sess.post(url, data=data).text
    print(html)


if __name__ == "__main__":
    crawler()
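The loop in crawler() copies only each cookie's name and value, which is enough when a single host is involved. If the site sets cookies for several domains or paths, it may help to carry those attributes over as well. The following is a minimal sketch, not part of the original script: the helper name selenium_cookie_to_kwargs and the sample cookie are made up for illustration, and it assumes the usual keys Selenium returns from get_cookies() (name, value, domain, path, secure, and an optional expiry).

```python
def selenium_cookie_to_kwargs(cookie):
    """Map one Selenium cookie dict to keyword arguments for
    requests' RequestsCookieJar.set() (hypothetical helper)."""
    kwargs = {"name": cookie["name"], "value": cookie["value"]}
    # Copy the attributes that requests understands under the same name
    for key in ("domain", "path", "secure"):
        if key in cookie:
            kwargs[key] = cookie[key]
    # Selenium calls the expiry timestamp 'expiry'; requests calls it 'expires'
    if "expiry" in cookie:
        kwargs["expires"] = cookie["expiry"]
    return kwargs


if __name__ == "__main__":
    # Made-up cookie, shaped like an entry from driver.get_cookies()
    sample = {"name": "sid", "value": "abc123", "domain": "210.74.4.127",
              "path": "/jes", "secure": False, "httpOnly": True}
    print(selenium_cookie_to_kwargs(sample))
```

Inside the loop, the call would then become sess.cookies.set(**selenium_cookie_to_kwargs(cookie)). Keys that requests does not accept, such as httpOnly and sameSite, are dropped on purpose.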
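To run the script, drag the slider captcha in the browser window during the 10-second pause. crawler() then sends the same POST request with the cookies of the logged-in session.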
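Because every run needs the captcha solved by hand, one option is to cache the cookies on disk and reuse them until they expire. This is a rough sketch under stated assumptions rather than part of the original script: save_cookies, load_cookies, and the cache file path are hypothetical names, and the cookies are assumed to be the plain dicts returned by driver.get_cookies().

```python
import json
import os
import time


def save_cookies(cookies, path):
    """Write a list of Selenium cookie dicts to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cookies, f)


def load_cookies(path, now=None):
    """Return the cached cookies that have not expired yet,
    or an empty list when no cache file exists."""
    if not os.path.exists(path):
        return []
    now = time.time() if now is None else now
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    # Session cookies have no 'expiry' key; keep them, since only
    # the server knows whether the session is still valid.
    return [c for c in cookies if c.get("expiry", now + 1) > now]
```

In crawler(), one could call load_cookies() first and fall back to getCookies() followed by save_cookies() when the list comes back empty. The server may invalidate a session before the cookie's expiry time, so the response should still be checked for the login prompt.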