Scraping Qixinbao with requests and lxml

This article is a repost. 2018-03-16 09:34, tagged Python

First, add the requests module.

Then add the lxml module.
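
Before running the script it can help to confirm that both modules are importable. The sketch below uses only the standard library; the helper name `missing_modules` is made up for this example, and the `pip install` hint in the comment assumes pip is the package manager in use.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["requests", "lxml"])
    if missing:
        # Assumption: pip is available, e.g. `pip install requests lxml`
        print("Missing modules:", ", ".join(missing))
    else:
        print("requests and lxml are both installed")
```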

Capture the Qixinbao login request (packet capture):
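
The captured request body is JSON, and the `Content-Length: 66` header in the script's header set matches the compact serialization of the sample payload used there. A small standard-library sketch makes this visible (the variable names are chosen for illustration). It also suggests why hard-coding that header is fragile: the length changes with the account, the password, or the serializer's whitespace.

```python
import json

# Sample payload, same values as in the script
payload = {"acc": "13688888888", "pass": "000000", "captcha": {"isTrusted": True}}

# Compact form, no spaces after "," and ":" (the form JSON.stringify produces in a browser)
compact = json.dumps(payload, separators=(",", ":")).encode("utf-8")
# Python's default form, with a space after each "," and ":"
default = json.dumps(payload).encode("utf-8")

print(len(compact))  # 66
print(len(default))  # 72
```

Because the two lengths differ, it is safer to omit `Content-Length` and let the HTTP library fill it in from the actual body.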

QiXinBao.py:

import requests
from lxml import etree


# Qixinbao login endpoint
loginUrl = "https://www.qixin.com/api/user/login"
# Qixinbao home page
homePage = "https://www.qixin.com"

# Request headers copied from the captured login request (mimics a browser).
# Content-Length is left out because requests computes it from the body.
# "br" is dropped from Accept-Encoding: requests decodes Brotli only when a
# Brotli package is installed.
headers = {"Accept": "application/json, text/plain, */*",
           "Accept-Encoding": "gzip, deflate",
           "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
           "Content-Type": "application/json;charset=utf-8",
           "Host": "www.qixin.com",
           "Referer": "https://www.qixin.com/auth/login?return_url=%2Fnew-vip",
           "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0",
           "X-Requested-With": "XMLHttpRequest",
           # Site-specific header taken from the capture
           "dc49417fe4f34f86b0fe": "44282ce68be84e73f8eb4d2a4d4b32c02e8e84970160b2d6829c6b8a5380483e50ec708bc38040dd715d283dfac3123cf422ecff2fe4977c8624e457c5046959"
           }
# Request body
parameter = {"acc": "13688888888", "pass": "000000", "captcha": {"isTrusted": True}}

# A session keeps cookies across requests
session = requests.Session()
# Log in
response_1 = session.post(loginUrl, headers=headers, json=parameter, timeout=5)
# Print the status code
print(response_1.status_code)

# Open the Qixinbao home page
response_2 = session.get(homePage, timeout=5).content
page_2 = etree.HTML(response_2)
# Get the URL of the first company
link = page_2.xpath("//html/body/div[1]/div[4]/div/div[2]/div/div[1]/div[1]/a//@href")
companyUrl = homePage+link[0]

# Open the first company's page
response_3 = session.get(companyUrl, timeout=5).content
page_3 = etree.HTML(response_3)
# Get the company name
companyName = page_3.xpath("//html/body/div[6]/div/div[2]/div/div/h4//text()")
# Get the unified social credit code
code_1 = page_3.xpath("//*[@id='icinfo']/table/tbody/tr[1]/td[2]//text()")
# Get the registration number
code_2 = page_3.xpath("//*[@id='icinfo']/table/tbody/tr[2]/td[2]//text()")

print(companyName[0]+"\n"+code_1[0]+"\n"+code_2[0])
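
Each `xpath()` call in the script returns a list, so indexing with `[0]` raises `IndexError` when the login failed or the page layout changed. One possible guard is sketched below; `first_text` is a hypothetical helper, not part of lxml, and it works on any list of strings such as the ones the `//text()` queries return.

```python
def first_text(results, default=""):
    """Return the first non-blank string in `results`, stripped, or `default`."""
    for item in results:
        text = str(item).strip()
        if text:
            return text
    return default


# Lists shaped like XPath text() results
print(first_text(["\n  ", " Example Co., Ltd. "]))  # Example Co., Ltd.
print(first_text([], default="N/A"))                # N/A
```

In the script, `first_text(code_1, "N/A")` could stand in for `code_1[0]`. If a table lookup comes back empty, it is worth checking whether the `tbody` step exists in the raw HTML, since browsers add that element to the DOM even when the served markup omits it.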
