A Simple Qichacha (qcc.com) Crawler


After working through the Qichacha site, I came away strongly convinced of the value of a packet-capture tool. From now on I'll build simulated requests from captured traffic and stop relying on F12 (the browser DevTools) for analysis.

Consider this post a eulogy for the late F12~~~

import requests
from lxml import etree

# Search results page for the keyword 天津滨海新区 (URL-encoded)
url = "https://www.qcc.com/search?key=%E5%A4%A9%E6%B4%A5%E6%BB%A8%E6%B5%B7%E6%96%B0%E5%8C%BA"

# Headers copied from a captured request; the session cookie is what
# actually gets the search results back
hed = {
    "host": "www.qcc.com",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36",
    "upgrade-insecure-requests": "1",
    "cookie": "QCCSESSID=vpk1mpc45ci95eu83etg528881; zg_did=%7B%22did%22%3A%20%221732cdcac86bf-0039dd6baef69a-4353761-100200-1732cdcac8844f%22%7D; UM_distinctid=1732cdcb0a713b-01b058b949aa5a-4353761-100200-1732cdcb0ab44e; hasShow=1; _uab_collina=159418552807339394444789; acw_tc=7d27c71c15941953776602556e6b8442bc8001e4e1270e8fead4b79557; CNZZDATA1254842228=1092104090-1594185078-https%253A%252F%252Fwww.baidu.com%252F%7C1594195878; Hm_lvt_78f134d5a9ac3f92524914d0247e70cb=1594194111,1594195892,1594195918,1594196042; Hm_lpvt_78f134d5a9ac3f92524914d0247e70cb=1594196294; zg_de1d1a35bfa24ce29bbf2c7eb17e6c4f=%7B%22sid%22%3A%201594185526424%2C%22updated%22%3A%201594196294349%2C%22info%22%3A%201594185526455%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%5C%22%24utm_source%5C%22%3A%20%5C%22baidu1%5C%22%2C%5C%22%24utm_medium%5C%22%3A%20%5C%22cpc%5C%22%2C%5C%22%24utm_term%5C%22%3A%20%5C%22pzsy%5C%22%7D%22%2C%22referrerDomain%22%3A%20%22www.baidu.com%22%2C%22cuid%22%3A%20%22fd05f1ac2b561244aaa6b27b3bb617a4%22%7D",
}

# Fetch the page and parse the HTML into an element tree
html = requests.get(url=url, headers=hed).content
response = etree.HTML(html)

# Company names: the <a> text in the third cell of each result row
title_list = []
titles = response.xpath('//*[@id="search-result"]//tr/td[3]/a//text()')
for tit in titles:
    title_list.append(tit.replace(',', '').strip())

# Addresses: the fourth <p> in that same cell
addr_list = []
addrs = response.xpath('//*[@id="search-result"]//tr/td[3]/p[4]//text()')
for addr in addrs:
    addr_list.append(addr.replace(',', '').strip())

print(title_list)
print(addr_list)
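The two parallel lists are more useful zipped into (name, address) records. A minimal sketch, assuming both XPath queries return the same number of rows (the sample data below stands in for the scraped `title_list` and `addr_list`):

```python
# Sample data standing in for the scraped results
title_list = ["公司A", "公司B"]
addr_list = ["天津市滨海新区XX路", "天津市滨海新区YY路"]

# Pair each company name with its address
records = list(zip(title_list, addr_list))
for name, addr in records:
    print(f"{name} -> {addr}")
```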

The code is simple, even crude, so why record this crawler at all? Because when I built the request headers by analyzing them myself and Ctrl+C/Ctrl+V-ing from DevTools, the header data came out wrong. It drove home that reading the request straight out of a packet-capture tool is faster and more reliable.
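One thing that makes the capture-and-replay workflow less error-prone: instead of hand-quoting every header line into a dict, paste the raw `Name: value` block from the capture tool and split it programmatically. A small helper sketch (the sample header text is illustrative, not a real capture):

```python
def headers_from_raw(raw: str) -> dict:
    """Turn a raw 'Name: value' header block, as copied from a
    packet-capture tool, into a dict usable by requests."""
    headers = {}
    for line in raw.strip().splitlines():
        # Split only on the first colon so values containing ':' survive
        name, _, value = line.partition(":")
        if name and value:
            headers[name.strip().lower()] = value.strip()
    return headers

# Example: paste the captured header block between the triple quotes
raw = """\
Host: www.qcc.com
User-Agent: Mozilla/5.0
Upgrade-Insecure-Requests: 1
"""
hed = headers_from_raw(raw)
print(hed)
```

This avoids the exact copy-paste mistakes described above, since every line is quoted and split the same way.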

Onward, keep at it.

