Method:
1. Under a single job category, the results span multiple pages, so first crawl each page's link (s_url in the code below);
2. On each page (page_x), scrape and save the links of the 15 individual job postings it lists (list_url in the code);
3. Open each posting link and extract the desired fields, such as title, content, and salary;
4. Save the information and write it out to a CSV file (see the sketch after this list).
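Step 4 mentions CSV output, but the code below actually writes raw text lines separated by "*" rulers. A minimal sketch of writing real CSV rows instead, using Python's standard csv module (the sample row and the column names here are made up for illustration):

import csv

# Hypothetical scraped rows: (title, salary, content) for each posting
rows = [("Java Developer", "15k-25k", "Job description ...")]

with open("cn-blog.csv", "a+", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "salary", "content"])  # column header; write once
    writer.writerows(rows)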
Code:
from lxml import etree
import requests
import time

# Target listing page to crawl
url = "https://www.lagou.com/zhaopin/Java/?labelWords=label"
# Request headers that mimic a real browser, which helps get past some anti-scraping checks
head = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3534.4 Safari/537.36'}

res = requests.get(url, headers=head).content.decode("utf-8")
tree = etree.HTML(res)
# Collect the pagination links on this page
s_url = tree.xpath("//div[@class='pager_container']/a[position()>2 and position()<7]/@href")
print('s_url=', s_url)

# Visit page1, page2, ... in turn
for x in s_url:
    res = requests.get(x, headers=head).content.decode("utf-8")
    tree = etree.HTML(res)
    print('x==', x)
    # Collect the links of the 15 job postings on the current page (XPath position() is 1-based)
    list_url = tree.xpath("//div[@class='s_position_list ']/ul/li[position()>=1 and position()<=15]/div/div[1]/div/a/@href")
    print('list_url=', list_url)
    # Visit each posting and pull out the title, description, and salary
    for y in list_url:
        r01 = requests.get(y, headers=head).content.decode("utf-8")
        html01 = etree.HTML(r01)
        print('y==', y)
        title = html01.xpath("string(//div[@class='job-name'])")
        print('title===', title)
        content = html01.xpath("string(//div[@class='job-detail'])")
        print('content===', content)
        salary = html01.xpath("string(/html/body/div[5]/div/div[1]/dd/h3/span[1])")
        print('salary===', salary)
        # Sleep between requests so the site is less likely to flag us; a random delay works even better
        time.sleep(5)
        # Append the scraped fields to the output file
        with open("cn-blog.csv", "a+", encoding="utf-8") as file:
            file.write(title + "\n")
            file.write(content + "\n")
            file.write(salary + "\n")
            file.write("*" * 50 + "\n")
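The sleep comment above suggests a random delay; a minimal sketch of one way to do that with the standard random module (the 3-to-8-second range is an arbitrary choice, not from the original post):

import random
import time

# A random 3-8 second pause between requests instead of a fixed 5 seconds,
# so the request timing looks less mechanical
time.sleep(random.uniform(3, 8))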
Summary:
1. Set the head info and sleep between requests so the site is less likely to flag the crawler (it still blocks some requests, but most of the data gets through); a random sleep works best, as sketched right after the code above;
2. To grab a range of elements under the same parent, use an XPath positional predicate of the form [position()>x and position()<y]; a small runnable example follows.
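To illustrate point 2, a self-contained example of the positional predicate. Note that XPath position() counts from 1, which is also why the listing XPath in the code above uses position()>=1 and position()<=15 to take the first 15 items (the HTML snippet here is made up for the demo):

from lxml import etree

html = etree.HTML("<ul><li>a</li><li>b</li><li>c</li><li>d</li><li>e</li></ul>")
# position() is 1-based, so this keeps items 1..3
print(html.xpath("//li[position()>=1 and position()<=3]/text()"))  # ['a', 'b', 'c']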