Analyzing Zhaopin's API Endpoints to Scrape Job Data

Reposted article, 2019-02-26 14:18, tagged python

1. Introduction

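Most websites today separate the front end from the back end. The data you see on a page is usually not present in the HTML itself; server-side code reads it from the database and loads it into the page dynamically.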

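With the arrival of AJAX, front-end code could also request data held in the back-end database, and that is where the term "API endpoint" comes from. Put simply, you fetch data from a particular endpoint of a website by sending specific parameters to a specific address.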

Data returned by an API endpoint is usually JSON. Even when it is not JSON, it is regular and well ordered, so parsing it is very easy.
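
To illustrate why JSON responses are easy to work with, here is a minimal sketch that parses a response body with Python's json module. The payload below is invented for illustration and is not real Zhaopin output; only the nesting under "data" and "results" mirrors what the script later in this article reads.

```python
import json

# Hypothetical response body, invented for illustration only.
raw = '{"code": 200, "data": {"results": [{"jobName": "Python Developer", "salary": "10K-15K"}]}}'

payload = json.loads(raw)          # JSON text -> nested dicts and lists
jobs = payload["data"]["results"]  # plain indexing reaches any field

for job in jobs:
    print(job["jobName"], job["salary"])
```

Once the text is parsed, every field is an ordinary dictionary lookup, so no HTML parsing is needed.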

This is only my own rough understanding.

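2. Analysis

Open the official Zhaopin website and press F12 to open the developer tools. By watching the requests that load data, you can easily find three API endpoints.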

First API endpoint

https://fe-api.zhaopin.com/c/i/city-page/user-city?ipCity=合肥

Parameter	Purpose
ipCity	Name of the city to query; the response includes that city's code (code)
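
The lookup above can be sketched as two small helpers: one builds the URL with the city name percent-encoded, the other reads the code from the response. The helper names and the sample body are assumptions made for illustration; the sample only reproduces the `data.code` path that the script later in this article reads, and its "name" field is invented.

```python
import json
from urllib.parse import urlencode

CITY_API = "https://fe-api.zhaopin.com/c/i/city-page/user-city"


def build_city_url(city_name):
    """Return the lookup URL with the city name percent-encoded."""
    return CITY_API + "?" + urlencode({"ipCity": city_name})


def extract_city_code(body):
    """Read the city code from a JSON response body (assumed shape)."""
    return json.loads(body)["data"]["code"]


# Sample body with an assumed shape; not captured from the real service.
sample_body = '{"data": {"name": "合肥", "code": "664"}}'

print(build_city_url("合肥"))
print(extract_city_code(sample_body))
```

Encoding the city name explicitly avoids relying on the HTTP client to escape non-ASCII characters in the query string.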

Second API endpoint

https://dict.zhaopin.cn/dict/dictOpenService/getDict?dictNames=region_relation,education,recruitment,education_specialty,industry_relation,careet_status,job_type_parent,job_type_relation

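dictNames value	Returned data (codes)
region_relation	Region information
education	Education level
recruitment	Recruitment type (whether unified enrollment)
education_specialty	Occupation category
industry_relation	Industry
careet_status	Availability to start work
job_type_parent	Job category
job_type_relation	Job position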
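
Since `dictNames` takes a comma-separated list, you can request only the dictionaries you need. A minimal sketch follows, assuming the endpoint accepts any subset of the names in the table (the article only shows the full list); `build_dict_url` is a hypothetical helper name.

```python
DICT_API = "https://dict.zhaopin.cn/dict/dictOpenService/getDict"


def build_dict_url(dict_names):
    """Join dictionary names with commas, matching the URL form shown in the article."""
    return DICT_API + "?dictNames=" + ",".join(dict_names)


# Ask only for the education and region dictionaries.
print(build_dict_url(["education", "region_relation"]))
```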

Third API endpoint

https://fe-api.zhaopin.com/c/i/sou?pageSize=200&cityId=664&workExperience=-1&education=5&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw=python&kt=3

The parameter values for this endpoint are all codes obtained from the two endpoints above.

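Parameter	Purpose
pageSize	Number of records to return
cityId	City code
workExperience	Work experience
education	Education level
companyType	Company type
employmentType	Employment type
jobWelfareTag	Job benefits
kw	Search keyword
kt	Value varies; purpose unclear for now, but the parameter cannot be omitted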
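
The parameter table above maps naturally onto a dictionary that `urllib.parse.urlencode` turns into a query string. The sketch below uses the same values as the example URL; `build_search_url` and its default arguments are hypothetical names chosen for illustration.

```python
from urllib.parse import urlencode

SEARCH_API = "https://fe-api.zhaopin.com/c/i/sou"


def build_search_url(city_id, keyword="python", page_size=200, education=5):
    """Build the job-search URL from the parameters listed in the table."""
    params = {
        "pageSize": page_size,
        "cityId": city_id,
        "workExperience": -1,
        "education": education,
        "companyType": -1,
        "employmentType": -1,
        "jobWelfareTag": -1,
        "kw": keyword,
        "kt": 3,  # purpose unknown, but the article notes it must be present
    }
    return SEARCH_API + "?" + urlencode(params)


print(build_search_url(664))
```

Keeping the parameters in a dictionary makes it easy to change one filter without editing a long URL string by hand.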

3. Scraping the Data

With the API endpoints identified, what remains is fetching the data and storing it locally.

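Scraping goal

Query and store job data for a city entered by the user. This run only searches for Python jobs.

Each job record contains many fields. To keep things simple I extract only a few; the others can be extracted the same way.

Full code: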
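
As a rough sketch of that field extraction, the helper below copies a few fields out of one job record and tolerates missing ones. The sample record is invented; only the key names `jobName`, `salary`, `workingExp` and `eduLevel` come from the script in this article, and `pick_fields` is a hypothetical helper name.

```python
def pick_fields(record):
    """Copy a few fields from one job record, returning None for anything missing."""
    return {
        "job_title": record.get("jobName"),
        "salary": record.get("salary"),
        # Nested objects may be absent or null, so fall back to an empty dict.
        "experience": (record.get("workingExp") or {}).get("name"),
        "education": (record.get("eduLevel") or {}).get("name"),
    }


# Invented sample record for illustration only.
sample = {
    "jobName": "Python Engineer",
    "salary": "8K-12K",
    "workingExp": {"name": "1-3 years"},
    "eduLevel": None,
}

print(pick_fields(sample))
```

Using `.get` instead of direct indexing means one incomplete record does not abort the whole run.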

"""
This scraper takes only simple anti-scraping precautions
(it sends browser-like request headers).
"""
import requests
import os
import json

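class siper(object):
    def __init__(self):
        # Browser-like headers. "br" is omitted from Accept-Encoding because
        # requests can only decode Brotli when an optional brotli package is installed.
        self.header = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
            "Origin": "https://sou.zhaopin.com",
            "Host": "fe-api.zhaopin.com",
            "Accept-Encoding": "gzip, deflate"
        }
        print("Job search program starting...")
        # Open the output file in the current working directory
        self.file = "result.json"
        path = os.getcwd()
        pathfile = os.path.join(path, self.file)
        self.fp = open(pathfile, "w", encoding="utf-8")
        # Results are written as one JSON array; this is its opening bracket
        self.fp.write("[\n")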

    def get_response(self, url):
        # A timeout keeps the program from hanging if the server never answers
        return requests.get(url=url, headers=self.header, timeout=10)

    def get_citycode(self, city):
        # Look up the city code for a city name through the first endpoint
        url = "https://fe-api.zhaopin.com/c/i/city-page/user-city?ipCity={}".format(city)
        response = self.get_response(url)
        result = json.loads(response.text)
        return result['data']['code']

    def parse_data(self, url):
        # Fetch the search results and keep only a few fields per job
        response = self.get_response(url)
        result = json.loads(response.text)['data']['results']
        items = []
        for i in result:
            item = {}
            item['job_title'] = i['jobName']
            item['salary'] = i['salary']
            item['hiring_status'] = i['timeState']
            item['experience_required'] = i['workingExp']['name']
            item['education_required'] = i['eduLevel']['name']
            items.append(item)
        return items

    def save_data(self, items):
        # Write one JSON object per line, with a comma after every item except the last
        for num, i in enumerate(items, start=1):
            self.fp.write(json.dumps(i, ensure_ascii=False))
            if num == len(items):
                self.fp.write("\n")
            else:
                self.fp.write(",\n")
            print("%s--%s" % (num, i))

    def end(self):
        # Close the JSON array and the file
        self.fp.write("]")
        self.fp.close()
        print("Job search program finished...")
        print("Data written to {}".format(self.file))

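    def main(self):
        try:
            cityname = input("Enter the name of the city to query (prefecture-level city): ")
            city = self.get_citycode(cityname)
            url = "https://fe-api.zhaopin.com/c/i/sou?pageSize=200&cityId={}&workExperience=-1&education=5&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw=python&kt=3".format(
                city)
            items = self.parse_data(url)
            self.save_data(items)
            self.end()
        except Exception as e:
            # Any failure lands here: an unknown city name, a network error,
            # or a response in an unexpected format
            print("Query failed (check the city name). Exiting.")
            print(e)
            self.fp.close()
            # Exit with a non-zero status so callers can detect the failure
            raise SystemExit(1)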


if __name__ == '__main__':
    # Use a distinct variable name so the instance does not shadow the class
    spider = siper()
    spider.main()

[Figure: console output of a run]

[Figure: contents of the result file]

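4. Summary

The logic and code of this program are fairly simple and belong to the basics of web scraping. The harder part is finding the API endpoints.

The endpoints in this article return JSON, so the parsing step is straightforward: Python's json module turns the response into usable data very quickly.

I am still a beginner in Python scraping and data analysis, and I hope to learn and improve together with everyone.

The content of this article is for learning purposes only and is not used for commercial profit.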
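
The json module also covers the storage side. As an alternative to writing brackets and commas by hand, the sketch below dumps a whole list in one call and reads it back. The items and the file name are invented for illustration, and the file goes to the system temporary directory rather than the working directory.

```python
import json
import os
import tempfile

# Invented items standing in for scraped job records.
items = [
    {"job_title": "Python Engineer", "salary": "8K-12K"},
    {"job_title": "Data Analyst", "salary": "6K-9K"},
]

path = os.path.join(tempfile.gettempdir(), "result_demo.json")

# json.dump writes the whole list as one valid JSON array.
with open(path, "w", encoding="utf-8") as fp:
    json.dump(items, fp, ensure_ascii=False, indent=2)

# Reading it back confirms the file is valid JSON.
with open(path, encoding="utf-8") as fp:
    loaded = json.load(fp)

print(loaded == items)
```

Writing the list in one call means the file is either complete and valid or not written at all past the point of failure, which is simpler to reason about than a partially written array.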
