Python Crawler Practice, Part 3: Using a Crawler to Extract the Return Value and Simulate the Youdao Dictionary Interface


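The Youdao Dictionary web interface can be simulated with a crawler: take an input key, splice it into the form data that the interface expects, and fetch the return value, which is the translation that the page generates dynamically through Ajax. From the outside this behaves like a translation interface; in effect the crawler plays the part of the browser and calls the Youdao Dictionary web interface. To be honest, you could simply call the Youdao web interface directly and pass it JSON parameters, which is far less trouble. The crawler, however, reproduces what a person does: open the web page, type a keyword, and get the translation back.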

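First, type a query into the page in a browser and inspect the resulting request to work out the URL and the form-data format of the Youdao Dictionary translation web interface.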
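The request seen in the browser is a POST with a form-encoded body. As a minimal sketch of that format, the snippet below builds such a request with `urllib` without sending it, then decodes the body again to show what goes over the wire. The helper name `build_request` is hypothetical, and only a subset of the form fields is included for brevity.

```python
from urllib import parse, request


def build_request(text):
    """Hypothetical helper: build (but do not send) the POST request for a query."""
    # Only a few of the form fields, to keep the sketch short.
    formdata = {"i": text, "from": "AUTO", "to": "AUTO", "doctype": "json"}
    # urlencode percent-encodes non-ASCII text as UTF-8, so the body is plain ASCII.
    body = parse.urlencode(formdata).encode("utf-8")
    url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"
    # Passing data= makes urllib issue a POST instead of a GET.
    return request.Request(url, data=body)


req = build_request("人格心理學")
print(req.get_method())            # HTTP method urllib will use
print(req.data.decode("ascii"))    # the encoded form body

# Decoding the body recovers the original fields.
decoded = parse.parse_qs(req.data.decode("ascii"))
print(decoded["i"][0])
```

Nothing is sent here, so the sketch can be run offline to compare its output with the form data shown in the browser's developer tools.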

# Crawler that simulates a call to the Youdao Dictionary web interface
from urllib import request
from urllib import parse
import re


class YoudaoTranslator:

    def __init__(self, key):
        # key: the text to translate
        self.key = key

    def __getData(self):
        # Build the form data required by the Youdao Dictionary web interface.
        # salt, sign, ts and bv are hard-coded here.
        formdata = {
            "i": self.key,
            "from": "AUTO",
            "to": "AUTO",
            "smartresult": "dict",
            "client": "fanyideskweb",
            "salt": "15763837022114",
            "sign": "2b12fd214e066f53bc3455a126d7a509",
            "ts": "1576383702211",
            "bv": "5575008ba9785f184b106838a72d6536",
            "doctype": "json",
            "version": "2.1",
            "keyfrom": "fanyi.web",
            "action": "FY_BY_REALTlME"
        }
        # URL-encode the fields and convert to bytes for the POST body
        data = parse.urlencode(formdata).encode(encoding="utf-8")
        return data

    def __getPage(self):
        # Send the request with a browser User-Agent and return the Ajax response
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36"}
        url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"

        req = request.Request(url, data=self.__getData(), headers=header)
        res = request.urlopen(req).read().decode()
        return res

    def __Pat(self):
        # The Ajax response is a JSON string; use a regex to extract the translation
        pat = r'"tgt":"(.*?)"}]]'
        result = re.findall(pat, self.__getPage())
        if not result:
            # Avoid an IndexError when the response contains no translation
            raise ValueError("no translation found in the response")
        print(result[0])
        return result

    def Translator(self):
        # Public entry point: fetch, parse, print and return the translation
        return self.__Pat()


if __name__ == '__main__':

    i = YoudaoTranslator("人格心理學")
    i.Translator()

Running the script prints the translated text.
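Since the return value is a JSON string, it can also be parsed with the standard `json` module instead of a regular expression. The sketch below is one way to do that, not the article's method: `collect_tgt` is a hypothetical helper, and the sample string is made up for illustration (only the `"tgt"` field name comes from the regex above; the enclosing `translateResult` key is an assumption about the response shape).

```python
import json


def collect_tgt(node):
    """Hypothetical helper: recursively collect every "tgt" string in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "tgt" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_tgt(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_tgt(item))
    return found


# Made-up sample response, not captured from a live call.
# The translation contains escaped quotes, which json.loads decodes correctly.
sample = '{"translateResult": [[{"src": "hello", "tgt": "say \\"hi\\""}]]}'

translations = collect_tgt(json.loads(sample))
print(translations)
```

Walking the parsed structure does not depend on the exact position of the closing brackets, and it handles escaped characters inside the translated text, which the regex would return in their raw escaped form.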

