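Youdao Dictionary's web interface can be driven by a crawler: take an input key, assemble it into the form data the interface expects, and scrape the return value, which is the translation generated dynamically via Ajax. Seen from the outside, this simulates a translation interface; in effect, the crawler imitates a browser calling the Youdao Dictionary web interface. To be honest, you could simply call the Youdao web interface directly and pass it JSON parameters without going to this much trouble, but the crawler reproduces the whole process of a person opening the web page, entering a keyword, and getting the translation back.
Enter some text in the browser and inspect the request to work out the URL and format of Youdao Dictionary's translation web interface.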
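To make the "form data" step concrete, here is a minimal sketch of how a dict of fields becomes the bytes of a POST body. The helper name `build_form_body` is made up for illustration, and the fields shown are only a subset; a real request to the interface carries more of them.

```python
from urllib import parse


def build_form_body(fields):
    """URL-encode a dict of form fields into UTF-8 bytes for a POST body."""
    return parse.urlencode(fields).encode("utf-8")


# Illustrative subset of fields only, not a complete request.
fields = {"i": "hello world", "from": "AUTO", "to": "AUTO", "doctype": "json"}
body = build_form_body(fields)
print(body)  # b'i=hello+world&from=AUTO&to=AUTO&doctype=json'

# Decoding the body recovers the original field values.
decoded = parse.parse_qs(body.decode("utf-8"))
print(decoded["i"])  # ['hello world']
```

Spaces become `+` and non-ASCII characters are percent-encoded, so the keyword can be any text the user types.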
# Crawler that simulates a call to the Youdao Dictionary web interface
from urllib import request
from urllib import parse
import re


class YoudaoTranslator:
    def __init__(self, key):
        self.key = key

    def __getData(self):
        # Build the form data required by the Youdao Dictionary web interface
        formdata = {
            "i": self.key,
            "from": "AUTO",
            "to": "AUTO",
            "smartresult": "dict",
            "client": "fanyideskweb",
            "salt": "15763837022114",
            "sign": "2b12fd214e066f53bc3455a126d7a509",
            "ts": "1576383702211",
            "bv": "5575008ba9785f184b106838a72d6536",
            "doctype": "json",
            "version": "2.1",
            "keyfrom": "fanyi.web",
            "action": "FY_BY_REALTlME"
        }
        data = parse.urlencode(formdata).encode(encoding="utf-8")
        return data

    def __getPage(self):
        # Send the request the way a browser would and read the Ajax response
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36"}
        url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"
        req = request.Request(url, data=self.__getData(), headers=header)
        res = request.urlopen(req).read().decode()
        return res

    def __Pat(self):
        # The Ajax response is a JSON string; a regex pulls out the translated text
        pat = r'"tgt":"(.*?)"}]]'
        result = re.findall(pat, self.__getPage())
        # Guard against an empty match list so a failed request does not raise IndexError
        if result:
            print(result[0])
        return result

    def Translator(self):
        return self.__Pat()


if __name__ == '__main__':
    i = YoudaoTranslator("人格心理學")
    i.Translator()
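Since the response is requested with `doctype` set to `json`, the translation could also be extracted with the `json` module rather than a regex. The sketch below is only an illustration: the function name `extract_translations` is hypothetical, the top-level key `translateResult` is an assumption about the payload, and the sample string is hand-written rather than captured from the service. The nesting (a list of lists of objects with a `tgt` field) is inferred from the `"tgt":"..."}]]` pattern used in the regex.

```python
import json


def extract_translations(raw):
    """Return every "tgt" string found in a translateResult-style payload."""
    payload = json.loads(raw)
    return [
        segment["tgt"]
        # "translateResult" is an assumed key name, not confirmed by the source.
        for paragraph in payload.get("translateResult", [])
        for segment in paragraph
        if "tgt" in segment
    ]


# Hypothetical sample response, written by hand for illustration.
sample = '{"errorCode": 0, "translateResult": [[{"src": "你好", "tgt": "hello"}]]}'
print(extract_translations(sample))  # ['hello']
```

Unlike the regex, this approach handles escaped quotes inside the translated text and returns an empty list when the expected key is missing.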
Then comes the result of running it.