Scraping a JD.com product page:
The complete code is as follows (using the general-purpose scraping framework):
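import requests

url = "https://item.jd.com/2967929.html"
try:
    r = requests.get(url, timeout=10)  # timeout added so the request cannot hang indefinitely
    r.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
    r.encoding = r.apparent_encoding  # use the encoding guessed from the response body
    print(r.text[:1000])  # show only the first 1000 characters
except requests.RequestException:  # catch request errors only, not every exception
    print("Scraping failed")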
Output:
<!DOCTYPE HTML>
<html lang="zh-CN">
<head>
<!-- shouji -->
<meta http-equiv="Content-Type" content="text/html; charset=gbk" />
<title>【華為榮耀8】榮耀8 4GB+64GB 全網通4G手機 魅海藍【行情 報價 價格 評測】-京東</title>
<meta name="keywords" content="HUAWEI榮耀8,華為榮耀8,華為榮耀8報價,HUAWEI榮耀8報價"/>
<meta name="description" content="【華為榮耀8】京東JD.COM提供華為榮耀8正品行貨,並包括HUAWEI榮耀8網購指南,以及華為榮耀8圖片、榮耀8參數、榮耀8評論、榮耀8心得、榮耀8技巧等信息,網購華為榮耀8上京東,放心又輕松" />
<meta name="format-detection" content="telephone=no">
<meta http-equiv="mobile-agent" content="format=xhtml; url=//item.m.jd.com/product/2967929.html">
<meta http-equiv="mobile-agent" content="format=html5; url=//item.m.jd.com/product/2967929.html">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<link rel="canonical" href="//item.jd.com/2967929.html"/>
<link rel="dns-prefetch" href="//misc.360buyimg.com"/>
<link rel="dns-prefetch" href="//static.360buyimg.com"/>
<link rel="dns-prefetch" href="//img10.360buyimg.com"/>
<link rel="dns
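
The charset=gbk declaration in the output hints at why the code sets r.encoding = r.apparent_encoding before reading r.text: GBK bytes decoded with the wrong encoding come out as garbled Chinese. The following is a minimal offline sketch of that effect, not part of the original code. The names raw, declared_charset, and extract_title are hypothetical, and raw is a hand-made stand-in for r.content. Note that this sketch reads the charset from the meta tag with a regular expression, which is a simplification; apparent_encoding instead guesses the encoding from the bytes themselves.

```python
import re

# Hypothetical stand-in for r.content: a tiny page encoded as GBK,
# mirroring the charset declared in the scraped output.
raw = (
    '<head><meta http-equiv="Content-Type" '
    'content="text/html; charset=gbk" />'
    '<title>京東</title></head>'
).encode("gbk")


def declared_charset(data: bytes, default: str = "utf-8") -> str:
    """Return the charset named in the markup, or `default` if none is found."""
    match = re.search(rb"charset=\s*['\"]?([\w-]+)", data, re.IGNORECASE)
    return match.group(1).decode("ascii") if match else default


def extract_title(html: str) -> str:
    """Return the text inside the first <title> element, or an empty string."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


# ISO-8859-1 accepts any byte sequence, so this never raises,
# but the non-ASCII characters in the title are garbled.
wrong = raw.decode("iso-8859-1")

# Decoding with the charset the page declares recovers the original text.
right = raw.decode(declared_charset(raw))

print(repr(extract_title(wrong)))
print(extract_title(right))
```

In the scraping code itself, no manual decoding is needed: assigning r.encoding makes r.text use that encoding.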
