Crawling URLs by keyword with Python


Python web crawler --------- search Baidu with a keyword, then scrape the URLs of the search results

Development environment: Windows 7 + Python 3.6.3

Language: Python

IDE: PyCharm

Third-party packages: install lxml 4.0. Installing lxml alone may still fail here, because the script specifically needs etree from lxml.
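As a quick check that lxml's etree is installed and working, the same parse-then-XPath pattern the crawler relies on can be run against a tiny HTML snippet (the snippet and the `id` value are made up for illustration):

```python
from lxml import etree

# A minimal stand-in for one Baidu result block: a container with a numeric id,
# an <h3>, and a link inside it
html = '<div id="11"><h3><a href="http://example.com/a">result title</a></h3></div>'

selector = etree.HTML(html, parser=etree.HTMLParser(encoding='utf-8'))

# Same XPath shape the crawler uses: result block by id, first <a> in its <h3>
links = selector.xpath('//*[@id="11"]/h3/a[1]/@href')
print(links)  # ['http://example.com/a']
```

If this import fails, lxml is not installed correctly for the interpreter in use.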

 

Without further ado, here is the code:

The scraped data is saved in TXT format; I will try saving it to an Excel spreadsheet or a database later.

import requests
import time
from lxml import etree


def redirect(url):
    """Follow redirects and return the final URL (Baidu result links are redirect links)."""
    try:
        res = requests.get(url, timeout=10)
        url = res.url
    except Exception as e:
        print('redirect failed:', e)
        time.sleep(1)
    return url


def baidu_search(wd, pn_max, save_file_name):
    url = 'http://www.baidu.com/s'
    return_set = set()

    for page in range(pn_max):
        pn = page * 10  # Baidu pages results in steps of 10
        querystring = {'wd': wd, 'pn': pn}
        headers = {
            'pragma': 'no-cache',
            'accept-encoding': 'gzip,deflate,br',
            'accept-language': 'zh-CN,zh;q=0.8',
            'upgrade-insecure-requests': '1',
            'user-agent': "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0",
            'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
            'cache-control': "no-cache",
            'connection': "keep-alive",
        }

        try:
            response = requests.get(url, headers=headers, params=querystring)
            print('requested:', response.url)
            selector = etree.HTML(response.text, parser=etree.HTMLParser(encoding='utf-8'))
        except Exception as e:
            print('page load failed:', e)
            continue

        with open(save_file_name, 'a+', encoding='utf-8') as f:
            # Each result block on the page has id pn+1 .. pn+10
            for i in range(1, 11):
                try:
                    context = selector.xpath('//*[@id="' + str(pn + i) + '"]/h3/a[1]/@href')
                    real_url = redirect(context[0])
                    print('context =', context[0])
                    print('real_url =', real_url)
                    f.write(real_url + '\n')
                    return_set.add(real_url)
                except Exception as e:
                    print('parse failed:', i, e)

    return return_set


if __name__ == '__main__':
    wd = '網絡貸款'  # search keyword ("online lending")
    pn_max = 100
    save_file_name = 'save_url_soup.txt'
    return_set = baidu_search(wd, pn_max, save_file_name)
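The crawler above writes a plain TXT file. As a first step toward the Excel/database idea, the collected URL set can be written to a CSV file, which Excel opens directly, using only the standard csv module. `save_urls_csv` and the file name here are illustrative, not part of the original script:

```python
import csv


def save_urls_csv(urls, file_name):
    # One URL per row with a header; utf-8 keeps non-ASCII content intact,
    # newline='' avoids blank rows on Windows
    with open(file_name, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['url'])
        for u in sorted(urls):
            writer.writerow([u])


# Hypothetical usage with the set returned by baidu_search
save_urls_csv({'http://example.com/a', 'http://example.com/b'}, 'save_url_soup.csv')
```

Sorting before writing just makes the output file stable across runs of the same crawl.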

 

