1. Basic steps for using requests:
- Specify the URL
- Send the request
- Extract the data from the response object
- Persist the data to storage
```python
import requests

# 1. Specify the URL
url = 'https://www.sogou.com/'
# 2. Send the request
response = requests.get(url=url)
# 3. Extract the page text from the response object
page_text = response.text
# 4. Persist the page to disk
with open('./sogou.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)
```
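The response object carries more than `.text`. A minimal sketch of the other commonly used response attributes (all standard requests API, shown against the same URL):

```python
import requests

response = requests.get(url='https://www.sogou.com/')
print(response.status_code)              # HTTP status code, e.g. 200
print(response.encoding)                 # encoding requests inferred from the headers
print(response.headers['Content-Type'])  # response headers behave like a dict
page_bytes = response.content            # raw bytes, for images and other binary data
page_text = response.text                # text decoded with response.encoding
```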
2. Crawling Sogou search results for a given keyword
```python
import requests

url = 'https://www.sogou.com/web'
wd = input('Please enter a search keyword: ')
# The keyword is passed as the `query` parameter of the GET request
param = {
    'query': wd
}

page_text = requests.get(url=url, params=param).text
filename = wd + '.html'
with open(filename, 'w', encoding='utf-8') as f1:
    f1.write(page_text)
```
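Sites often inspect the User-Agent header and refuse requests that look like scripts, which is why the later examples all send a browser UA. A minimal sketch of the same search with UA masquerading (the UA value is just a sample browser signature):

```python
import requests

url = 'https://www.sogou.com/web'
# Pretend to be a regular browser; any real browser UA string works here
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
param = {'query': 'python'}
page_text = requests.get(url=url, params=param, headers=headers).text
```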
3. Ajax requests
Use a packet-capture tool to find the parameters the request carries. For example, to fetch paginated data: when you click "next page" the site sends an Ajax request whose URL stays the same while only the parameters change. So we define the request parameters in `param`, dynamically specifying the page number and the number of records per page, and the Ajax request returns a JSON object. We store the ID of every record on the page, build `new_url`, and request the detail information for each ID.
```python
import requests

url = 'http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'
}
# The page number and page size are ordinary request parameters
param = {
    'on': 'true',
    'page': 1,
    'pageSize': '15',
    'productName': '',
    'conditionType': '1',
    'applyname': '',
    'applysn': '',
}
# The Ajax endpoint returns JSON; collect the ID of every record on the page
id_list = []
json_object = requests.post(url=url, headers=headers, data=param).json()
print(json_object['list'])
for i in json_object['list']:
    id_list.append(i['ID'])

# Use each ID to request the detail data from a second Ajax endpoint
new_url = 'http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsById'
filename = 'yaojians.text'
with open(filename, 'w', encoding='utf-8') as f:
    for id in id_list:
        param = {
            'id': id
        }
        content_json = requests.post(url=new_url, data=param, headers=headers).json()
        f.write(str(content_json) + '\n')
```
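Because the page number is just one field of `param`, crawling several pages is only a loop over that field. A hedged sketch, assuming the endpoint above is still reachable and with `pages` (a name introduced here) standing in for however many pages you want:

```python
import requests

url = 'http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'
}

id_list = []
pages = 3  # assumed: how many pages to fetch
for page in range(1, pages + 1):
    # Only the page number changes between requests
    param = {
        'on': 'true',
        'page': str(page),
        'pageSize': '15',
        'productName': '',
        'conditionType': '1',
        'applyname': '',
        'applysn': '',
    }
    json_object = requests.post(url=url, headers=headers, data=param).json()
    for record in json_object['list']:
        id_list.append(record['ID'])
```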
4. Scraping data with BeautifulSoup
Parsing with bs4:
pip install bs4
pip install lxml
Parsing workflow (a minimal sketch follows this list):
1. Load the source code to be parsed into a bs object
2. Call the bs object's methods or attributes to locate the target tags in the source
3. Extract the text or attribute values held by the located tags
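A minimal sketch of the three steps on an inline HTML snippet (the snippet itself is made up for illustration):

```python
from bs4 import BeautifulSoup

html = '<div class="book-mulu"><ul><li><a href="/book/1.html">Chapter 1</a></li></ul></div>'
# 1. Load the source code into a bs object
soup = BeautifulSoup(html, 'lxml')
# 2. Locate the target tag with the object's methods
a_tag = soup.select('.book-mulu > ul > li > a')[0]
# 3. Extract the text and attribute values from the located tag
print(a_tag.string)   # Chapter 1
print(a_tag['href'])  # /book/1.html
```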
```python
import requests
from bs4 import BeautifulSoup

url = 'http://www.shicimingju.com/book/sanguoyanyi.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
}
res = requests.get(url=url, headers=headers).text
soup = BeautifulSoup(res, 'lxml')
# Locate every chapter link in the table of contents
a_tags_list = soup.select('.book-mulu > ul > li > a')
filename = 'sanguo.txt'
with open(filename, 'w', encoding='utf-8') as fp:
    for a_tag in a_tags_list:
        title = a_tag.string
        detail_url = 'http://www.shicimingju.com' + a_tag['href']
        # Fetch each chapter page and extract its body text
        detail_content = requests.get(url=detail_url, headers=headers).text
        soup = BeautifulSoup(detail_content, 'lxml')
        detail_text = soup.find('div', class_='chapter_content').text
        fp.write(title + '\n' + detail_text)
        print(title, 'downloaded')
print('over')
```
5. Scraping images with a simple regular expression
```python
import os
import re
import requests

url = 'https://www.qiushibaike.com/pic/page/%d/?s=5170552'
start_page = int(input('Enter the start page: '))
end_page = int(input('Enter the end page: '))
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'
}
# Create the output directory once, before the loop, so mkdir does not fail on page two
if not os.path.exists('qiutu'):
    os.mkdir('qiutu')
for page in range(start_page, end_page + 1):
    new_url = url % page
    response = requests.get(url=new_url, headers=headers).text
    # Extract every image URL on the current page
    images_url = re.findall('<div class="thumb">.*?<img src="(.*?)" alt=.*?</div>', response, re.S)
    for image_url in images_url:
        detail_url = 'http:' + image_url
        # Download the binary stream of the current image
        content = requests.get(url=detail_url, headers=headers).content
        # Use the last segment of the URL path as the file name
        image_name = image_url.split('/')[-1]
        with open('./qiutu/' + image_name, 'wb') as f1:
            f1.write(content)
print('over')
```
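The `re.S` flag is what lets the non-greedy `.*?` span line breaks; without it, `.` stops at every newline and the pattern misses tags that wrap across lines. A minimal sketch on a made-up two-line snippet:

```python
import re

html = '<div class="thumb">\n<img src="//pic.example.com/a.jpg" alt="demo">\n</div>'
pattern = '<div class="thumb">.*?<img src="(.*?)" alt=.*?</div>'
# Without re.S, '.' does not match '\n', so nothing is found
print(re.findall(pattern, html))        # []
# With re.S, '.' matches newlines too, so the capture succeeds
print(re.findall(pattern, html, re.S))  # ['//pic.example.com/a.jpg']
```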