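Main features of Jupyter Notebook
- Syntax highlighting, automatic indentation, and Tab completion while coding.
- Code runs directly in the browser, and each cell's output is shown right below it.
- Documentation and explanatory text can be written in Markdown.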
Launching Jupyter (run in a terminal):
jupyter notebook
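Keyboard shortcuts (a, b, x, m and y work in command mode, which you enter by pressing Esc):
- Insert a cell above: a
- Insert a cell below: b
- Cut (delete) the cell: x
- Switch a cell from code to Markdown: m
- Switch a cell from Markdown to code: y
- Run the cell: Shift+Enter
- Show help documentation: Shift+Tab
- Auto-completion: Tab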
The requests module
pip install requests
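Purpose: it simulates a browser sending requests, so a program can access the web.
Features: simple and efficient.
Older alternative: urllib (Python standard library).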
Workflow for using requests:
1. Specify the URL.
2. Send the request.
3. Get the response data.
4. Persist (store) the data.
    # Crawl the page data of the Sogou home page
    import requests
    # 1. Specify the URL
    url = 'https://www.sogou.com/'
    # 2. Send the request
    response = requests.get(url=url)
    # 3. Get the response data
    page_text = response.text  # .text returns the data as a string
    # 4. Persist the data to disk
    with open('./sogou.html', 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print('over!')
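`response.text` is a decoded string; the raw bytes are in `response.content`. requests chooses the charset from the `Content-Type` header and, for `text/*` responses that declare no charset, falls back to ISO-8859-1, which is why Chinese pages can come back garbled and why the Sogou search example sets `response.encoding = 'utf-8'` by hand. The stdlib-only sketch below imitates that decoding step; the byte string stands in for `response.content` and `decode_body` is a hypothetical helper, not part of requests.

```python
# Stdlib-only sketch of what response.text does: decode raw bytes with a charset.
# raw_content stands in for response.content (an assumption for illustration).
raw_content = "搜狗搜索".encode("utf-8")  # bytes as they arrive over the network


def decode_body(content: bytes, encoding: str) -> str:
    """Decode a body the way response.text does once an encoding is chosen."""
    # errors="replace" mirrors requests, which never raises on undecodable bytes
    return content.decode(encoding, errors="replace")


# Wrong guess: ISO-8859-1 maps every byte to some character, so nothing fails,
# but the Chinese text turns into mojibake.
garbled = decode_body(raw_content, "iso-8859-1")

# Correct charset: the text comes back intact.
correct = decode_body(raw_content, "utf-8")

print(garbled == "搜狗搜索")  # False
print(correct == "搜狗搜索")  # True

# A Latin-1 misdecode is reversible: re-encode to the same bytes, decode as UTF-8.
print(garbled.encode("iso-8859-1").decode("utf-8"))  # 搜狗搜索
```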
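- Handling GET request parameters
- Task: a web page collector (save the Sogou search results page for any keyword)
- Anti-scraping mechanism: User-Agent (UA) detection
- Counter-measure: UA spoofing (send a real browser's User-Agent header)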
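UA detection happens on the server: the site reads the `User-Agent` request header and rejects requests that do not look like a browser. By default requests identifies itself as `python-requests/<version>`, which is trivial to spot. The rule below is a hypothetical illustration of such a check (`is_blocked_user_agent` and its marker list are made up; real sites use their own, usually undisclosed, rules).

```python
from typing import Optional

# Hypothetical server-side UA check, for illustration only.
BLOCKED_MARKERS = ("python-requests", "python-urllib", "curl", "scrapy")


def is_blocked_user_agent(user_agent: Optional[str]) -> bool:
    """Return True if a request with this User-Agent would be rejected."""
    if not user_agent:
        return True  # a missing UA header looks automated
    ua = user_agent.lower()
    if any(marker in ua for marker in BLOCKED_MARKERS):
        return True  # known scripting-library signature
    # Mainstream browsers send a UA that starts with "Mozilla/"
    return not ua.startswith("mozilla/")


default_ua = "python-requests/2.31.0"  # shape of requests' default UA
spoofed_ua = ("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36")

print(is_blocked_user_agent(default_ua))  # True
print(is_blocked_user_agent(spoofed_ua))  # False
```

Spoofing the UA defeats exactly this kind of header check and nothing more; checks based on other signals are not affected by it.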
    import requests
    wd = input('enter a word:')
    url = 'https://www.sogou.com/web'
    # Wrap the query parameters; requests appends them to the URL as ?query=...
    param = {
        'query': wd
    }
    # UA spoofing
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
    }
    response = requests.get(url=url, params=param, headers=headers)
    # Manually set the encoding of the response data
    response.encoding = 'utf-8'
    page_text = response.text
    fileName = wd + '.html'
    with open(fileName, 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print(fileName, 'saved successfully!')
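Passing `params=` lets requests build the query string; internally it URL-encodes the dict in much the same way as `urllib.parse.urlencode`, percent-encoding non-ASCII keywords as UTF-8 and spaces as `+`. The stdlib sketch below reproduces the URL the Sogou request would hit; `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base_url: str, params: dict) -> str:
    """Append URL-encoded query parameters, roughly as requests does for params=.

    Assumes base_url has no query string of its own (requests handles that case).
    """
    return base_url + "?" + urlencode(params)


print(build_search_url("https://www.sogou.com/web", {"query": "python web"}))
# https://www.sogou.com/web?query=python+web

# A Chinese keyword is percent-encoded as UTF-8 bytes ...
url = build_search_url("https://www.sogou.com/web", {"query": "爬蟲"})
print(url)
# ... and decoding the query string gives the original keyword back
print(parse_qs(urlsplit(url).query)["query"])  # ['爬蟲']
```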
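Cracking Baidu Translate (POST request with a JSON response)
    import requests
    url = 'https://fanyi.baidu.com/sug'
    word = input('enter an English word:')
    # Wrap the request parameters (sent as form data in the POST body)
    data = {
        'kw': word
    }
    # UA spoofing
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
    }
    response = requests.post(url=url, data=data, headers=headers)
    # .text returns a string; .json() parses the JSON body into a Python object
    obj_json = response.json()
    print(obj_json)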
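When `data=` is a dict, `requests.post` sends the fields as an `application/x-www-form-urlencoded` body (here `kw=...`), and `response.json()` is essentially `json.loads` applied to the response text. The sketch below shows both steps with the stdlib; the sample response body and its `errno`/`data`/`k`/`v` fields are assumptions for illustration, not a documented Baidu format, and `extract_suggestions` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode

# 1. The POST body that data={'kw': 'dog'} produces
form_body = urlencode({"kw": "dog"})
print(form_body)  # kw=dog

# 2. A hypothetical response body; the real field names may differ
sample_text = '{"errno": 0, "data": [{"k": "dog", "v": "n. 狗"}]}'

# response.json() parses the text like json.loads: a dict instead of a str
obj = json.loads(sample_text)


def extract_suggestions(payload: dict) -> list:
    """Return (word, meaning) pairs, assuming the hypothetical shape above."""
    return [(item["k"], item["v"]) for item in payload.get("data", [])]


print(type(sample_text).__name__, type(obj).__name__)  # str dict
print(extract_suggestions(obj))  # [('dog', 'n. 狗')]
```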
Crawling KFC restaurant locations for any city
    import requests
    # The store list is loaded dynamically (Ajax), so request the data endpoint directly
    city = input('enter a cityName:')
    url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
    data = {
        "cname": "",
        "pid": "",
        "keyword": city,
        "pageIndex": "1",  # first page of results
        "pageSize": "10",
    }
    # UA spoofing
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
    }
    response = requests.post(url=url, headers=headers, data=data)
    json_text = response.text
    print(json_text)
Crawling all KFC restaurant locations in Beijing (pages 1-8)
http://www.kfc.com.cn/kfccda/storelist/index.aspx
    import requests
    city = input("enter a cityName:")
    url = "http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword"
    # UA spoofing (the headers never change, so define them once outside the loop)
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    }
    for i in range(1, 9):  # pages 1-8
        data = {
            "cname": "",
            "pid": "",
            "keyword": city,
            "pageIndex": i,
            "pageSize": "10",
        }
        response = requests.post(url=url, headers=headers, data=data)
        json_text = response.text
        print(json_text)
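Hard-coding `range(1, 9)` only works while the city has exactly eight pages of stores. A more general loop keeps requesting pages until one comes back with fewer than `pageSize` records. In this sketch `iter_all_pages` is a hypothetical helper and `fetch_page` a hypothetical callable that would wrap the `requests.post` call above and return the parsed store list; `max_pages` guards against a server that never returns a short page.

```python
from typing import Callable, Iterator, List


def iter_all_pages(fetch_page: Callable[[int], List[dict]],
                   page_size: int,
                   first_page: int = 1,
                   max_pages: int = 100) -> Iterator[dict]:
    """Yield records page by page until a page comes back short or empty."""
    for page in range(first_page, first_page + max_pages):
        records = fetch_page(page)
        yield from records
        if len(records) < page_size:
            return  # a short (or empty) page means there is nothing after it


# Offline usage with a fake data source of 23 stores, 10 per page
fake_stores = [{"id": n} for n in range(23)]


def fake_fetch(page: int) -> List[dict]:
    start = (page - 1) * 10  # pageIndex is 1-based
    return fake_stores[start:start + 10]


all_stores = list(iter_all_pages(fake_fetch, page_size=10))
print(len(all_stores))  # 23
```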
Crawling more movie detail data from Douban Movies
https://movie.douban.com/typerank?type_name=%E5%8A%A8%E4%BD%9C&type=5&interval_id=100:90&action=
    import requests
    url = "https://movie.douban.com/j/chart/top_list"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    }
    # start is a record offset, so step it by limit (20) to avoid overlapping pages;
    # this fetches the first 100 movies
    for start in range(0, 100, 20):
        # GET parameters belong in params= (data= would send a request body instead)
        params = {
            "type": "5",
            "interval_id": "100:90",
            "action": "",
            "start": start,
            "limit": "20",
        }
        response = requests.get(url=url, params=params, headers=headers)
        json_text = response.json()
        print(json_text)
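For this endpoint `start` is a record offset and `limit` the page size, so consecutive requests advance `start` by `limit` (0, 20, 40, ...). The sketch builds those URLs with `urllib.parse.urlencode`, which also turns `interval_id=100:90` into the `100%3A90` seen in the original URL; passing the same dict as `params=` to `requests.get` yields an equivalent query string. `page_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://movie.douban.com/j/chart/top_list"


def page_urls(total: int, limit: int = 20) -> list:
    """Build one URL per page; start is an offset that grows by `limit`."""
    urls = []
    for start in range(0, total, limit):
        query = urlencode({
            "type": "5",
            "interval_id": "100:90",  # ':' is percent-encoded as %3A
            "action": "",
            "start": start,
            "limit": limit,
        })
        urls.append(BASE + "?" + query)
    return urls


urls = page_urls(total=100)
print(len(urls))  # 5
print(urls[1])
# https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=20&limit=20
```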
    import requests
    # Step 1: the list endpoint returns one page of license records as JSON
    list_url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList"
    # Step 2: the detail endpoint returns one record's details, looked up by its ID
    detail_url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsById"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    }
    for page in range(0, 328):
        data = {
            "on": "true",
            "page": page,
            "pageSize": "15",
            "productName": "",
            "conditionType": "1",
            "applyname": "",
            "applysn": "",
        }
        response = requests.post(url=list_url, data=data, headers=headers)
        json_text = response.json()
        # Use a different loop variable so the page counter is not shadowed
        for item in json_text["list"]:
            detail = requests.post(url=detail_url, data={"id": item["ID"]}, headers=headers)
            print(detail.json())
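The license example is a two-step crawl: list pages supply record IDs, and each ID is posted to a detail endpoint. Separating the two steps makes it easy to de-duplicate IDs and to test the logic offline. In this sketch `crawl_details` is a hypothetical helper, and `fetch_list_page` and `fetch_detail` are hypothetical callables that would wrap the list and detail requests above; the `"list"` and `"ID"` keys come from the original code.

```python
from typing import Callable, Dict, List


def crawl_details(fetch_list_page: Callable[[int], Dict],
                  fetch_detail: Callable[[str], Dict],
                  pages: range) -> List[Dict]:
    """Step 1: collect IDs from every list page. Step 2: fetch each ID's detail once."""
    ids: List[str] = []
    seen = set()
    for page in pages:
        for item in fetch_list_page(page).get("list", []):
            if item["ID"] not in seen:  # skip IDs repeated across pages
                seen.add(item["ID"])
                ids.append(item["ID"])
    return [fetch_detail(record_id) for record_id in ids]


# Offline usage with fake endpoints (pages 0 and 1 overlap on ID "b")
fake_pages = {0: {"list": [{"ID": "a"}, {"ID": "b"}]},
              1: {"list": [{"ID": "b"}, {"ID": "c"}]}}
details = crawl_details(lambda p: fake_pages.get(p, {}),
                        lambda i: {"id": i, "name": "record " + i},
                        range(0, 2))
print([d["id"] for d in details])  # ['a', 'b', 'c']
```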