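Jupyter basics


Official Jupyter Notebook introduction

Main features of Jupyter Notebook

Syntax highlighting, automatic indentation, and tab completion while you write code.
Code runs directly from the browser, and the output appears right below the code cell.
Markdown is supported for writing documentation and notes alongside the code.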

Starting Jupyter

jupyter notebook

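Keyboard shortcuts (1 to 5 work in command mode, that is, after pressing esc)

  1. Insert a cell above: a
  2. Insert a cell below: b
  3. Delete a cell: x (this cuts the cell; pressing d twice deletes it outright)
  4. Change a code cell to markdown: m
  5. Change a markdown cell to code: y
  6. Run a cell: shift+enter
  7. Show the help documentation: shift+tab
  8. Autocomplete: tab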

The requests module

pip install requests

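Purpose: sends HTTP requests from Python, simulating a browser accessing the web.

Features: simple and efficient

Older alternative: urllib

Workflow for using the requests module:

      1. Specify the URL

      2. Send the request

      3. Get the response data

      4. Persist the data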

# Scrape the page data of the Sogou home page
import requests
# 1. Specify the URL
url = 'https://www.sogou.com/'
# 2. Send the request
response = requests.get(url=url)
# 3. Get the response data
page_text = response.text  # .text returns the response body as a string
# 4. Persist the data
with open('./sogou.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print('over!')
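  • Handling GET request parameters
  • Requirement: a web page collector (save the Sogou results page for any search term)
  • Anti-scraping mechanism: UA (User-Agent) detection
  • Counter-strategy: UA spoofing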
import requests
wd = input('enter a word:')
url = 'https://www.sogou.com/web'
# Wrap the query parameters in a dict
param = {
    'query': wd
}
# UA spoofing: send a browser-like User-Agent header
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.get(url=url, params=param, headers=headers)
# Manually set the encoding used to decode the response data
response.encoding = 'utf-8'
page_text = response.text
fileName = wd + '.html'
with open(fileName, 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print(fileName, 'scraped successfully!')
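
To see what the params argument does in the request above, the sketch below builds the same kind of URL by hand with the standard library. It assumes that requests URL-encodes the dictionary in the same way as urllib.parse.urlencode; build_url is a hypothetical helper name, not part of requests.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append URL-encoded query parameters to base_url (hypothetical helper)."""
    # urlencode escapes unsafe characters and joins key=value pairs with '&'
    return base_url + '?' + urlencode(params)


print(build_url('https://www.sogou.com/web', {'query': 'python'}))
# https://www.sogou.com/web?query=python
```

Passing the dict through params is usually preferable to concatenating the string yourself, since the escaping of spaces and non-ASCII search terms is handled for you.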

# Cracking Baidu Translate
import requests

url = 'https://fanyi.baidu.com/sug'
word = input('enter an English word:')
# Wrap the request parameters in a dict
data = {
    'kw': word
}
# UA spoofing
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.post(url=url, data=data, headers=headers)
# .text returns a string; .json() returns the parsed object
obj_json = response.json()

print(obj_json)
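
The difference between response.text and response.json() can be sketched offline. The sample payload below is made up for illustration and is not guaranteed to match what fanyi.baidu.com actually returns; json.loads stands in for the parsing that response.json() performs.

```python
import json

# Hypothetical response body; the real structure may differ
sample_text = '{"errno": 0, "data": [{"k": "dog", "v": "n. a domestic animal"}]}'

# What response.text gives you: one plain string
print(type(sample_text).__name__)

# What response.json() gives you: nested dicts and lists you can index
obj_json = json.loads(sample_text)
for entry in obj_json['data']:
    print(entry['k'], '->', entry['v'])
```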

# Scrape the location information of KFC restaurants in any city
import requests

# The data is loaded dynamically, so request the endpoint that serves it
city = input('enter a cityName:')
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
data = {
    "cname": "",
    "pid": "",
    "keyword": city,
    "pageIndex": "2",  # which page of results to request
    "pageSize": "10",
}
# UA spoofing
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.post(url=url, headers=headers, data=data)

json_text = response.text

print(json_text)

Scraping the locations of all KFC restaurants in Beijing (pages 1-8)

http://www.kfc.com.cn/kfccda/storelist/index.aspx

import requests
city = input("enter a cityName:")
url = "http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword"
# UA spoofing (the headers are the same for every page, so build them once)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
for i in range(1, 9):
    data = {
        "cname": "",
        "pid": "",
        "keyword": city,
        "pageIndex": i,
        "pageSize": "10",
    }
    response = requests.post(url=url, headers=headers, data=data)
    json_text = response.text
    print(json_text)
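
Hard-coding pages 1 to 8 only works while the city has exactly that many pages. A minimal sketch of an alternative: keep requesting pages until one comes back empty. Here fetch_page is a hypothetical callable that takes a page index and returns that page's list of records; in practice it would wrap the requests.post call above and pull the store list out of the JSON.

```python
def iter_all_records(fetch_page, first_page=1, max_pages=100):
    """Yield records page by page until an empty page is returned."""
    for page_index in range(first_page, first_page + max_pages):
        records = fetch_page(page_index)
        if not records:
            # An empty page means we have run past the last page
            break
        yield from records


# Stand-in for the real request: two pages of fake data, then an empty page
fake_pages = {1: ['store A', 'store B'], 2: ['store C'], 3: []}
stores = list(iter_all_records(lambda i: fake_pages.get(i, [])))
print(stores)
```

The max_pages cap is a safety net so that a server which never returns an empty page cannot keep the loop running forever.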

Scraping more movie detail data from Douban Movies

https://movie.douban.com/typerank?type_name=%E5%8A%A8%E4%BD%9C&type=5&interval_id=100:90&action=

import requests

url = "https://movie.douban.com/j/chart/top_list"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}

# start is an offset and each request returns up to limit items, so step by 20
for start in range(0, 100, 20):
    # Query parameters of a GET request go in params, not data
    params = {
        "type": "5",
        "interval_id": "100:90",
        "action": "",
        "start": start,
        "limit": "20",
    }

    response = requests.get(url=url, params=params, headers=headers)
    json_text = response.json()
    print(json_text)
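
On the start and limit parameters: assuming start is a zero-based offset and limit is the number of items per request (which the parameter names suggest, but is not confirmed here), consecutive requests overlap unless start advances by limit each time. A small sketch of the offsets, with page_starts as a hypothetical helper:

```python
def page_starts(total, limit):
    """Return the start offsets needed to cover `total` items, `limit` at a time."""
    return list(range(0, total, limit))


print(page_starts(100, 20))
# [0, 20, 40, 60, 80]
```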

Scraping the detail data of every enterprise

http://125.35.6.84:81/xk/

import requests

list_url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList"
detail_url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsById"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
for page in range(0, 328):
    data = {
        "on": "true",
        "page": page,
        "pageSize": "15",
        "productName": "",
        "conditionType": "1",
        "applyname": "",
        "applysn": ""
    }

    # Each list page returns a batch of enterprises together with their IDs
    response = requests.post(url=list_url, data=data, headers=headers)
    json_text = response.json()
    for item in json_text["list"]:
        # Request the detail data of one enterprise by its ID
        detail_data = {
            "id": item["ID"]
        }
        detail_response = requests.post(url=detail_url, data=detail_data, headers=headers)
        print(detail_response.json())
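
The loop above only prints each record. To complete the persistence step of the workflow, the records could be collected in a list and written to a JSON file. A sketch, using made-up records and a hypothetical file name; ensure_ascii=False keeps Chinese text readable in the file.

```python
import json


def save_records(records, path):
    """Write a list of dicts to path as UTF-8 JSON."""
    with open(path, 'w', encoding='utf-8') as fp:
        json.dump(records, fp, ensure_ascii=False, indent=2)


# Made-up records standing in for the scraped detail data
records = [{'ID': 'abc123', 'name': '示例企业'}]
save_records(records, './companies.json')

# Read the file back to confirm the round trip
with open('./companies.json', encoding='utf-8') as fp:
    print(json.load(fp) == records)
```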

 

 

 

