Jupyter basics

Reposted article. 2019-05-27 17:04. Tag: web scraping

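Main features of Jupyter Notebook

Syntax highlighting, automatic indentation, and tab completion while coding.
Code runs directly in the browser, and the output is shown right below each code cell.
Markdown is supported for writing documentation and notes alongside the code.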

Starting Jupyter

jupyter notebook

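Keyboard shortcuts

(The single-letter shortcuts work in command mode; press Esc first.)

Insert a cell above: a
Insert a cell below: b
Delete (cut) a cell: x
Change a code cell to Markdown: m
Change a Markdown cell to code: y
Run a cell: shift+enter
View the help documentation: shift+tab
Autocomplete: tab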

The requests module

pip install requests

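Purpose: sends HTTP requests from Python code, simulating a browser visiting a page.

Features: simple and efficient.

Older alternative: urllib

Workflow for using the requests module:

1. Specify the URL
2. Send the request
3. Get the response data
4. Save the data to persistent storage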

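# Scrape the page data of the Sogou home page
import requests
# 1. Specify the URL
url = 'https://www.sogou.com/'
# 2. Send the request
response = requests.get(url=url)
# 3. Get the response data
page_text = response.text  # .text returns the response body as a string
# 4. Save to persistent storage
with open('./sogou.html','w',encoding='utf-8') as fp:
    fp.write(page_text)
print('over!')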

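Handling GET request parameters
Task: a simple web page collector (save the Sogou results page for any keyword)
Anti-scraping mechanism: User-Agent (UA) detection
Counter-strategy: UA spoofing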

import requests
wd = input('enter a word:')
url = 'https://www.sogou.com/web'
# Package the query parameters
param = {
    'query':wd
}
# UA spoofing: send a browser-like User-Agent header
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.get(url=url,params=param,headers=headers)
# Manually set the encoding used to decode the response
response.encoding = 'utf-8'
page_text = response.text
fileName = wd + '.html'
with open(fileName,'w',encoding='utf-8') as fp:
    fp.write(page_text)
print(fileName,'saved successfully!')
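
To see what `params` and the `User-Agent` header amount to in the outgoing request, the sketch below builds (but never sends) an equivalent GET request with the standard library's urllib, so it runs offline. The helper name `build_get_request` and the sample keyword are illustrative assumptions, not part of the original notes; requests performs comparable query-string encoding when it is given `params`.

```python
from urllib.parse import urlencode
from urllib.request import Request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
)

def build_get_request(base_url, params, user_agent=USER_AGENT):
    """Return an unsent urllib Request with params encoded into the URL (hypothetical helper)."""
    query = urlencode(params)  # percent-encodes non-ASCII text as UTF-8
    return Request(f"{base_url}?{query}", headers={"User-Agent": user_agent})

req = build_get_request("https://www.sogou.com/web", {"query": "爬蟲"})
print(req.full_url)                   # the keyword appears percent-encoded
print(req.get_header("User-agent"))   # urllib stores header names in this capitalization
```

Without the custom header, the client library would announce itself with its own default User-Agent, which is what a UA check can spot.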

# Scraping Baidu Translate

import requests

url = 'https://fanyi.baidu.com/sug'
word = input('enter an English word:')
# Package the request parameters (sent as form data in the POST body)
data = {
    'kw':word
}
# UA spoofing
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.post(url=url,data=data,headers=headers)
# .text returns a string; .json() returns a parsed Python object
obj_json = response.json()

print(obj_json)
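
The difference between `.text` and `.json()` can be illustrated offline with the standard `json` module. This is a minimal sketch: the sample body below is made up for illustration, and the real service's field names may differ.

```python
import json

# Hypothetical response body; the real service's fields may differ.
sample_text = '{"data": [{"k": "dog", "v": "n. a domesticated animal"}]}'

# Parsing the string yields nested dicts and lists, roughly what .json() gives you
parsed = json.loads(sample_text)

print(type(sample_text).__name__)  # str
print(type(parsed).__name__)       # dict
for entry in parsed["data"]:
    print(entry["k"], "->", entry["v"])
```

Working with the parsed object means fields can be read by key instead of searching the raw string.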

# Scrape the locations of KFC restaurants in any given city

import requests

# The store data is loaded dynamically, so request the data endpoint directly
city = input('enter a cityName:')
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
data = {
    "cname": "",
    "pid": "",
    "keyword": city,
    "pageIndex": "2",   # page number to fetch
    "pageSize": "10",   # results per page
}
# UA spoofing
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.post(url=url,headers=headers,data=data)

json_text = response.text

print(json_text)

Scrape the locations of all KFC restaurants in Beijing (pages 1-8)

http://www.kfc.com.cn/kfccda/storelist/index.aspx

import requests
city = input("enter a cityName:")
url = "http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword"
# UA spoofing (the headers are the same for every page, so define them once)
headers = {
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
# Request pages 1 through 8
for i in range(1,9):
    data = {
        "cname": "",
        "pid": "",
        "keyword": city,
        "pageIndex": i,
        "pageSize": "10",
    }
    response = requests.post(url=url,headers=headers,data=data)
    json_text = response.text
    print(json_text)
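
The page count of 8 is specific to one city. A more general pattern, sketched below, keeps requesting pages until one comes back empty. This assumes the server returns an empty list past the last page, which the original notes do not confirm; `fetch_all_pages` and `fake_fetch_page` are hypothetical names, and the fake function stands in for the real POST request so the example runs offline.

```python
def fetch_all_pages(fetch_page, first_page=1, max_pages=50):
    """Collect rows page by page until a page comes back empty.

    fetch_page is any callable that takes a page number and returns a list of
    rows; max_pages is a safety cap so a misbehaving server cannot loop forever.
    """
    rows = []
    for page in range(first_page, first_page + max_pages):
        batch = fetch_page(page)
        if not batch:        # an empty page is treated as the end of the data
            break
        rows.extend(batch)
    return rows

# Stand-in for the real request: 23 fake stores served 10 per page.
FAKE_STORES = [f"store-{n}" for n in range(23)]

def fake_fetch_page(page, page_size=10):
    start = (page - 1) * page_size
    return FAKE_STORES[start:start + page_size]

stores = fetch_all_pages(fake_fetch_page)
print(len(stores))  # 23
```

In real use, `fetch_page` would wrap the `requests.post` call and return the list of stores parsed from the response.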

Scrape more movie detail data from Douban Movies

https://movie.douban.com/typerank?type_name=%E5%8A%A8%E4%BD%9C&type=5&interval_id=100:90&action=

import requests

# JSON endpoint behind the ranking page above
url = "https://movie.douban.com/j/chart/top_list"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}

# start is an offset and limit is the page size, so advance start by 20
# each time to avoid fetching overlapping results
for start in range(0,100,20):
    # For a GET request the query-string fields go in params, not data
    params = {
        "type": "5",
        "interval_id": "100:90",
        "action": "",
        "start": start,
        "limit": "20",
    }

    response = requests.get(url=url,params=params,headers=headers)
    json_text = response.json()
    print(json_text)
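
Assuming `start` is an item offset and `limit` a page size, as the parameter names suggest, consecutive requests only return distinct movies when `start` advances by `limit`. The sketch below computes those offsets; `page_offsets` and `build_params` are hypothetical helper names, and the total of 100 is just an example.

```python
def page_offsets(total, limit):
    """Return the start offsets needed to cover `total` items, `limit` per request."""
    return list(range(0, total, limit))

def build_params(start, limit=20):
    # Same query fields as the top_list request, with the offset filled in
    return {
        "type": "5",
        "interval_id": "100:90",
        "action": "",
        "start": start,
        "limit": limit,
    }

offsets = page_offsets(total=100, limit=20)
print(offsets)  # [0, 20, 40, 60, 80]
print(build_params(offsets[1]))
```

Five requests therefore cover the first 100 entries, whereas stepping `start` by 1 would fetch nearly the same 20 movies each time.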

Scrape the detail data of every company

http://125.35.6.84:81/xk/

import requests

# List endpoint: returns one page of companies, each with an ID
list_url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList"
# Detail endpoint: returns the detail data for a single company ID
detail_url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsById"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
for page in range(0,328):
    data = {
        "on": "true",
        "page": page,
        "pageSize": "15",
        "productName": "",
        "conditionType": "1",
        "applyname": "",
        "applysn": ""
    }

    response = requests.post(url=list_url,data=data,headers=headers)
    json_text = response.json()
    for item in json_text["list"]:
        # Detail page for reference: http://125.35.6.84:81/xk/itownet/portal/dzpz.jsp?id=<ID>
        # The detail data itself is served by detail_url
        detail_data = {
            "id": item["ID"]
        }
        detail_response = requests.post(url=detail_url,data=detail_data,headers=headers)
        detail_json = detail_response.json()
        print(detail_json)
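
The crawl above follows a two-stage pattern: list pages yield IDs, and each ID is then exchanged for a detail record. The sketch below isolates that pattern with the network calls passed in as functions, so it runs offline. All names here (`crawl_details`, the fake fetchers, the sample IDs) are hypothetical, and the duplicate check is an added assumption rather than something the original code does.

```python
def crawl_details(fetch_list_page, fetch_detail, pages):
    """Two-stage crawl: list pages yield IDs, each ID yields one detail record."""
    seen = set()
    details = []
    for page in pages:
        for item in fetch_list_page(page)["list"]:
            company_id = item["ID"]
            if company_id in seen:   # skip IDs that were already fetched
                continue
            seen.add(company_id)
            details.append(fetch_detail(company_id))
    return details

# Fake stand-ins for the two POST requests.
FAKE_LIST_PAGES = {
    1: {"list": [{"ID": "a1"}, {"ID": "b2"}]},
    2: {"list": [{"ID": "b2"}, {"ID": "c3"}]},
}

def fake_fetch_list_page(page):
    return FAKE_LIST_PAGES.get(page, {"list": []})

def fake_fetch_detail(company_id):
    return {"id": company_id, "name": f"company {company_id}"}

records = crawl_details(fake_fetch_list_page, fake_fetch_detail, pages=range(1, 4))
print([r["id"] for r in records])  # ['a1', 'b2', 'c3']
```

In real use, the two fetchers would wrap the `requests.post` calls to the list and detail endpoints and return `response.json()`.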
