Main features of Jupyter Notebook
- Syntax highlighting, auto-indentation, and tab completion while writing code.
- Runs code directly in the browser and shows the output below each code cell.
- Supports Markdown for documentation and notes written alongside the code.
Starting Jupyter
jupyter notebook
Keyboard shortcuts
- Insert a cell above: a
- Insert a cell below: b
- Delete the cell: x
- Switch a code cell to markdown: m
- Switch a markdown cell to code: y
- Run the cell: shift+enter
- Show help/docstring: shift+tab
- Autocomplete: tab
The requests module
pip install requests
Purpose: simulates a browser issuing requests to a website.
Strengths: simple and efficient.
Predecessor: urllib.
Workflow for using requests:
1. Specify the URL
2. Send the request
3. Retrieve the response data
4. Persist the data
```python
# Scrape the Sogou homepage
import requests

# 1. Specify the URL
url = 'https://www.sogou.com/'
# 2. Send the request
response = requests.get(url=url)
# 3. Retrieve the response data (.text returns a string)
page_text = response.text
# 4. Persist the data
with open('./sogou.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print('over!')
```
- Handling GET request parameters
- Goal: a simple web-page collector
- Anti-scraping mechanism: User-Agent (UA) checking
- Countermeasure: UA spoofing
```python
import requests

wd = input('enter a word:')
url = 'https://www.sogou.com/web'
# Package the query parameters
param = {
    'query': wd
}
# UA spoofing
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.get(url=url, params=param, headers=headers)
# Manually set the response encoding
response.encoding = 'utf-8'
page_text = response.text
fileName = wd + '.html'
with open(fileName, 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print(fileName, 'scraped successfully!')
```
```python
# Scrape Baidu Translate's suggestion endpoint
import requests

url = 'https://fanyi.baidu.com/sug'
word = input('enter an English word:')
# Package the request parameters
data = {
    'kw': word
}
# UA spoofing
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.post(url=url, data=data, headers=headers)
# .text returns a string; .json() returns a parsed Python object
obj_json = response.json()
print(obj_json)
```
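The printed object can be unpacked further. Assuming the sug endpoint returns JSON shaped like `{"errno": 0, "data": [{"k": ..., "v": ...}]}` (an assumption based on observed responses, not a documented API), the suggestion strings can be pulled out like this, shown here on a sample payload so it runs offline:

```python
import json

# Sample payload in the shape the sug endpoint is assumed to return
sample = '{"errno": 0, "data": [{"k": "dog", "v": "n. dog; hound"}, {"k": "dogma", "v": "n. dogma"}]}'

obj = json.loads(sample)
# Pull out just the suggestion string ("v") for each entry
suggestions = [item["v"] for item in obj.get("data", [])]
print(suggestions)
```

With a live response, replace `json.loads(sample)` with `response.json()`.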
```python
# Scrape KFC restaurant locations for an arbitrary city
# (the page loads this data dynamically via an AJAX POST)
import requests

city = input('enter a cityName:')
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
data = {
    "cname": "",
    "pid": "",
    "keyword": city,
    "pageIndex": "2",
    "pageSize": "10",
}
# UA spoofing
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.post(url=url, headers=headers, data=data)
json_text = response.text
print(json_text)
```
Scrape the locations of all KFC restaurants in Beijing (pages 1-8)
http://www.kfc.com.cn/kfccda/storelist/index.aspx
```python
import requests

city = input("enter a cityName:")
url = "http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword"
# UA spoofing (same headers for every page, so build them once)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
for i in range(1, 9):  # pages 1-8
    data = {
        "cname": "",
        "pid": "",
        "keyword": city,
        "pageIndex": i,
        "pageSize": "10",
    }
    response = requests.post(url=url, headers=headers, data=data)
    json_text = response.text
    print(json_text)
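Rather than printing each page, the results can be accumulated and persisted in one file. The sketch below assumes each page's JSON keeps its store list under a "Table1" key (an assumption; inspect a real response to confirm) and runs on sample data so it works offline:

```python
import json

# Two sample pages in the assumed response shape
pages = [
    {"Table1": [{"storeName": "Qianmen", "addressDetail": "Qianmen St."}]},
    {"Table1": [{"storeName": "Xidan", "addressDetail": "Xidan North St."}]},
]

# Flatten all pages into a single list of stores
stores = []
for page in pages:
    stores.extend(page.get("Table1", []))

# Persist as readable JSON
with open("kfc_stores.json", "w", encoding="utf-8") as fp:
    json.dump(stores, fp, ensure_ascii=False, indent=2)
print(len(stores), "stores saved")
```

In the live scraper, each `response.json()` from the page loop would be appended to `pages` before flattening.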
Scrape more movie detail data from Douban Movies
https://movie.douban.com/typerank?type_name=%E5%8A%A8%E4%BD%9C&type=5&interval_id=100:90&action=
```python
import requests

url = "https://movie.douban.com/j/chart/top_list"
# UA spoofing
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
# start advances by limit (20) so consecutive requests don't overlap
for start in range(0, 100, 20):
    # GET query parameters belong in params=, not data=
    params = {
        "type": "5",
        "interval_id": "100:90",
        "action": "",
        "start": start,
        "limit": "20",
    }
    response = requests.get(url=url, params=params, headers=headers)
    json_text = response.json()
    print(json_text)
```
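The start/limit pair is plain offset pagination: page n begins at n × limit, so stepping start by anything other than limit either skips or re-fetches entries. A quick sanity check of the offsets:

```python
limit = 20
# Offsets for the first five pages of an offset-paginated endpoint
starts = [page * limit for page in range(5)]
print(starts)  # → [0, 20, 40, 60, 80]
```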
```python
# Scrape license details from the NMPA (drug administration) portal
import requests

list_url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList"
detail_url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsById"
# UA spoofing
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
for page in range(1, 329):  # pages 1-328
    data = {
        "on": "true",
        "page": page,
        "pageSize": "15",
        "productName": "",
        "conditionType": "1",
        "applyname": "",
        "applysn": ""
    }
    response = requests.post(url=list_url, data=data, headers=headers)
    page_json = response.json()
    # Each list entry carries an ID; the detail data itself is served by
    # getXkzsById, so the dzpz.jsp detail page doesn't need to be fetched
    for company in page_json["list"]:
        detail_data = {
            "id": company["ID"]
        }
        detail_response = requests.post(url=detail_url, data=detail_data, headers=headers)
        print(detail_response.json())
```