Basic usage of Jupyter


Official introduction to Jupyter Notebook

Main features of Jupyter Notebook

Syntax highlighting, auto-indentation, and tab completion while editing code.
Code runs directly in the browser, with the output shown right below each code cell.
Markdown syntax is supported for writing documentation and notes alongside the code.

Starting Jupyter

jupyter notebook

Keyboard shortcuts

  1. Insert a cell above: a
  2. Insert a cell below: b
  3. Delete a cell: x
  4. Switch a code cell to markdown: m
  5. Switch a markdown cell to code: y
  6. Run a cell: shift+enter
  7. Show the help documentation: shift+tab
  8. Auto-complete: tab

Shortcuts 1-5 work in command mode (press Esc first if the cell is currently being edited).

The requests module

pip install requests

Purpose: it simulates a browser visiting web pages.

Characteristics: simple and efficient.

Older alternative: urllib
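
As a rough comparison (a minimal sketch; the Sogou homepage is used here only because it appears in the example further below), fetching the same page with the standard-library urllib versus requests:

# Fetching a page with the standard-library urllib (the older approach)
from urllib import request as urllib_request

resp = urllib_request.urlopen('https://www.sogou.com/')
page_text = resp.read().decode('utf-8')

# The same fetch with requests
import requests

page_text = requests.get('https://www.sogou.com/').text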

Workflow for using the requests module (the example below walks through all four steps):

  1. Specify the URL
  2. Send the request
  3. Get the response data
  4. Persist (save) the data

# Crawl the page data of the Sogou homepage
import requests
# 1. Specify the URL
url = 'https://www.sogou.com/'
# 2. Send the request
response = requests.get(url=url)
# 3. Get the response data
page_text = response.text  # .text returns the response body as a string
# 4. Persist the data
with open('./sogou.html','w',encoding='utf-8') as fp:
    fp.write(page_text)
print('over!')
  • Handling the parameters of a GET request
  • Goal: a simple web page collector (see the example below)
  • Anti-crawling mechanism: User-Agent (UA) detection
  • Countermeasure: UA spoofing
import requests
wd = input('enter a word:')
url = 'https://www.sogou.com/web'
# Wrap the GET parameters
param = {
    'query':wd
}
# UA spoofing
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.get(url=url,params=param,headers=headers)
# Manually set the response encoding to avoid garbled Chinese text
response.encoding = 'utf-8'
page_text = response.text
fileName = wd + '.html'
with open(fileName,'w',encoding='utf-8') as fp:
    fp.write(page_text)
print(fileName,'saved successfully!')

# Crack the Baidu Translate sug interface

import requests

url = 'https://fanyi.baidu.com/sug'
word = input('enter an English word:')
# Wrap the POST request parameters
data = {
    'kw':word
}
# UA spoofing
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.post(url=url,data=data,headers=headers)
# .text returns a string; .json() returns the deserialized object
obj_json = response.json()

print(obj_json)
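
To also cover the persistence step for this JSON response, a minimal sketch (the output filename is an assumption made here for illustration):

import json

# Hypothetical output path; ensure_ascii=False keeps non-ASCII characters readable in the file
fileName = word + '_sug.json'
with open(fileName,'w',encoding='utf-8') as fp:
    json.dump(obj_json, fp, ensure_ascii=False)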

# Crawl the KFC restaurant location data for any given city

# The data is loaded dynamically via an AJAX POST request
import requests

city = input('enter a cityName:')
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
data = {
    "cname": "",
    "pid": "",
    "keyword": city,
    "pageIndex": "2",   # which page of results to fetch
    "pageSize": "10",   # number of results per page
}
# UA spoofing
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
response = requests.post(url=url,headers=headers,data=data)

json_text = response.text

print(json_text)

Crawl the location data of all KFC restaurants in Beijing (pages 1-8)

http://www.kfc.com.cn/kfccda/storelist/index.aspx

import requests
city = input("enter a cityName:")
url = "http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword"
for i in range(1,9):
    data = {
        "cname": "",
        "pid": "",
        "keyword": city,
        "pageIndex": i,
        "pageSize": "10",
    }
    # UA spoofing
    headers = {
        "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    }
    response = requests.post(url=url,headers=headers,data=data)
    json_text = response.text
    print(json_text)
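
If the paginated results should be stored rather than only printed, one option (a minimal sketch; the output filename and the choice to keep each page as raw JSON are assumptions for illustration) is to collect all eight pages into a list and dump them to a single file:

import json
import requests

city = input("enter a cityName:")
url = "http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword"
headers = {
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
all_pages = []
for i in range(1,9):
    data = {
        "cname": "",
        "pid": "",
        "keyword": city,
        "pageIndex": i,
        "pageSize": "10",
    }
    # Each page is returned as JSON; keep it as-is without assuming its field names
    all_pages.append(requests.post(url=url,headers=headers,data=data).json())

with open(city + '_kfc.json','w',encoding='utf-8') as fp:
    json.dump(all_pages, fp, ensure_ascii=False)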

Crawl more movie detail data from Douban Movies

https://movie.douban.com/typerank?type_name=%E5%8A%A8%E4%BD%9C&type=5&interval_id=100:90&action=

import requests

url = "https://movie.douban.com/j/chart/top_list"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}

# The list is loaded dynamically through a GET interface: "start" is the offset
# and "limit" the page size, so step the offset by 20 to avoid fetching overlapping pages.
for start in range(0, 100, 20):
    params = {
        "type": "5",
        "interval_id": "100:90",
        "action": "",
        "start": start,
        "limit": "20",
    }

    # GET parameters are passed via params=, not data=
    response = requests.get(url=url,params=params,headers=headers)
    json_text = response.json()
    print(json_text)

Crawl the detail data of each company

http://125.35.6.84:81/xk/

import requests

# List interface of the site (the data is loaded dynamically)
url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
for page in range(1, 329):  # page numbers on this interface start at 1
    data = {
        "on": "true",
        "page": page,
        "pageSize": "15",
        "productName": "",
        "conditionType": "1",
        "applyname": "",
        "applysn": ""
    }

    # Fetch one page of the company list
    response = requests.post(url=url,data=data,headers=headers)
    json_text = response.json()
    for company in json_text["list"]:
        # The dzpz.jsp detail page only renders data that is fetched from the
        # getXkzsById interface, so only that interface needs to be requested.
        detail_url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsById"
        detail_data = {
            "id": company["ID"]
        }
        detail_response = requests.post(url=detail_url,data=detail_data,headers=headers)
        detail_json = detail_response.json()
        print(detail_json)