Basic usage of requests in Python


The requests module

  • Introduction
  • GET requests
  • POST requests
  • Response
  • Advanced usage

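Introduction to requests

Official documentation   --->   http://docs.python-requests.org/en/master/

HTTP protocol --->  http://www.cnblogs.com/linhaifeng/p/6266327.html

Installation: pip3 install requests

Introduction: requests lets you send the same HTTP requests a browser would.
requests only downloads the page content; it does not execute JavaScript. If a page loads its data through JS, you have to analyse the target site yourself and send a new request for that data.
Request methods: the most commonly used are requests.get() and requests.post()
import requests
r = requests.get('https://api.github.com/events')
r1 = requests.post('http://httpbin.org/post',data={'key':'value'})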

2. GET requests

Basic parameters: method, url, params, data, json, headers, cookies

Other parameters: files, auth, proxies...

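1. GET request (no parameters)

import requests
response = requests.get('http://dig.chouti.com/')
print(response)       # the Response object, e.g. <Response [200]>
print(response.text)  # response body decoded to str
print(response.url)   # final URL of the request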

2. GET request with parameters --> params

import requests
# Set a browser-like User-Agent in the request headers; otherwise Baidu will not return the normal page content.
# Appending parameters directly to the URL is only safe for ASCII letters and digits;
# pass Chinese or other special characters through params so that requests URL-encodes them.
response = requests.get('https://www.baidu.com/s?wd=python&pn=1',
           headers={ 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36', })
print(response.text)
# When the keyword contains Chinese or other special characters, pass it through params
wd = 'alck老師'
pn = 1
response_2 = requests.get('https://www.baidu.com/s', params={'wd':wd, 'pn':pn },
         headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36', })
print(response_2.text)
print(response_2.url)

# Example with parameters

import requests
payload  =   {'key1':'value1','key2':'value2'}
ret = requests.get("http://www.itest.info",params=payload)
print(ret.url)
# Output: http://www.itest.info/?key1=value1&key2=value2
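To see how params is encoded without sending anything, the request can be prepared locally. This is a minimal sketch, assuming requests.Request(...).prepare() as the way to inspect the final URL; the URL and keyword are the ones from the Baidu example above.

```python
import requests

# Build the request locally; nothing is sent over the network.
prepared = requests.Request(
    'GET',
    'https://www.baidu.com/s',
    params={'wd': 'alck老師', 'pn': 1},
).prepare()

# The non-ASCII keyword is percent-encoded as UTF-8 in the final URL.
print(prepared.url)
```

The printed URL carries the keyword as percent-escaped bytes, which is why params is the safer choice for non-ASCII values.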

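3. GET request with parameters --> headers

Requests usually need to carry headers; they are the key to making a request look like it comes from a browser. Commonly useful request headers:
Host
Referer # large sites often use this header to determine where a request came from
User-Agent # identifies the client
Cookie   # cookies travel in the request headers, but requests has a dedicated cookies parameter for them, so they need not go into headers={}
# Add headers (servers inspect the request headers and may refuse the request without them, e.g. https://www.zhihu.com/explore)
import requests
response = requests.get('https://www.zhihu.com/explore')
print(response.status_code)
# returned 500 when this article was written
# custom headers
headers = { 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',}
response = requests.get('https://www.zhihu.com/explore',headers=headers)
print(response.status_code)
# 200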
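One likely reason a bare request is rejected is the default User-Agent, which identifies the client as python-requests. The sketch below prints that default and shows a per-request header replacing it; it uses Session.prepare_request() so that nothing is sent, and the URL is the one from the example above.

```python
import requests

# The User-Agent that requests sends when none is supplied.
print(requests.utils.default_user_agent())

browser_ua = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36')

# A Session merges its default headers with the per-request headers;
# per-request values win. prepare_request() does the merge without sending.
session = requests.Session()
request = requests.Request('GET', 'https://www.zhihu.com/explore',
                           headers={'User-Agent': browser_ua})
prepared = session.prepare_request(request)
print(prepared.headers['User-Agent'])
```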

4. GET request with parameters --> cookies

import requests
# Log in to GitHub, then copy the cookies from the browser; later requests can reuse the cookie to stay logged in without entering the username and password again.
Cookies  = { 'user_session':'ImRJpK-svLGo2riFpzGKXHdkOCnvnkuFG7CySWGYljuGP--a',}
response = requests.get('https://github.com/settings/emails',cookies=Cookies)
print('352932341@qq.com' in response.text) # True
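The cookies parameter is turned into a Cookie request header. A small sketch that prepares the request locally to show this; the cookie value is a made-up placeholder, not a real session id.

```python
import requests

# Hypothetical cookie value, for illustration only.
prepared = requests.Request(
    'GET',
    'https://github.com/settings/emails',
    cookies={'user_session': 'example-session-id'},
).prepare()

# The cookies dict ends up in the Cookie request header.
print(prepared.headers['Cookie'])
```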

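3. POST requests

1. Introduction

# GET requests: GET is the default HTTP request method. Characteristics:
#   normally no request body;
#   the amount of data is limited in practice, because browsers and servers cap URL length (HTTP itself sets no fixed limit);
#   the request data is exposed in the browser address bar.
# Common ways a GET request is made:
#   entering a URL directly in the browser address bar always sends a GET request;
#   clicking a hyperlink on a page also sends a GET request;
#   submitting a form uses GET by default, but the form can be set to POST.
# POST requests:
#   the data does not appear in the address bar;
#   the protocol sets no upper limit on data size (servers may impose their own);
#   there is a request body;
#   with the default form encoding, Chinese characters in the request body are URL-encoded.
# requests.post() is called in the same way as requests.get(); its data parameter holds the request body.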

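2. Sending a POST request to simulate a browser login

      When analysing a login flow, enter a wrong username and password and capture the traffic; after a successful login the browser redirects immediately, which makes the login request hard to inspect.

import requests
import json
# 1. Basic POST example
url  = 'https://api.github.com/some/endpoint'
payload = {'key1':'value1','key2':'value2'}
ret  = requests.post(url=url,data = payload )
print(ret.text)
# 2. Sending request headers and data
payload = {'some':'data'}
headers = {'content-type':'application/json'}
r2 = requests.post( url = 'http://www.oldboyedu.com',data = json.dumps(payload),headers = headers )
print(r2.text)
print(r2.cookies)
# PS: when the request parameters contain a dict nested inside a dict, use json;
#     for flat parameters either data or json can work, depending on what the server expects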

Other request methods

requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)
# All of the methods above are built on top of this one
requests.request(method, url, **kwargs)

More parameters

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.
    :param method:         method for the new :class:`Request` object.
    :param url:    URL for the new :class:`Request` object
    :param params:  (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data:  (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json:   (optional) json data to send in the body of the :class:`Request`.
    :param headers:  (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies:  (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files:  (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth:           (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout:      (optional) How long to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout:        float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies:  (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
    :param stream:  (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    Usage::
      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      >>> req
      <Response [200]>
    """

Parameter examples:

Pseudocode

import requests,json
def param_method_url():
    """These calls do the same thing as requests.get() and requests.post()"""
    # response =  requests.request(method='get',url='https://github.com')
    # response =  requests.request(method='post',url='https://github.com')
    pass
def param_param():
    '''The params parameter and the forms it accepts'''
    # dict, str, or bytes (bytes must contain ASCII only)
    requests.request(method='get', url='https://github.com',params={'k1':'v1','k2':'聖劍'})
    requests.request(method='get',url='https://github.com',params="k1=v1&k2=聖劍&k3=v3&k3==vv3")
    requests.request(method='get', url='https://github.com', params=bytes("k1=v1&k2=k2&k3=v3&k3==vv3",encoding='utf8'))
    # Wrong: bytes that contain non-ASCII (Chinese) characters
    requests.request(method='get',url='https://github.com',params=bytes('k1=v1&k2=屌爆了&k3=vv3',encoding='utf8'))
def param_data():
    '''The data parameter and the forms it accepts'''
    # dict, str, bytes, or file object
    requests.request(method='POST',url = 'https://github.com',data  = {'k1':'v1','k2':'交通費'})
    requests.request(method='POST',url = 'https://github.com',data = "k1=v1;k2=v2;k3=v3;k3=v4")
    requests.request(method ='POST',url = 'https://github.com',data = "k1=v1;k2=v2;k3=v3;k3=v4",headers = {'Content-Type':'application/x-www-form-urlencoded'})
    # the file content is: k1=v1;k2=v2;k3=v3;k3=v4
    requests.request(method ='POST',url ='https://github.com',data  = open('data_file.py',mode='r',encoding='utf-8'),
                                      headers ={'Content-Type':'application/x-www-form-urlencoded'})
def param_json():
    # The value passed as json is serialized to a string with json.dumps(...)
    # and sent in the request body with the header Content-Type: application/json
    requests.request(method='POST',url='https://github.com', json={'k1':'v1','k2':'交通費'})
def param_headers():
    # Send request headers to the server
    # Note: an explicit Content-Type overrides the one json= would set, so make sure it matches the body the server expects
    requests.request(method='POST',url    ='https://github.com',json   ={'k1':'v1','k2':'交通費'},headers={'Content-Type':'application/x-www-form-urlencoded'} )
def param_cookies():
    # Send cookies to the server
    requests.request(method='POST',url ='https://github.com',data={'k1':'v1','k2':'v2'},cookies ={'cook1':'value1'})
def param_files():
    # Upload a file
    file_dict={'f1':open('readme','rb')}
    requests.request(method='POST',url='https://github.com',files=file_dict)
    # Upload a file with a custom file name
    file_dict_2 ={'f2':('test.txt',open('readme','rb'))}
    requests.request(method='POST', url='https://github.com',files=file_dict_2)
    # Upload in-memory content with a custom file name, content type and extra headers
    file_dict_3 = { 'f3':('test.txt','wordcontent','application/text',{'k1':'0'}) }
    requests.request(method='POST',url='https://github.com', files=file_dict_3)
def param_auth():
    from requests.auth import HTTPBasicAuth,HTTPDigestAuth
    ret = requests.get('https://api.github.com/user',auth=HTTPBasicAuth('test','123456t'))
    print(ret.text)
    ret_one = requests.get('http://192.168.1.1', auth=HTTPBasicAuth('admin','admin'))
    ret_one.encoding ='gbk'
    print(ret_one.text)
    ret_two = requests.get('http://httpbin.org/digest-auth/auth/user/pass',auth=HTTPDigestAuth('user','pass'))
    print(ret_two)
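def param_timeout():
    # a single number applies to both the connect timeout and the read timeout
    ret = requests.get('http://google.com/',timeout=1)
    print(ret)
    # a tuple is (connect timeout, read timeout)
    ret_1 =requests.get('http://google.com/',timeout=(5,1))
    print(ret_1)
def param_allow_redirects():
    '''Whether to follow redirects'''
    ret = requests.get('http://127.0.0.1:8000/test',allow_redirects=False)
    print(ret.text)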
def param_proxies():
    '''Proxies'''
    # proxies = {
    #     "http":"62.172.258.98:80",
    #     "https":"http://61.185.219.126:3128",
    # }
    # a key may also name a scheme and host; that proxy is then used only for requests to that host
    proxies = {'http://10.20.1.128':'http://10.10.1.10:5323'}
    ret  = requests.get("http://www.proxy360.cn/Proxy",proxies=proxies)
    print(ret.headers)
    from requests.auth import HTTPProxyAuth
    # proxy URLs should include the scheme
    proxyDict = { 'http':'http://77.75.105.165','https':'http://77.75.105.165'}
    auth = HTTPProxyAuth('username','mypassword')
    r  = requests.get("http://www.google.com",proxies=proxyDict,auth=auth)
    print(r.text)
def param_stream():
    # with stream=True the body is downloaded only when it is accessed
    ret  = requests.get('http://127.0.0.1:8000/test',stream=True)
    print(ret.content)
    ret.close()
def requests_session():
    import requests
    session = requests.Session()
    ### 1. Visit any page first to obtain a cookie
    i1  = session.get(url="http://dig.chouti.com/help/service")
    ### 2. Log in, carrying the cookie from the previous request; the backend authorizes the gpsd value in the cookie
    i2 = session.post(
        url="http://dig.chouti.com/login",
        data ={
            'phone'   :'86352932341@qq.com',
            'password':'xxxooo',
            'oneMonth':"" })
    ### 3. Later requests on the same session carry the authorized cookie automatically
    i3 = session.post( url="http://dig.chouti.com/link/vote?linksId=8589623",  )
    print(i3.text)
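Hands-on example:

2.1. Analysing the target site

Open https://github.com/login in a browser ----> then enter a wrong username and password and inspect the requests:

2.2. Flow analysis

 First GET: https://github.com/login       to obtain the initial cookie and the authenticity_token
 Then POST: https://github.com/session, carrying the initial cookie and a request body (authenticity_token, username, password, etc.);
 Finally, obtain the login cookie
 PS: if a site sends the password in encrypted form, enter a wrong username with the correct password and copy the encrypted password from the browser; GitHub sends the password in plain text;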

import requests,re
# First request
r1   = requests.get('https://github.com/login')
r1_cookie = r1.cookies.get_dict() # initial cookie (not yet authorized)
# <input type="hidden" name="authenticity_token" value="OquWGlzANjzFvVWfygbs94KI15FeI42bfNy1eQkLBp76xpFtQ/cJEYUlQNvdT3xTCkOL1IkMDor9JjhZYV+VRg==" />      <div class="auth-form-header p-0">
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"',r1.text)[0] # regex-match the input tag shown above to extract the CSRF token value
# Second request: POST to the login endpoint with the initial cookie, the token, the username and the password
data = {
      'commit':'Sign in',
      'utf8':'✓',
      'authenticity_token':authenticity_token,
      'login':'352932341@qq.com',
      'password':'xxoooaooa' # this password is wrong; replace it with your own
}
r2 = requests.post('https://github.com/session',data=data,cookies = r1_cookie)
login_cookie = r2.cookies.get_dict()
# Third request: once logged in, use login_cookie to access personal settings such as emails
r3 = requests.get('https://github.com/settings/emails',cookies=login_cookie)
print('352932341@qq.com' in r3.text) # True
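The token extraction step can be tried on its own, without logging in. A minimal sketch using only the re module; the HTML string mimics the input tag shown in the comment, and the token value is a placeholder.

```python
import re

# Sample markup in the shape of the login form; the token is a made-up placeholder.
html = ('<input type="hidden" name="authenticity_token" '
        'value="PLACEHOLDER_TOKEN==" />      <div class="auth-form-header p-0">')

pattern = r'name="authenticity_token".*?value="(.*?)"'

# findall returns a list of captured groups; guard against an empty list
# instead of indexing [0] blindly, in case the page layout has changed.
matches = re.findall(pattern, html)
authenticity_token = matches[0] if matches else None
print(authenticity_token)

# A page without the field yields no match.
missing = re.findall(pattern, '<html></html>')
print(missing)
```

Note that `.` does not match newlines by default, so this pattern assumes name and value sit on the same line; passing re.S would relax that.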

requests.session() saves cookie information automatically

import requests,re
# First request
session   = requests.session() # the session stores cookies for us automatically
r1   = session.get('https://github.com/login')
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"',r1.text)[0]
# extract the CSRF token from the page
# Second request: POST to the login endpoint with the token, username and password (the session sends the initial cookie)
data = {
      'commit':'Sign in',
      'utf8':'✓',
      'authenticity_token':authenticity_token,
      'login':'352932341@qq.com',
      'password':'123456tian'
}
r2  = session.post('https://github.com/session', data=data,)
# Third request: the session now holds the login cookie, so personal settings such as emails can be accessed
r3 = session.get('https://github.com/settings/emails')
print('352932341@qq.com' in r3.text)
# True
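How a session carries cookies forward can be observed offline. In this sketch a cookie is placed in the session's jar by hand, standing in for one the server would have set, and Session.prepare_request() shows it being attached to the next request; the cookie name and value are placeholders.

```python
import requests

session = requests.Session()

# Simulate a cookie the server would have set on an earlier response.
# The name and value are placeholders.
session.cookies.set('user_session', 'example-value')

# prepare_request() merges the session's cookies into the request without sending it.
prepared = session.prepare_request(
    requests.Request('GET', 'https://github.com/settings/emails'))
print(prepared.headers.get('Cookie'))

# A request prepared outside the session has no access to its cookie jar.
bare = requests.Request('GET', 'https://github.com/settings/emails').prepare()
print(bare.headers.get('Cookie'))
```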

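Supplement

requests.post(url='xxxxxxxx',data={'xxx':'yyy'}) # no headers specified; the default Content-Type is application/x-www-form-urlencoded
# If we set the Content-Type header to application/json but still pass the values through data, the body is still form-encoded, so the server cannot read the values as JSON
requests.post(url='',data={'':1,},headers={'content-type':'application/json' })
requests.post(url='', json={'':1,}, ) # the default Content-Type is application/json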
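The difference between data and json shows up in the prepared request's Content-Type header and body. A minimal sketch that only prepares the requests (httpbin.org is not contacted); the payload keys are arbitrary.

```python
import json
import requests

url = 'http://httpbin.org/post'  # not contacted; the requests are only prepared
payload = {'k1': 'v1', 'nested': {'k2': 'v2'}}

form = requests.Request('POST', url, data={'k1': 'v1'}).prepare()
as_json = requests.Request('POST', url, json=payload).prepare()
mismatch = requests.Request('POST', url, data={'k1': 'v1'},
                            headers={'content-type': 'application/json'}).prepare()

print(form.headers['Content-Type'], form.body)
print(as_json.headers['Content-Type'], as_json.body)
# The header says JSON, but the body is still form-encoded.
print(mismatch.headers['Content-Type'], mismatch.body)
```

The third request is the mismatch described above: a server that trusts the header will try to parse `k1=v1` as JSON and fail.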

Upvote every item on the Chouti home page

import requests
from lxml import etree
header ={ "user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",}
# Visit the home page first to obtain the initial cookie
r1 = requests.get("https://dig.chouti.com/",headers=header)
payload = { "phone"   :"8613580423620","password":"123456tian","oneMonth":"1",}
# Log in with the initial cookie; the backend authorizes the gpsd value it contains
r2    = requests.post("https://dig.chouti.com/login",headers=header,cookies=r1.cookies.get_dict(),data=payload)
gpsd  = r1.cookies.get_dict()['gpsd']
print(r2.text)
html  = etree.HTML(r1.text)
# Collect the link ids from the lang attribute of each item's image
tags = html.xpath(".//div[@id='content-list']/div/div[2]/img/@lang")
print(tags)
print(len(tags))
for i in tags:
    url  = "https://dig.chouti.com/link/vote?linksId={0}".format(i)
    r3   = requests.post(url=url,headers=header,cookies={'gpsd':gpsd})

Save the headline images from the Autohome news page to local files

import requests
from uuid import uuid4
from lxml import etree
header ={"User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",}
r1 = requests.get("https://www.autohome.com.cn/news/",headers=header)
print(r1)
r1.encoding = 'gbk'
html  = etree.HTML(r1.text)
tags = html.xpath(".//div[@id='auto-channel-lazyload-article']/ul/li/a/div[1]/img/@src")
for i in tags:
    img_url = "http:{0}".format(i) # the src values are protocol-relative (//host/path), so prepend the scheme
    res_jpg = requests.get(img_url,headers=header)
    file_name = str(uuid4())+".jpg" # random, unique file name
    with open(file_name,'wb') as wr:
        wr.write(res_jpg.content)

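Response

3.1. Response attributes

import requests
response = requests.get('http://www.jianshu.com') # printing the object itself shows the status code, e.g. <Response [200]>
# response attributes
print(response.text)       # response body as str
print(response.content) # response body as bytes
print(response.status_code) # status code
print(response.headers) # response headers
print(response.cookies) # all cookies, as a RequestsCookieJar
print(response.cookies.get_dict()) # cookies as a dict
print(response.cookies.items()) # cookies as a list of (name, value) pairs
response.encoding = "gbk" # sets the encoding used to decode response.text; fixes mismatched encodings
print(response.url)
print(response.history)
print(response.encoding)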

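3.2. Encoding issues

import requests
response = requests.get('http://www.autohome.com/news')
response.encoding ='gbk' # Autohome pages are encoded as gb2312; when the response headers declare no charset, requests falls back to ISO-8859-1 for text content, so Chinese comes out garbled unless the encoding is set
print(response.text)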
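The effect of response.encoding can be reproduced without a network call. In this sketch a Response is built by hand and its private _content attribute is filled with GBK bytes; that is a testing shortcut assumed here for illustration, not normal usage.

```python
import requests

# Build a Response by hand to avoid a network call. Setting _content directly
# is a testing shortcut (an assumption of this sketch), not normal usage.
response = requests.Response()
response._content = '汽车之家'.encode('gbk')

response.encoding = 'ISO-8859-1'   # the fallback when no charset is declared
garbled = response.text

response.encoding = 'gbk'          # tell requests how the bytes are really encoded
fixed = response.text

print(repr(garbled))
print(fixed)
```

When the right encoding is not known in advance, response.apparent_encoding gives a guess detected from the body; it is a heuristic and can be wrong.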

3.3. Fetching binary data

Requirement: request an image URL and save the image locally

import requests
# adjacent string literals are joined, which keeps the long URL readable
response = requests.get('https://timgsa.baidu.com/timg?image&quality=80'
                        '&size=b9999_10000&sec=1509868306530&di=712e4ef3ab258b36e9f4b48e85a81c9d&'
                        'imgtype=0&src=http%3A%2F%2Fc.hiphotos.baidu.com%'
                        '2Fimage%2Fpic%2Fitem%2F11385343fbf2b211e1fb58a1c08065380dd78e0c.jpg')
with open('a.jpg','wb') as f:
    f.write(response.content)

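Requirement: use the stream parameter to fetch the body piece by piece

For example, when downloading a 50 GB video, reading response.content and writing it to a file in one go is not reasonable, because the whole body would be held in memory

import requests
response = requests.get('https://gss3.baidu.com/6LZ0ej3k1Qd3ote6lo7D0j9wehsv/tieba-smallvideo-transcode/1767502_56ec685f9c7ec542eeaf6eac93a65dc7_6fe25cd1347c_3.mp4',
                        stream=True) # stream=True: fetch the body piece by piece
with open('b.mp4','wb') as f:
    for chunk in response.iter_content(chunk_size=8192): # the default chunk_size of 1 would yield one byte at a time
        f.write(chunk)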
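The chunking behaviour of iter_content() can be checked offline. In this sketch a hand-built Response reads from an in-memory buffer instead of a socket; assigning response.raw directly is a testing shortcut assumed here, and the 10000-byte payload and 4096-byte chunk size are arbitrary.

```python
import io
import os
import tempfile

import requests

# Stand-in for a streamed download: a hand-built Response whose raw stream is
# an in-memory buffer. This is a testing shortcut, not normal usage.
response = requests.Response()
response.status_code = 200
response.raw = io.BytesIO(b'x' * 10000)

chunk_sizes = []
path = os.path.join(tempfile.mkdtemp(), 'b.mp4')
with open(path, 'wb') as f:
    # Each iteration reads at most chunk_size bytes, so memory use stays bounded.
    for chunk in response.iter_content(chunk_size=4096):
        chunk_sizes.append(len(chunk))
        f.write(chunk)

print(chunk_sizes)
print(os.path.getsize(path))
```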

3.4. Parsing JSON

import requests
response = requests.get('http://httpbin.org/get').json() # parse the JSON response body into Python objects
print(response)

5. Redirection and History

import requests
import re
# First request
r1  = requests.get('https://github.com/login')
r1_cookies = r1.cookies.get_dict() # initial cookie (not yet authorized)
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"',r1.text)[0]
# extract the CSRF token from the page
# Second request: POST to the login endpoint with the initial cookie, the token, the username and the password
data = {
    'commit':'Sign in',
    'utf8':'✓',
    'authenticity_token':authenticity_token,
    'login':'352932341@qq.com',
    'password':'123456tian'
}
# allow_redirects=False is not set, so when the response headers contain Location, requests follows it to the new page; r2 is the response of the new page
r2 = requests.post('https://github.com/session',data=data,cookies = r1_cookies)
print(r2.status_code) # 200
print(r2.url) # URL after the redirect, i.e. the URL reached after a successful login
print(r2.history) # responses received before the redirect
print(r2.history[0].text) # body of the response before the redirect
# With allow_redirects=False, requests does not follow the redirect even if the response headers contain Location; r3 is the original (pre-redirect) response
r3 = requests.post('https://github.com/session',data=data,cookies=r1_cookies,allow_redirects=False)
print("r3:",r3.status_code) # 302
print("r3:",r3.url) # URL before the redirect
print("r3:",r3.history) # []
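The same behaviour can be reproduced against a throwaway local server, with no GitHub account involved. This is a sketch under stated assumptions: RedirectHandler and its /old and /new paths are hypothetical, built on the standard library's http.server, and trust_env is switched off so that proxy settings in the environment do not interfere with localhost.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'arrived'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output quiet


server = HTTPServer(('127.0.0.1', 0), RedirectHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment

followed = session.get(base + '/old')                            # follows the 302
not_followed = session.get(base + '/old', allow_redirects=False)  # stops at the 302

print(followed.status_code, followed.url, followed.history)
print(not_followed.status_code, not_followed.headers['Location'], not_followed.history)

server.shutdown()
server.server_close()
```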

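6. Advanced usage

1. SSL certificate verification

import requests,re
# Certificate verification (most sites use https)
response = requests.get('https://www.12306.cn') # raises an SSL error if the site's certificate cannot be verified
print(response)
# Improvement 1: no error, but a warning is printed
response_1 = requests.get('https://www.12306.cn',verify=False) # skips certificate verification; prints a warning and returns 200
print(response_1.status_code)
# Improvement 2: no error and no warning
import requests
from requests.packages import urllib3
urllib3.disable_warnings() # turn off the warnings
response = requests.get('https://www.12306.cn',verify=False)
print(response.status_code)
# Improvement 3: supply a client certificate
# Many sites use https but can be accessed without a client certificate; in most cases it is optional
# Zhihu, Baidu and similar sites work either way
# Where it is strictly required it must be supplied, e.g. when only specific users who have been issued a certificate may access a particular site
import requests
response = requests.get('https://www.12306.cn',cert=('/path/server.crt','/path/key'))
print(response.status_code)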

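2. Timeout settings

# Timeout settings
# Two forms: a float or a tuple
# timeout = 0.1        applies to both connecting and waiting for data
# timeout = (0.1,0.2)  0.1 is the connect timeout, 0.2 is the read timeout
import requests
response = requests.get('https://www.baidu.com',timeout=0.0001) # a timeout this small will almost certainly raise a timeout exception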
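A read timeout can be triggered deterministically with a deliberately slow local server. This is a sketch under stated assumptions: SlowHandler is a hypothetical handler built on the standard library's http.server, the 0.5 s delay and the timeout values are arbitrary, and trust_env is switched off so that proxy settings in the environment do not interfere with localhost.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that waits 0.5 s before answering."""

    def do_GET(self):
        time.sleep(0.5)
        try:
            body = b'late'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client may already have given up

    def log_message(self, *args):
        pass  # keep the output quiet


server = HTTPServer(('127.0.0.1', 0), SlowHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment

# (connect timeout, read timeout): 5 s is plenty for the 0.5 s delay.
first = session.get(url, timeout=(1, 5)).text

# A 0.1 s read timeout is shorter than the delay, so ReadTimeout is raised.
try:
    session.get(url, timeout=(1, 0.1))
    outcome = 'ok'
except requests.exceptions.ReadTimeout:
    outcome = 'read timeout'

print(first, outcome)
server.shutdown()
server.server_close()
```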

3. Exception handling

import requests
from requests.exceptions import * # see requests.exceptions for the available exception types
try:
    r = requests.get('http://www.baidu.com',timeout=10)
except ReadTimeout: # the server did not send data in time
    print('=====')
# except ConnectionError: # network unreachable
#     print('=====')
# except Timeout:
#     print('aaaaa')
except RequestException: # base class of all requests exceptions
    print('Error')
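The order of the except clauses matters because these exception classes form a hierarchy: a broad class listed first would swallow the more specific ones. The sketch below just checks the relationships with issubclass; it makes no request.

```python
from requests.exceptions import (ConnectionError, ConnectTimeout, ReadTimeout,
                                 RequestException, Timeout)

# Each pair is (subclass, parent).
pairs = [
    (ReadTimeout, Timeout),
    (ConnectTimeout, Timeout),
    (ConnectTimeout, ConnectionError),
    (Timeout, RequestException),
    (ConnectionError, RequestException),
]
for child, parent in pairs:
    print(child.__name__, '->', parent.__name__, issubclass(child, parent))
```

Catching from most specific to most general (ReadTimeout, then Timeout or ConnectionError, then RequestException) keeps each handler reachable.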

4. Uploading files

import requests
files={'file':open('a.jpg','rb')}
response=requests.post('http://httpbin.org/post',files=files)
print(response.status_code)
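What files= produces on the wire can be inspected by preparing the request locally. In this sketch an in-memory buffer stands in for open('a.jpg','rb'); the file name, bytes and content type are placeholders, and nothing is sent to httpbin.org.

```python
import io

import requests

# In-memory stand-in for open('a.jpg', 'rb'); name, bytes and type are placeholders.
files = {'file': ('a.jpg', io.BytesIO(b'fake image bytes'), 'image/jpeg')}

# Prepare only; nothing is sent over the network.
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

# requests builds a multipart/form-data body and generates the boundary itself.
print(prepared.headers['Content-Type'])
print(b'filename="a.jpg"' in prepared.body)
```

Because the boundary is generated by requests, a hand-written Content-Type header should not be set when using files=.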

