Python requests: Timeouts and Retries

This article is a repost. 2019-11-28 19:50

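1. Background:

The requests module is a basic building block for Python crawlers, but it also shows up in everyday work: for example, sending a POST request to a partner's API URL to push data, or using a GET request to pull data.

One situation has to be considered in both cases: how to handle a request that times out, and how to retry after the timeout.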

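2. Connect timeout and read timeout:

Timeouts fall into two kinds: connect timeouts and read timeouts.

Connect timeout

The connect timeout is the number of seconds requests waits while establishing a connection to the server.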

    import datetime

    import requests

    url = 'http://www.google.com.hk'

    start = datetime.datetime.now()
    print('start', start)
    try:
        # A single number is used for both the connect and the read timeout
        html = requests.get(url, timeout=5).text
        print('success')
    except requests.exceptions.RequestException as e:
        print(e)
    end = datetime.datetime.now()
    print('end', end)
    print('elapsed: {time}'.format(time=(end - start)))

    # Result:
    # start 2019-11-28 14:19:24.249588
    # HTTPConnectionPool(host='www.google.com.hk', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x000001D8ECB1CCC0>, 'Connection to www.google.com.hk timed out. (connect timeout=5)'))
    # end 2019-11-28 14:19:29.262519
    # elapsed: 0:00:05.012931

Google is blocked on this network, so the connection cannot be established, and the error message reports a connect timeout.

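Even without timeout=5, the connection attempt eventually gives up, here after roughly 21 seconds. requests itself does not apply a default timeout; that limit comes from the operating system's TCP stack (Windows in this run, as the WinError 10060 in the output shows).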

    start 2019-11-28 15:00:36.441117
    HTTPConnectionPool(host='www.google.com.hk', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000023130B9CCC0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))
    end 2019-11-28 15:00:57.459768
    elapsed: 0:00:21.018651

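Read timeout

The read timeout is the time the client waits for the server to send a response. More precisely, it is the time to wait between bytes sent by the server; in most cases this is the time before the server sends the first byte.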

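In summary:

    Connect timeout ==> the maximum time between starting the connection attempt and the connection being established

    Read timeout ==> the maximum time to wait, once connected, for the server to send data (usually the wait for the first byte of the response)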
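Because the read timeout limits the wait between bytes rather than the whole download, a response can take longer than the read timeout and still succeed. The sketch below illustrates this against a small local test server. DripHandler, demo, CHUNKS, GAP_SECONDS and READ_TIMEOUT are hypothetical names made up for this illustration, and the exact timings will vary from machine to machine.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

CHUNKS = [b"a", b"b", b"c", b"d", b"e"]
GAP_SECONDS = 0.2    # pause before each byte the server sends
READ_TIMEOUT = 0.6   # longer than any single gap, shorter than the whole body


class DripHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that sends the body one byte at a time."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(CHUNKS)))
        self.end_headers()
        for chunk in CHUNKS:
            time.sleep(GAP_SECONDS)
            self.wfile.write(chunk)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    """Fetch the slow body and return (content, elapsed_seconds)."""
    server = HTTPServer(("127.0.0.1", 0), DripHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        session = requests.Session()
        session.trust_env = False  # ignore proxy settings for the local server
        started = time.monotonic()
        resp = session.get(
            "http://127.0.0.1:%d/" % server.server_port,
            timeout=(1, READ_TIMEOUT),
        )
        return resp.content, time.monotonic() - started
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() should return the full body even though the total transfer time is longer than READ_TIMEOUT, since no single gap between bytes exceeds it. Limiting the total duration of a request therefore needs something other than the timeout argument.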

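If you pass a single value as timeout, that value is used for both the connect timeout and the read timeout. To set the two separately, pass a tuple of (connect timeout, read timeout):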

    r = requests.get('https://github.com', timeout=5)
    r = requests.get('https://github.com', timeout=(0.5, 27))
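To make the rule concrete, here is a small sketch that maps a timeout argument to the (connect, read) pair it stands for. split_timeout is a hypothetical helper written for this illustration; it is not part of requests and only mirrors the behavior described above.

```python
def split_timeout(timeout):
    """Return the (connect, read) pair that a timeout argument stands for.

    Hypothetical helper for illustration only, not a requests API.
    """
    if timeout is None:
        # No timeout given: neither phase is limited by requests
        return (None, None)
    if isinstance(timeout, tuple):
        connect, read = timeout
        return (connect, read)
    # A single number applies to both phases
    return (timeout, timeout)
```

For example, split_timeout(5) gives (5, 5), while split_timeout((0.5, 27)) keeps the two values separate.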

Example: a request with a 5-second connect timeout and a 10-second read timeout:

    import datetime

    import requests

    url_login = 'http://www.heibanke.com/accounts/login/?next=/lesson/crawler_ex03/'

    # Log in first; the session keeps the cookies for the later request
    session = requests.Session()
    session.get(url_login)
    token = session.cookies['csrftoken']
    session.post(url_login, data={'csrfmiddlewaretoken': token, 'username': 'xx', 'password': 'xx'})

    start = datetime.datetime.now()
    print('start', start)
    url_pw = 'http://www.heibanke.com/lesson/crawler_ex03/pw_list/'
    try:
        # timeout=(connect timeout, read timeout), both in seconds
        html = session.get(url_pw, timeout=(5, 10)).text
        print('success')
    except requests.exceptions.RequestException as e:
        print(e)
    end = datetime.datetime.now()
    print('end', end)
    print('elapsed: {time}'.format(time=(end - start)))

    # Result:
    # start 2019-11-28 19:32:20.589827
    # success
    # end 2019-11-28 19:32:22.590872
    # elapsed: 0:00:02.001045

With timeout=(1, 0.5) instead, the error message reports a read timeout:

    start 2019-11-28 19:36:38.503593
    HTTPConnectionPool(host='www.heibanke.com', port=80): Read timed out. (read timeout=0.5)
    end 2019-11-28 19:36:39.005271
    elapsed: 0:00:00.501678
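The two kinds of timeout raise different exceptions, requests.exceptions.ConnectTimeout and requests.exceptions.ReadTimeout, so they can be handled separately. The following is a minimal sketch; fetch and its return values are hypothetical names chosen for this example.

```python
import requests


def fetch(url, connect_timeout=5, read_timeout=10):
    """Return a (status, detail) pair describing how the request ended.

    Hypothetical helper for illustration; the labels are made up here.
    """
    try:
        resp = requests.get(url, timeout=(connect_timeout, read_timeout))
        return "ok", resp.status_code
    except requests.exceptions.ConnectTimeout:
        # The connection was not established within connect_timeout
        return "connect timeout", None
    except requests.exceptions.ReadTimeout:
        # Connected, but the server sent no data within read_timeout
        return "read timeout", None
    except requests.exceptions.RequestException as exc:
        # Any other requests failure (DNS error, refused connection, ...)
        return "other error", type(exc).__name__
```

A caller might retry on "connect timeout" but log and skip on "read timeout"; which reaction is appropriate depends on the service being called.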

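The read timeout has no default value. If it is not set, the request can wait indefinitely. This is why crawlers often hang without printing any error: the server has stopped sending data, and with no read timeout set the client simply keeps waiting.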
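One way to avoid forgetting the timeout is to attach a default to a Session subclass. This is only a sketch: DefaultTimeoutSession and its default_timeout attribute are hypothetical names, not something requests provides, and the default of (5, 10) is an arbitrary choice.

```python
import requests


class DefaultTimeoutSession(requests.Session):
    """Session that fills in a timeout when the caller passes none.

    Hypothetical class for illustration; not part of requests.
    """

    def __init__(self, timeout=(5, 10)):
        super().__init__()
        self.default_timeout = timeout

    def request(self, method, url, **kwargs):
        # Only set the default if the caller did not pass timeout at all;
        # an explicit timeout (including timeout=None) is left untouched.
        kwargs.setdefault("timeout", self.default_timeout)
        return super().request(method, url, **kwargs)
```

With this in place, session.get(url) uses the default, while session.get(url, timeout=1) still overrides it for a single call.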

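Retrying after a timeout

In practice, a request usually does not give up at the first timeout. Instead, a retry mechanism attempts the request several times.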

    import datetime

    import requests

    url = 'http://www.google.com.hk'


    def gethtml(url):
        i = 0
        # Try at most 3 times; returns None if every attempt fails
        while i < 3:
            start = datetime.datetime.now()
            print('start', start)
            try:
                html = requests.get(url, timeout=5).text
                return html
            except requests.exceptions.RequestException:
                i += 1
                end = datetime.datetime.now()
                print('end', end)
                print('elapsed: {time}'.format(time=(end - start)))


    if __name__ == '__main__':
        gethtml(url)
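The loop above retries immediately after each failure. A common refinement is to wait between attempts and double the wait each time (exponential backoff). The following is a sketch of that idea, not a definitive implementation; get_with_backoff and its parameters are hypothetical names, and the get and sleep arguments exist only so the function can be exercised without a network.

```python
import time

import requests


def get_with_backoff(url, attempts=3, timeout=(5, 10), base_delay=1.0,
                     get=requests.get, sleep=time.sleep):
    """GET url, retrying on any requests error with exponential backoff.

    Hypothetical helper for illustration. Waits base_delay, 2*base_delay,
    4*base_delay, ... between attempts and re-raises the last error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc = None
    for attempt in range(attempts):
        try:
            return get(url, timeout=timeout)
        except requests.exceptions.RequestException as exc:
            last_exc = exc
            if attempt < attempts - 1:
                # No point in sleeping after the final attempt
                sleep(base_delay * 2 ** attempt)
    raise last_exc
```

Unlike the loop above, this version raises the last exception instead of returning None, so the caller cannot mistake a failure for an empty result.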

In fact, requests already provides a built-in way to do this:

    import time

    import requests
    from requests.adapters import HTTPAdapter

    s = requests.Session()
    # Retry up to 3 times for both http:// and https:// URLs
    s.mount('http://', HTTPAdapter(max_retries=3))
    s.mount('https://', HTTPAdapter(max_retries=3))

    print(time.strftime('%Y-%m-%d %H:%M:%S'))
    try:
        r = s.get('http://www.google.com.hk', timeout=5)
        print(r.text)
    except requests.exceptions.RequestException as e:
        print(e)
    print(time.strftime('%Y-%m-%d %H:%M:%S'))

    # Result:
    # 2019-11-28 19:48:05
    # HTTPConnectionPool(host='www.google.com.hk', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000019D8D88D208>, 'Connection to www.google.com.hk timed out. (connect timeout=5)'))
    # 2019-11-28 19:48:25

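max_retries is the maximum number of retries. Three retries plus the initial request make four attempts in total, which is why the code above runs for 20 seconds rather than 15.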
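According to the requests documentation, an integer max_retries only covers failed DNS lookups, socket connections and connection timeouts; for finer control it suggests passing a urllib3 Retry object instead. The sketch below shows one possible configuration that also waits between attempts and retries on selected status codes. build_retrying_session is a hypothetical helper name, and the chosen values are illustrative assumptions, not recommendations from the original article.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retrying_session(total=3, backoff_factor=0.5,
                           status_forcelist=(500, 502, 503, 504)):
    """Return a Session whose adapters retry with backoff.

    Hypothetical helper for illustration; parameter values are assumptions.
    """
    retry = Retry(
        total=total,                        # overall cap on retries
        backoff_factor=backoff_factor,      # growing sleep between attempts
        status_forcelist=status_forcelist,  # also retry on these HTTP statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
```

A timeout still has to be passed on each call, for example build_retrying_session().get(url, timeout=(5, 10)); the retry settings do not add one.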
