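Problem description
While running my crawler today I kept hitting Traceback (most recent call last): exceptions. The script was fairly crude and did not handle exceptions, so the crawler frequently crashed and stopped. After debugging, I found the cause: the site being crawled is unstable, so connections to it sometimes fail.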
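Solution
Wrap the request in a retry loop: try up to maxTryNum times, stop as soon as one attempt succeeds, and report the failure only after the last attempt.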
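import requests

# urls, i, headers, dir_name and file_name are assumed to be defined elsewhere in the crawler.
maxTryNum = 20
for tries in range(maxTryNum):
    try:
        response = requests.get(urls[i], headers=headers, timeout=60)
        with open(dir_name + '/' + file_name, 'wb') as f:
            f.write(response.content)
        break  # success: stop retrying
    except requests.exceptions.RequestException:
        # Connection errors and timeouts land here; retry unless this was the last attempt.
        if tries < maxTryNum - 1:
            continue
        print("Has tried %d times to access url %s, all failed!" % (maxTryNum, urls[i]))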
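
The retry loop above can also be pulled out into a small reusable helper. The following is only a minimal sketch, not part of the original script: the name retry_call and its parameters (max_tries, base_delay, exceptions, sleep) are hypothetical, and the exponential wait between attempts is an added assumption meant to give an unstable site time to recover.

```python
import time


def retry_call(func, max_tries=5, base_delay=1.0,
               exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or max_tries attempts have failed.

    Waits base_delay * 2**attempt seconds between attempts and
    re-raises the last exception if every attempt fails.
    """
    for attempt in range(max_tries):
        try:
            return func()
        except exceptions:
            if attempt == max_tries - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)
```

With requests, the download from this post could then be wrapped as retry_call(lambda: requests.get(urls[i], headers=headers, timeout=60), exceptions=(requests.exceptions.RequestException,)). Re-raising after the last attempt, instead of printing, leaves it to the caller to decide whether to skip the URL or stop the crawl.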