N small Python utilities (finding the CAPTCHA URL to scrape and bulk-downloading CAPTCHA samples)

Reposted (see original) 2017-03-22 18:15 2124 Python utilities

# -*- coding: utf-8 -*-

"""

Created on Mon Mar 21 11:04:54 2017

@author: sl

"""

import requests

import time

#################################################################################
# First find the CAPTCHA URL to scrape. In this example the goal is to query
# vehicle traffic-violation records.
# Vehicle-violation link found: http://smart.gzeis.edu.cn:8081/Content/AuthCode.aspx
# Login page, located from the page source: https://www.stc.gov.cn/szwsjj_web/jsp/xxcx/jdcjtwfcx.jsp
# CAPTCHA image URL, located from the page source: https://www.stc.gov.cn:443/szwsjj_web/ImgServlet.action?
#################################################################################

def downloads_pic(pic_name):
    # Alternative CAPTCHA endpoint, kept for reference:
    # url = 'http://smart.gzeis.edu.cn:8081/Content/AuthCode.aspx'
    url = 'https://www.stc.gov.cn/szwsjj_web/ImgServlet.action?'
    # stream=True defers fetching the body until it is read. It must be set on
    # the initial request if you ever want the raw socket response via res.raw.
    res = requests.get(url, stream=True)
    print(res)  # shows the response status, e.g. <Response [200]>
    with open(r'G:\DownloadsVerificationCode\%s.jpg' % pic_name, 'wb') as f:
        # iter_content takes care of much of what you would have to handle
        # yourself with res.raw; it is the recommended way to read a streamed body.
        for chunk in res.iter_content(chunk_size=1024):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                f.flush()  # write buffered data to the file now instead of waiting
    # No explicit f.close() is needed: the with block closes the file on exit.

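if __name__ == '__main__':
    for i in range(300):
        # time.time() returns the seconds since the 1970 epoch as a float;
        # scaling to microseconds gives an integer usable as a file name.
        pic_name = int(time.time() * 1000000)
        downloads_pic(pic_name)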
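
The script above saves whatever the server returns, including error pages, and fails if the target folder does not exist. The following is only a sketch of a slightly more defensive variant for Python 3, not the original author's code. The function name download_samples, its parameters, the delay between requests and the file-name pattern are all assumptions made for illustration. It reuses one session, checks the HTTP status, creates the output folder and pauses briefly between requests.

```python
import os
import time


def download_samples(url, out_dir, count, delay=0.5, timeout=10, session=None):
    """Download `count` CAPTCHA images from `url` into `out_dir`.

    Hypothetical helper: `session` may be any object with a requests-style
    get() method; by default a requests.Session is created.
    """
    if session is None:
        import requests  # imported lazily so a stand-in session can be passed in
        session = requests.Session()

    os.makedirs(out_dir, exist_ok=True)  # create the target folder if missing
    paths = []
    for i in range(count):
        res = session.get(url, stream=True, timeout=timeout)
        res.raise_for_status()  # stop on 4xx/5xx instead of saving an error page

        # Microsecond timestamp plus a counter keeps file names unique.
        name = '%d_%04d.jpg' % (int(time.time() * 1000000), i)
        path = os.path.join(out_dir, name)
        with open(path, 'wb') as f:
            for chunk in res.iter_content(chunk_size=1024):
                if chunk:  # skip empty keep-alive chunks
                    f.write(chunk)
        paths.append(path)
        time.sleep(delay)  # be gentle with the server between requests
    return paths
```

With requests installed, a call such as download_samples('https://www.stc.gov.cn/szwsjj_web/ImgServlet.action?', r'G:\DownloadsVerificationCode', 300) would roughly mirror the original loop. Whether the server still answers at that URL is not something this sketch can guarantee.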
