A Multithreaded Crawler in Python: Downloading Images

This article is a repost. Date: 2016-06-04 23:14.

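The program crawls the images on each web page and saves them into a separate folder per page, named after the page title, under a chosen directory. Multithreading is implemented with the `threading` module.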
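
The post does not show how the title and the image links are pulled out of each page. A minimal sketch using only the standard library's `html.parser`, assuming the images appear as `<img src>` tags and the folder name comes from the page's `<title>`; `PageImageParser` and `parse_page` are hypothetical names, not part of the original program:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageImageParser(HTMLParser):
    """Collects the <title> text and absolute <img src> URLs of one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ''
        self.img_urls = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True
        elif tag == 'img':
            src = dict(attrs).get('src')
            if src:
                # resolve relative links against the page URL
                self.img_urls.append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        # title text may arrive in several chunks, so concatenate
        if self._in_title:
            self.title += data


def parse_page(html, base_url):
    """Return (title, list of absolute image URLs) for one page's HTML."""
    parser = PageImageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.img_urls
```

For a page at `http://example.com/page/1.html`, a relative `src="b.png"` becomes `http://example.com/page/b.png`, and `<img>` tags without a `src` are skipped.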

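The main flow is as follows. For each page visited, the image links on that page are put into a queue one by one. One download thread is then created per image, each receiving the queue and a number, and the threads are started. The main thread keeps checking how many threads are active; when the count drops to 1, meaning only the main thread is left, all images have been downloaded and the crawler moves on to the next page.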
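
Waiting for `threading.active_count()` to reach 1 assumes the process runs no other threads, and a loop that polls it without sleeping keeps one CPU core busy. A sketch of the same per-page flow that waits with `Thread.join()` on the workers it started instead; `download_page` and the caller-supplied `fetch(index, url)` callback are hypothetical:

```python
import queue
import threading


def download_page(img_urls, fetch, num_threads=None):
    """Queue every image URL of one page, start worker threads, wait for all of them.

    fetch(index, url) is a caller-supplied download function (hypothetical).
    """
    que = queue.Queue()
    for i, url in enumerate(img_urls):
        que.put((i, url))

    def worker():
        while True:
            try:
                # get_nowait() avoids the race between empty() and get()
                i, url = que.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is finished
            fetch(i, url)

    # the article starts one thread per image; a smaller cap also works
    count = num_threads or len(img_urls)
    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    # join() waits for exactly these threads, unlike active_count() == 1,
    # which also waits on any unrelated thread in the process
    for t in threads:
        t.join()
```

Calling `download_page` once per page inside a plain `for` loop keeps the article's "finish one page, then start the next" behavior.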

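```python
import os
import queue
import socket
import threading
import time
import urllib.error
import urllib.request


class threadDownload(threading.Thread):
    """Worker: takes (url, path) pairs off the shared queue until it is empty."""

    def __init__(self, que, no):
        threading.Thread.__init__(self)
        self.que = que
        self.no = no  # thread number

    def run(self):
        while True:
            try:
                # get_nowait() avoids the race between empty() and get():
                # another thread may take the last item in between
                imgUrl, fileName = self.que.get_nowait()
            except queue.Empty:
                break
            saveImg(imgUrl, fileName)


def saveToFile(FileName, srcList):
    FileName = 'os' + FileName.strip()
    try:
        os.mkdir(FileName)
    except OSError:
        # folder already exists or cannot be created: skip this page
        return False
    que = queue.Queue()
    # name files by image index, not by thread number, so a thread that
    # downloads several images does not overwrite its own files; full paths
    # replace os.chdir(), which would nest folders on every call
    for i, sl in enumerate(srcList):
        que.put((sl, os.path.join(FileName, 'os' + str(i) + '.jpg')))
    for a in range(len(srcList)):
        threadD = threadDownload(que, a)
        threadD.start()
    # poll until only the main thread is left (assumes no other threads run)
    while threading.active_count() > 1:
        time.sleep(0.1)
    print(FileName + " is Done")
    return True


def saveImg(imgUrl, fileName):
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'
    headers = {'User-Agent': user_agent}
    try:
        req = urllib.request.Request(imgUrl, headers=headers)
        res = urllib.request.urlopen(req, timeout=5)
        data = res.read()
    except socket.timeout:
        print("saveImgTimeOut")
        return False
    except urllib.error.URLError as e:
        # HTTP errors and unreachable hosts would otherwise kill the thread
        print("saveImgError:", e)
        return False
    with open(fileName, 'wb') as f:
        f.write(data)
    return True
```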
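
Starting one thread per image means a page with hundreds of images launches hundreds of threads at once. A common alternative, swapped in here rather than taken from the article, is `concurrent.futures.ThreadPoolExecutor` with a fixed number of workers. `download_all` is a hypothetical helper; `save(url, path)` stands for any download function with the same `(url, fileName)` signature as `saveImg` that returns a truthy value on success:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def download_all(img_urls, save_dir, save, max_workers=8):
    """Download one page's images with a bounded thread pool.

    Returns the number of downloads for which save() returned a truthy value.
    """
    paths = [os.path.join(save_dir, 'os' + str(i) + '.jpg') for i in range(len(img_urls))]
    # leaving the with-block waits for every submitted download to finish
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(save, img_urls, paths))
    return sum(1 for ok in results if ok)
```

The pool reuses at most `max_workers` threads, and the `with` block replaces the `active_count()` polling loop as the "page finished" signal.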
