Today, while reviewing Liao's multithreading material, I saw that someone in the comments had written a multithreaded spider: http://www.tendcode.com/article/jiandan-meizi-spider-2/. I clicked through, and the analysis is very thorough, but the code runs to nearly 200 lines.
So I studied the site myself and thought: emmmm, selenium + PhantomJS would handle this directly, and I wrote some code. Then I discovered that, wow, selenium no longer supports PhantomJS, since Chrome and Firefox now ship with headless modes. After reading through a few blogs, I ended up scraping the site with this:
```python
import unittest
import requests
from random import randint
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


class ooxx_spider(unittest.TestCase):

    def setUp(self):
        # headless Chrome replaces the now-unsupported PhantomJS
        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--disable-gpu')
        self.driver = webdriver.Chrome('E:/chromedriver.exe', chrome_options=chrome_options)

    def test_spider(self):
        for i in range(1, 80):
            url = 'http://jandan.net/ooxx/page-' + str(i)
            self.driver.get(url)
            print(url)
            # every image inside the comment list
            elems = self.driver.find_elements_by_xpath('//*[@class="commentlist"]/li/div/div/div/p/img')
            for j in elems:
                self.save_img(j.get_attribute('src'))
            print('Page {} scraped successfully'.format(i))

    def save_img(self, res):
        suffix = res.split('.')[-1]
        destination = 'picture/' + str(randint(1, 1000)) + str(randint(1, 1000)) + '.' + suffix
        r = requests.get(res)
        with open(destination, 'wb') as f:
            f.write(r.content)

    def tearDown(self):
        self.driver.close()


if __name__ == '__main__':
    unittest.main()
```
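A side note on `save_img`: naming files with two random ints can collide and tells you nothing about the source. A small sketch of an alternative (my own variation, not from the original post) derives a stable filename from the image URL itself, so re-runs map the same image to the same file:

```python
from urllib.parse import urlparse
import os


def filename_from_url(url):
    """Derive a stable local filename from an image URL.

    Uses the last path segment of the URL, so the same image
    always maps to the same file and duplicates are avoided.
    """
    path = urlparse(url).path        # e.g. '/large/abc123.jpg'
    name = os.path.basename(path)    # 'abc123.jpg'
    return 'picture/' + name


# hypothetical image URL, for illustration only
print(filename_from_url('http://wx1.sinaimg.cn/large/abc123.jpg'))
# picture/abc123.jpg
```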
Following up with the multiprocessing code.
The core code:
```python
from multiprocessing import Pool

def test_multiscraping(self):
    p = Pool()  # default size is the number of CPU cores; e.g. pass Pool(2) for a dual core
    # assuming 4 worker processes here, so range(5): the 5th task waits for a free worker
    for i in range(5):
        p.apply_async(scraping, args=(i,))
    p.close()
    p.join()
```
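Since the snippet above omits the `scraping` function, here is a self-contained sketch of the same `Pool.apply_async` pattern. The page-range split and the `scraping` body are my assumptions for illustration, not the original post's code:

```python
from multiprocessing import Pool

PAGES_PER_WORKER = 16  # hypothetical split: 80 pages across 5 tasks


def scraping(i):
    """Task i handles pages [i*16 + 1, (i+1)*16].

    A real worker would drive its own headless Chrome instance
    over these pages; here we just return the assigned range.
    """
    start = i * PAGES_PER_WORKER + 1
    end = (i + 1) * PAGES_PER_WORKER
    return (start, end)


if __name__ == '__main__':
    p = Pool(4)  # 4 worker processes
    results = [p.apply_async(scraping, args=(i,)) for i in range(5)]
    p.close()    # no more tasks will be submitted
    p.join()     # wait for all workers to finish
    for r in results:
        print(r.get())
```

`apply_async` returns an `AsyncResult` immediately; `close()` plus `join()` then blocks until every submitted task has finished.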
My CPU is too weak for this; I'll test it on a classmate's machine tonight (shedding a poor man's tears).