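Downloading images from 天堂圖片網 (ivsky.com): extract the src attribute from each img tag and pass it to
urllib.request.urlretrieve (urllib.urlretrieve in Python 2). That function automatically calls back the Schedule function, which prints the current download progress.
Schedule takes 3 parameters: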
blocknum: number of data blocks downloaded so far
blocksize: size of one data block, in bytes
totalsize: size of the remote file
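To see when the callback fires and what it receives, here is a minimal sketch that runs without a network connection. It assumes a local temporary file, addressed through a file:// URL, is an acceptable stand-in for a remote image. The names record_progress and percent_done are hypothetical helpers introduced only for this illustration.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

calls = []  # every (blocknum, blocksize, totalsize) triple the hook receives


def record_progress(blocknum, blocksize, totalsize):
    """Hypothetical hook: store the arguments instead of printing them."""
    calls.append((blocknum, blocksize, totalsize))


def percent_done(blocknum, blocksize, totalsize):
    """Turn one hook call into a percentage, capped at 100."""
    if totalsize <= 0:  # size unknown, so no meaningful percentage
        return 0.0
    return min(100.0, 100.0 * blocknum * blocksize / totalsize)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.bin"
    src.write_bytes(b"x" * 20000)  # stand-in for a remote image (assumption)
    dst = os.path.join(tmp, "copy.bin")
    # urlretrieve invokes the hook itself; we never call record_progress directly
    urllib.request.urlretrieve(src.as_uri(), dst, record_progress)
    copied = os.path.getsize(dst)

for blocknum, blocksize, totalsize in calls:
    print(blocknum, blocksize, totalsize, percent_done(blocknum, blocksize, totalsize))
```

The first call arrives with blocknum equal to 0, before any data has been read. Because the last block is usually only partly filled, blocknum * blocksize can exceed totalsize, which is why the percentage is capped at 100.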
import urllib.request
from urllib.parse import urljoin

import requests
from lxml import etree

PAGE_URL = 'http://www.ivsky.com/tupian/ziranfengguang/'


def Schedule(blocknum, blocksize, totalsize):
    '''
    blocknum:  number of data blocks downloaded so far
    blocksize: size of one data block, in bytes
    totalsize: size of the remote file
    '''
    if totalsize <= 0:
        # The server did not report a size, so no percentage can be computed
        return
    per = min(100.0, 100.0 * blocknum * blocksize / totalsize)
    print('Current download progress: %d%%' % per)


headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
r = requests.get(PAGE_URL, headers=headers)
# Parse the page with lxml
html = etree.HTML(r.text)
# Collect the src attribute of every img tag
img_urls = html.xpath('.//img/@src')
for i, img_url in enumerate(img_urls):
    # src may be relative or protocol-relative, so resolve it against the page URL
    urllib.request.urlretrieve(urljoin(PAGE_URL, img_url), 'img' + str(i) + '.jpg', Schedule)
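One detail the script depends on is the form of each src value. The sketch below shows one possible way to normalise them before downloading, assuming the page may contain relative or protocol-relative paths. The helpers absolute_image_url and local_name are hypothetical, and the sample values under img.example.com are made up for illustration rather than taken from the real page.

```python
import os
from urllib.parse import urljoin, urlsplit

PAGE_URL = 'http://www.ivsky.com/tupian/ziranfengguang/'


def absolute_image_url(src, page_url=PAGE_URL):
    """Resolve a src value against the URL of the page it was found on."""
    return urljoin(page_url, src)


def local_name(img_url, index):
    """Build 'img<index><ext>', keeping the URL's extension when it has one."""
    ext = os.path.splitext(urlsplit(img_url).path)[1] or '.jpg'
    return 'img{}{}'.format(index, ext)


# Made-up src values covering the three common shapes (assumption)
samples = [
    '//img.example.com/a/pic.png',             # protocol-relative
    '/images/logo.gif',                        # relative to the site root
    'http://img.example.com/b/photo.jpg?x=1',  # already absolute
]

for i, src in enumerate(samples):
    url = absolute_image_url(src)
    print(url, '->', local_name(url, i))
```

Keeping the original extension avoids saving a PNG or GIF under a .jpg name, and the fallback to .jpg matches the naming used in the script above.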
