Using BeautifulSoup to download and save the css, js and img files referenced in a page's HTML

Reposted from the original article. 2019-05-29 09:30. Tags: python / beautifulsoup

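# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup as BS
import urllib.request as rqst
import os

url = 'http://xxxxxxx'
# Any common browser User-Agent string found online will do.
# 'identity' asks for an uncompressed body ('utf-8' is a charset, not a content coding).
headers = {'User-Agent': 'xxxxxx', 'Accept-Encoding': 'identity'}
r = rqst.Request(url, headers=headers)

# Open the Request object rather than the bare URL so the headers are actually sent.
html = rqst.urlopen(r)

# Parse the page with BeautifulSoup (the lxml parser must be installed).
bs = BS(html, 'lxml')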

# Collect the elements that point to css, js and img files.
elc = bs.find_all('link', type='text/css')
elj = bs.find_all('script')
eli = bs.find_all('img')


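# Save the css, js and img files.

for c in elc:
    url = c['href']  # If the href is relative or incomplete, adjust it yourself; the same applies below.
    file = url.split('/')[-1]  # Use the last path segment as the file name.
    if not os.path.exists(file):
        try:
            res = rqst.urlopen(url)
            txt = res.read()  # read() returns bytes
            # Write in binary mode; writing bytes to a text-mode file raises TypeError.
            with open(file, 'wb') as f:
                f.write(txt)
        except Exception:
            pass  # Skip files that fail to download.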

for j in elj:
    # Inline scripts have no src attribute, so only external ones are downloaded.
    if j.has_attr('src'):
        url = j['src']
        file = url.split('/')[-1]
        if not os.path.exists(file):
            try:
                res = rqst.urlopen(url)
                txt = res.read()  # read() returns bytes
                with open(file, 'wb') as f:
                    f.write(txt)
            except Exception:
                pass  # Skip files that fail to download.

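for i in eli:
    url = i['src']
    # Site-specific: prepend the host, assuming each src is a root-relative path.
    url = 'http://www.fmhhqb.com' + url
    file = url.split('/')[-1]
    if not os.path.exists(file):
        try:
            # Send the same headers as for the page request.
            r = rqst.Request(url, headers=headers)
            res = rqst.urlopen(r)
            txt = res.read()
            with open(file, 'wb') as f:
                f.write(txt)
        except Exception:
            pass  # Skip images that fail to download.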
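
The script above leaves incomplete href/src values for the reader to adjust by hand. One possible way to handle them is sketched below, using only the standard library. The helper names `resolve_asset` and `download`, the `example.com` URLs and the `'index'` fallback file name are illustrative assumptions, not part of the original script.

```python
import os
import posixpath
import urllib.request as rqst
from urllib.parse import urljoin, urlsplit


def resolve_asset(page_url, ref):
    """Return (absolute_url, local_filename) for an href/src value found on page_url."""
    # urljoin handles relative paths, root-relative paths and //host/... references.
    absolute = urljoin(page_url, ref)
    # Use only the path part, so ?query and #fragment do not end up in the file name.
    path = urlsplit(absolute).path
    # 'index' is a hypothetical fallback for URLs whose path ends in '/'.
    name = posixpath.basename(path) or 'index'
    return absolute, name


def download(page_url, ref, headers=None):
    """Save one asset into the current directory unless a file with that name exists."""
    absolute, name = resolve_asset(page_url, ref)
    if os.path.exists(name):
        return name
    req = rqst.Request(absolute, headers=headers or {})
    # Binary mode works for css, js and images alike, since read() returns bytes.
    with rqst.urlopen(req) as res, open(name, 'wb') as f:
        f.write(res.read())
    return name
```

With helpers like these, each loop body could shrink to a single call such as `download(page_url, c['href'], headers)`, where `page_url` is the address of the page that was parsed. Errors are deliberately not swallowed here, so the caller decides whether to skip a failed file.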
