Modifying and saving an HTML file with bs4


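1. Requirements

Save two HTML pages to the local machine so they can be opened in a browser, for example:

Page A (my blog home page)

Page B (the "爬蟲四大金剛" post, covering requests, selenium, BeautifulSoup and Scrapy)

Then, in page A, find the a tag that links to the crawler post and change its href attribute to the local path of page B. Opening page A locally and clicking that link then jumps to the local copy of page B.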
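The core operation described above, changing the href of one a tag, can be tried on a small in-memory snippet before touching any files. This is only a minimal sketch: the HTML string, the link text and the target `page_b.html` are hypothetical stand-ins for the real pages, and the built-in `html.parser` is used so that lxml is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page A; the real page is read from disk.
html = (
    '<div><a class="postTitle2" '
    'href="https://www.cnblogs.com/angelyan/p/10496950.html">crawler post</a></div>'
)

# Parse with the parser bundled with Python (no extra install required).
soup = BeautifulSoup(html, "html.parser")

# Locate the link by its CSS class and overwrite its href in place.
link = soup.find("a", class_="postTitle2")
link["href"] = "page_b.html"  # hypothetical local path to page B

print(soup)
```

Assigning to `link["href"]` edits the existing tag, so the link text and any other attributes are left untouched.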

2. Code

from bs4 import BeautifulSoup

# Local copies of page A (blog home page) and page B (the crawler post)
parent_page = r"C:\Users\ffm11\Desktop\Maple_feng - 博客園.html"
sub_page = r"C:\Users\ffm11\Desktop\爬蟲四大金剛:requests,selenium,BeautifulSoup,Scrapy - Maple_feng - 博客園.html"

with open(parent_page, 'r', encoding="utf-8") as file:
    pcontent = file.read()

sp = BeautifulSoup(pcontent, 'lxml')  # requires the lxml package

# The link to rewrite looks like this in page A:
# <a class="postTitle2" href="https://www.cnblogs.com/angelyan/p/10496950.html">
# [置頂]    爬蟲四大金剛:requests,selenium,BeautifulSoup,Scrapy
# </a>
old_tag = sp.find_all('a', class_='postTitle2')[0]
text = old_tag.get_text()
print(text)

# Build a replacement <a> tag with the same text that points at the local page B
new_tag = sp.new_tag("a")
new_tag.attrs = {"href": sub_page, "class": "postTitle2"}
new_tag.string = text

# Swap the original link for the new tag using `replace_with`
old_tag.replace_with(new_tag)

# Overwrite page A with the modified document
with open(parent_page, 'w', encoding="utf-8") as fp:
    fp.write(sp.prettify())
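The code above writes the raw Windows path into href. Browsers may not resolve a bare path like that consistently, so one option is to convert it to a file:// URL first. The sketch below uses the standard library's `PureWindowsPath.as_uri()`; the helper name `to_file_url` and the example path are hypothetical and not part of the original code.

```python
from pathlib import PureWindowsPath


def to_file_url(windows_path: str) -> str:
    """Convert an absolute Windows path into a percent-encoded file:// URL."""
    # PureWindowsPath parses Windows paths on any operating system;
    # as_uri() raises ValueError if the path is not absolute.
    return PureWindowsPath(windows_path).as_uri()


# Hypothetical path for illustration only.
url = to_file_url(r"C:\Users\demo\Desktop\page b.html")
print(url)  # file:///C:/Users/demo/Desktop/page%20b.html
```

The result could then be used as the href value in place of the raw path, for example `{"href": to_file_url(sub_page), "class": "postTitle2"}`.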

 

