Getting all the links in a web page with Python

Reposted article. 2020-05-07 22:11. Tags: crawler / python / network

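Note: the script imports bs4, so the third-party beautifulsoup4 library must be installed before running it (for example with pip install beautifulsoup4). selenium is not needed for this code.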

Version: Python 3

from bs4 import BeautifulSoup
from urllib import request

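# The web address to request
url = 'https://www.hao123.com/'

# Request the address and get back the HTML of the page
html = request.urlopen(url)

# Parse the HTML into a searchable tree
soup = BeautifulSoup(html, 'html.parser')

# Find every <a> tag, since hyperlinks live in <a> tags
data = soup.find_all('a')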

# Open the output file; the with block closes it even if an error occurs
with open('D:/link.txt', mode='w', encoding='utf-8') as file:
    # Iterate over all <a> tags and read each one's href value and text
    for item in data:
        # get() returns None when a tag has no href; item['href'] would raise KeyError
        href = item.get('href')
        if item.string is not None and href not in (None, 'javascript:;', '#'):
            print(item.string, href)
            file.write(f'{item.string} {href}\n')
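
The script above saves each href exactly as it appears in the page, so relative paths such as /news are written without a host and the same address can appear more than once. Below is a minimal sketch of one way to clean up the collected (text, href) pairs using only the standard library. The function name normalize_links, its parameters, and the sample data are assumptions made for illustration; they are not part of the original script.

```python
from urllib.parse import urljoin, urldefrag, urlparse


def normalize_links(base_url, pairs):
    """Turn raw (text, href) pairs into unique absolute http(s) links.

    normalize_links is a hypothetical helper written for this sketch.
    """
    seen = set()
    result = []
    for text, href in pairs:
        if not href:
            continue  # <a> tag without an href attribute
        href = href.strip()
        if href.startswith('#'):
            continue  # pure in-page anchor, not a separate page
        # Resolve relative paths such as '/news' against the page URL
        absolute = urljoin(base_url, href)
        # Drop any '#fragment' so anchors on the same page collapse together
        absolute, _fragment = urldefrag(absolute)
        # Keep only web links; this skips 'javascript:' and 'mailto:' hrefs
        if urlparse(absolute).scheme not in ('http', 'https'):
            continue
        if absolute in seen:
            continue  # already recorded this address
        seen.add(absolute)
        result.append(((text or '').strip(), absolute))
    return result


if __name__ == '__main__':
    # Sample data standing in for (item.string, item.get('href')) pairs
    sample = [
        ('News', '/news'),
        ('Top', '#'),
        ('Menu', 'javascript:;'),
        ('News again', 'https://www.hao123.com/news'),
    ]
    for text, link in normalize_links('https://www.hao123.com/', sample):
        print(text, link)
```

In the script above, the pairs could be built from data with something like [(a.string, a.get('href')) for a in data] and passed in together with url. Whether to drop fragments or non-HTTP schemes depends on what the link list is for, so treat those filters as one possible choice.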
