Getting all the links in a web page with Python

Reposted article. 2020-05-07 22:11. Tags: crawler / python / network

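Note: the third-party beautifulsoup4 library (imported as bs4) must be installed before running this script. selenium is not needed.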

Version: Python 3

from bs4 import BeautifulSoup
from urllib import request

# The URL to request
url = 'https://www.hao123.com/'

# Request the URL and get the page's HTML
html = request.urlopen(url)

# Parse the HTML
soup = BeautifulSoup(html, 'html.parser')

# Find all a tags, since every link lives inside an a tag
data = soup.find_all('a')

# Open a file to persist the results; the with block closes it automatically
with open('D:/link.txt', mode='w', encoding='utf-8') as file:
    # Iterate over all a tags and read each one's href attribute and text
    for item in data:
        # get() returns None when the tag has no href, instead of raising KeyError
        href = item.get('href')
        if item.string is not None and href not in (None, 'javascript:;', '#'):
            print(item.string, href)
            file.write(f'{item.string} {href}\n')
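
The script above needs network access and a writable D: drive, so it is hard to try in isolation. As a minimal sketch, the same filtering can be run on an inline HTML string using only the standard library's html.parser. This is a swapped-in technique, not the BeautifulSoup approach used above. The LinkCollector class, the sample HTML, and the use of urljoin to turn relative links into absolute ones are all assumptions added for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (text, absolute_url) pairs from a tags (hypothetical helper)."""

    # The same placeholder hrefs that the script above filters out
    SKIPPED = {'javascript:;', '#'}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # href of the a tag currently open, if it qualifies
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href and href not in self.SKIPPED:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            text = ''.join(self._text).strip()
            if text:
                # Resolve relative links such as "/news" against the page URL
                self.links.append((text, urljoin(self.base_url, self._href)))
            self._href = None


# Made-up sample page; only the first and last links should survive the filter
sample_html = '''
<a href="/news">News</a>
<a href="#">Top</a>
<a href="javascript:;">Menu</a>
<a>No href</a>
<a href="https://example.com/map">Map</a>
'''

collector = LinkCollector('https://www.hao123.com/')
collector.feed(sample_html)
for text, link in collector.links:
    print(text, link)
```

Unlike item.string, which is None when a tag has more than one child, this sketch joins all the text found inside the tag, so the two approaches can differ on links that contain nested markup.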
