PythonStudy_Scraping a web page's title and description

This article is a repost. 2018-06-04 15:42 814 python

# coding=utf-8
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

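# Get a web page's title and description
def get_url_Title_Description(url):
    # Download the full page source; undecodable bytes are replaced instead of raising
    content = urlopen(url).read().decode('utf-8', errors='replace')

    # Match the title with a regular expression
    # (case-insensitive, and the title may span several lines)
    pat = r'<title[^>]*>(.*?)</title>'
    title = re.findall(pat, content, re.I | re.S)

    # Extract the description from <meta name="description" content="...">
    soup = BeautifulSoup(content, "html.parser")
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta.get("content", "") if meta else ""

    # Return the title and description (empty strings when either is missing)
    return (title[0].strip() if title else "", description)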
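
The function above decodes every page as UTF-8, but some pages (older Chinese sites in particular) are served in GBK or a related encoding. The following is a minimal sketch of a more tolerant decoding step. The helper name `decode_html` and the fallback order (declared charset, then UTF-8, then GB18030) are assumptions made for illustration and are not part of the original code.

```python
from typing import Optional


def decode_html(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode page bytes: try the declared charset first, then common fallbacks."""
    # Hypothetical fallback order; adjust it for the sites you actually scrape.
    candidates = [declared, "utf-8", "gb18030"]
    for enc in candidates:
        if not enc:
            continue
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong encoding or unknown codec name: try the next candidate.
            continue
    # Last resort: decode as UTF-8 and substitute replacement characters.
    return raw.decode("utf-8", errors="replace")
```

One possible way to use it with `urlopen` is `resp = urlopen(url)` followed by `content = decode_html(resp.read(), resp.headers.get_content_charset())`, which passes along the charset from the `Content-Type` response header when the server sends one.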

# ----------test----------------
# url = "http://www.sina.com.cn/"
# title,dsp = get_url_Title_Description(url)
# print(title)
# print(dsp)
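
The test above needs network access and the third-party `bs4` package. As a rough offline alternative, the same two fields can be pulled out with only the standard library's `html.parser`. This is a sketch under assumptions: the class `TitleDescriptionParser`, the function `parse_title_description`, and the sample HTML are made up for illustration and do not come from the original post.

```python
from html.parser import HTMLParser


class TitleDescriptionParser(HTMLParser):
    """Collect the <title> text and the <meta name="description"> content."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []
        self.description = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            # The attribute value may be None when it is written without a value.
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self.in_title:
            self.title_parts.append(data)


def parse_title_description(html):
    """Return (title, description); either is an empty string when missing."""
    parser = TitleDescriptionParser()
    parser.feed(html)
    parser.close()
    return ("".join(parser.title_parts).strip(), parser.description)


# Hypothetical sample page, used instead of a live request.
sample = """<html><head>
<title>
  Example page
</title>
<meta name="Description" content="A short summary.">
</head><body></body></html>"""

print(parse_title_description(sample))
```

Because it works on a string rather than a URL, this version is easy to exercise on saved pages before pointing the scraper at a live site.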
