PythonStudy_Scraping a web page's title and description

This article is a repost (see the original). 2018-06-04 15:42, tagged python

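# coding=utf-8
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch a web page's title and description
def get_url_Title_Description(url):
    # Download the full page source; this assumes the page is UTF-8 encoded
    content = urlopen(url).read().decode('utf-8')

    # Match the title with a regular expression
    # (case-insensitive; the title may carry attributes or span several lines)
    pat = r'<title[^>]*>(.*?)</title>'
    match = re.search(pat, content, re.I | re.S)
    title = match.group(1).strip() if match else None

    # Extract the description from <meta name="description" content="...">
    soup = BeautifulSoup(content, "html.parser")
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta.get("content") if meta else None

    # Return the title and description (None when the page lacks one)
    return (title, description)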

# ----------test----------------
# url = "http://www.sina.com.cn/"
# title,dsp = get_url_Title_Description(url)
# print(title)
# print(dsp)
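
The function above hard-codes UTF-8 and needs a live URL to try out. Below is a minimal offline sketch of the same idea using only the standard library's html.parser instead of BeautifulSoup and a regex. The names TitleDescriptionParser, decode_page and extract_title_description are made up for this illustration, and the UTF-8-then-GBK fallback is only a heuristic assumption, not a reliable way to detect a page's encoding.

```python
from html.parser import HTMLParser


class TitleDescriptionParser(HTMLParser):
    """Collects the <title> text and the first <meta name="description"> content."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []
        self.description = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            if self.description is None:
                self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so collect all of them
        if self.in_title:
            self.title_parts.append(data)


def decode_page(raw, encodings=("utf-8", "gbk")):
    """Try each encoding in turn (a heuristic); fall back to lossy UTF-8."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")


def extract_title_description(raw):
    """Return (title, description) from raw page bytes; None for missing parts."""
    parser = TitleDescriptionParser()
    parser.feed(decode_page(raw))
    parser.close()
    title = "".join(parser.title_parts).strip() or None
    return title, parser.description


# A made-up GBK-encoded page, so the example runs without network access
sample = (
    '<html><head><TITLE>\n  新浪首页 \n</TITLE>'
    '<meta name="Description" content="新浪新闻摘要">'
    '</head><body></body></html>'
).encode("gbk")

print(extract_title_description(sample))
```

To use it on a real page, pass the bytes from urlopen(url).read() straight to extract_title_description. When the server sends a charset in its Content-Type header, preferring that over the fallback chain is likely the safer choice.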
