Python crawler: getting tag attribute values with BeautifulSoup (a worked example)

This article is a repost. 2021-05-06 14:48 · python

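The earlier examples all extracted values with regular-expression matching:

title = re.findall('">(.*?)</a>', i, re.S)[0]  # title
url = re.findall('="(.*?)" target', i, re.S)[0]  # URL

This approach has limited fault tolerance. The more data you scrape, the more likely it is that some item fails to match, and indexing the empty result with [0] then raises an IndexError.

This time the values are read from the tag attributes instead. Unless the attributes themselves change, this rarely goes wrong.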
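To make the fragility of the regex approach concrete, here is a minimal sketch. The two `<a>` tags are invented for illustration and are not taken from the target page; the `first_match` helper is likewise hypothetical. Both tags carry the same link, only the attribute order differs.

```python
import re

# Two hypothetical <a> tags with identical data; only the attribute order differs.
tag_a = '<a href="./t1.html" target="_blank">First article</a>'
tag_b = '<a target="_blank" href="./t1.html">First article</a>'

# The URL pattern used in the earlier examples.
pattern = r'="(.*?)" target'

def first_match(html):
    """Return the first captured group, or None when the pattern does not match."""
    found = re.findall(pattern, html, re.S)
    return found[0] if found else None

url_a = first_match(tag_a)  # the pattern matches: href comes right before target
url_b = first_match(tag_b)  # None: no quoted value is followed by ' target'
```

With the original `[0]` indexing, the second tag would stop the script with an IndexError instead of returning None.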

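The goal is to scrape the titles and links of the articles in the list marked with a red box in the screenshot.

The HTML structure of the target content is shown in the screenshot.

As it shows, the value of href is the link and the value of title is the title, so the corresponding values can be read as follows:

title = i.get("title")  # title
url = i.get("href")  # URL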
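The following sketch shows how `get` behaves on a tag, including the case where the attribute is missing. The markup is a made-up stand-in for the structure described above, and `html.parser` is used here only so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described above.
html = '''
<ul>
  <li><a target="_blank" href="./t1.html" title="First article">First art...</a></li>
  <li><a href="./about.html">About</a></li>
</ul>
'''

soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml required
first, second = soup.find_all("a")

title = first.get("title")          # attribute value as a string
url = first.get("href")
missing = second.get("title")       # None, because the attribute is absent
fallback = second.get("title", "")  # an explicit default instead of None
```

Because `get` returns None rather than raising, a missing attribute can be handled with a simple check instead of crashing the loop.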

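Because the target data is collected by matching every "a" tag, some of the results are not needed for this example. To keep the scraped content lean, the matching rule passed to soup.find_all was tightened.

Previously it was written simply as results = soup.find_all('a'). It turned out that the target items all share target='_blank', which the other "a" tags do not have, so it can be written as results = soup.find_all('a', target='_blank').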
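The effect of the extra filter can be seen on a small sample. The markup below is invented for illustration. The last query goes one step beyond the article: passing `title=True` asks BeautifulSoup for tags that have a title attribute at all, which is one possible way to drop links without a title.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: one navigation link, two articles, one external link.
html = '''
<div>
  <a href="/index.html">Home</a>
  <a href="./t1.html" target="_blank" title="First article">First art...</a>
  <a href="./t2.html" target="_blank" title="Second article">Second art...</a>
  <a href="http://example.com" target="_blank">Partner site</a>
</div>
'''
soup = BeautifulSoup(html, "html.parser")

all_links = soup.find_all("a")                            # every <a> tag
blank_links = soup.find_all("a", target="_blank")         # only target="_blank"
titled = soup.find_all("a", target="_blank", title=True)  # must also have a title

titles = [a.get("title") for a in titled]
```

Here the three queries return 4, 3 and 2 tags respectively, so each added condition narrows the result.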

These two changes make the script more precise and more tolerant of errors.

The full code:

from bs4 import BeautifulSoup
import requests

fgwurl = 'http://fgw.hunan.gov.cn/fgw/tslm_77952/hgzh/index.html'

def fgw(fgwurl):
    # Fetch the page and decode it as UTF-8
    response = requests.get(fgwurl)
    response.encoding = 'utf-8'
    soup = BeautifulSoup(response.text, 'lxml')
    # Only <a> tags with target="_blank" are candidate article links
    results = soup.find_all('a', target='_blank')
    for i in results:
        # title = i.get_text()  # alternative: use the link text as the title
        title = i.get("title")  # title
        url = i.get("href")  # URL
        # Skip tags that lack a title or href attribute
        if title and url:
            print(title + "  " + "click for details" + "  " + url)

fgw(fgwurl)
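A possible variation, sketched under assumptions: if the href values on the page are relative (this is a guess, not checked against the live site), they can be resolved against the page URL with `urllib.parse.urljoin`. The function name `extract_links` and the sample tag are hypothetical. The function takes the HTML as a string and returns the pairs instead of printing them, so it can be tried without a network request.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def extract_links(html, base_url):
    """Return (title, absolute_url) pairs for <a target="_blank"> tags with a title."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for a in soup.find_all("a", target="_blank", title=True):
        href = a.get("href")
        if not href:
            continue  # skip anchors that carry no link
        # Resolve relative links such as "./t1.html" against the page URL.
        pairs.append((a.get("title").strip(), urljoin(base_url, href)))
    return pairs

# Hypothetical sample tag; the real page markup may differ.
sample = '<a target="_blank" href="./202105/t1.html" title=" First article ">First</a>'
base = "http://fgw.hunan.gov.cn/fgw/tslm_77952/hgzh/index.html"
result = extract_links(sample, base)
```

Inside the script, the same function could be called as extract_links(response.text, fgwurl).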

