Using soup.select

Reposted article, 2019-12-11 23:07

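1. Selecting by tag

# Select all title tags
soup.select("title")
# Select every p tag that is the third p among its siblings
soup.select("p:nth-of-type(3)")  # same as soup.select("p")[2] only when all p tags share one parent
# Select all a tags anywhere under body
soup.select("body a")
# Select a tags that are direct children of body
soup.select("body > a")
# Select all later siblings of id=link1 that have class mysis
soup.select("#link1 ~ .mysis")
# Select the sibling immediately after id=link1, if it has class mysis
soup.select("#link1 + .mysis")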
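The selectors above are easier to compare side by side on a concrete document. The following is a minimal sketch, assuming a small HTML fragment invented for illustration (the `top` id, the link texts, and the page layout are not from the original article) and the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

# Hypothetical document, made up for this illustration.
html = """
<html><head><title>Demo</title></head>
<body>
<a id="top" href="/home">Home</a>
<div>
<p>first</p>
<p>second</p>
<p>third
<a id="link1" class="mysis" href="http://example.com/elsie">Elsie</a>
<a id="link2" class="mysis" href="http://example.com/lacie">Lacie</a>
<a id="link3" class="mysis" href="http://example.com/tillie">Tillie</a>
</p>
</div>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# select() always returns a list, even for a single match.
titles = [t.get_text() for t in soup.select("title")]

# Here all p tags share one parent, so both expressions find the same tag.
third_p = soup.select("p:nth-of-type(3)")

# Descendant combinator: every a tag at any depth under body.
all_links = [a["id"] for a in soup.select("body a")]

# Child combinator: only a tags that are direct children of body.
direct_links = [a["id"] for a in soup.select("body > a")]

# General sibling combinator: every later sibling with class mysis.
later_siblings = [a["id"] for a in soup.select("#link1 ~ .mysis")]

# Adjacent sibling combinator: only the very next sibling element.
next_sibling = [a["id"] for a in soup.select("#link1 + .mysis")]

print(all_links)       # ['top', 'link1', 'link2', 'link3']
print(direct_links)    # ['top']
print(later_siblings)  # ['link2', 'link3']
print(next_sibling)    # ['link2']
```

Note that `p:nth-of-type(3)` counts position among siblings, so on a page where p tags sit under several different parents it can return more than one tag, while `soup.select("p")[2]` always returns exactly the third p in document order.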
   

2. Selecting by class name

# Select a tags whose class is mysis
soup.select("a.mysis")
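As a small sketch of how the tag prefix narrows a class selector, assuming a made-up fragment in which a p tag and an a tag both carry the class `mysis` (the `big` and `other` classes are hypothetical as well):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, made up for this illustration.
html = (
    '<p class="mysis">para</p>'
    '<a class="mysis big" href="/a">A</a>'
    '<a class="other" href="/b">B</a>'
)
soup = BeautifulSoup(html, "html.parser")

# Tag name plus class: only a tags that have the class mysis.
a_only = [tag.get_text() for tag in soup.select("a.mysis")]

# Class alone: any tag that has the class mysis.
any_tag = [tag.name for tag in soup.select(".mysis")]

print(a_only)   # ['A']
print(any_tag)  # ['p', 'a']
```

A class selector matches when the class is one of several on the tag, which is why the a tag with `class="mysis big"` is still selected.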

3. Selecting by id

# Select the a tag whose id is link1
soup.select("a#link1")
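Because an id normally identifies a single tag, `select_one` is often more convenient than indexing into the list that `select` returns. A minimal sketch, assuming a made-up fragment (the `missing` id is hypothetical and deliberately absent):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, made up for this illustration.
html = (
    '<a id="link1" href="http://example.com/elsie">Elsie</a>'
    '<a id="link2" href="http://example.com/lacie">Lacie</a>'
)
soup = BeautifulSoup(html, "html.parser")

# select() returns a list with one element here.
as_list = soup.select("a#link1")

# select_one() returns the first matching tag, or None if nothing matches.
first = soup.select_one("a#link1")
missing = soup.select_one("a#missing")

print(len(as_list))  # 1
print(first["href"])  # http://example.com/elsie
print(missing)        # None
```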

4. Selecting by attribute (this also works for class)

# Select all a tags that have a myname attribute
soup.select("a[myname]")
# Select all a tags whose href equals http://example.com/lacie
soup.select("a[href='http://example.com/lacie']")
# Select a tags whose href starts with http
soup.select('a[href^="http"]')
# Select a tags whose href ends with lacie
soup.select('a[href$="lacie"]')
# Select a tags whose href contains .com
soup.select('a[href*=".com"]')
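The attribute operators differ only in how they compare the value, so a side-by-side run may help. This is a sketch on a made-up set of links (the URLs other than `http://example.com/lacie`, the `big` class, and the `myname` value are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, made up for this illustration.
html = (
    '<a myname="x" class="mysis big" href="http://example.com/elsie">Elsie</a>'
    '<a href="http://example.com/lacie">Lacie</a>'
    '<a href="/local/lacie">Local</a>'
    '<a href="https://example.org/tillie">Tillie</a>'
)
soup = BeautifulSoup(html, "html.parser")


def texts(selector):
    """Return the link text of every tag matched by the selector."""
    return [tag.get_text() for tag in soup.select(selector)]


has_attr = texts("a[myname]")                          # attribute exists
exact = texts("a[href='http://example.com/lacie']")    # exact value
prefix = texts('a[href^="http"]')                      # value starts with
suffix = texts('a[href$="lacie"]')                     # value ends with
contains = texts('a[href*=".com"]')                    # value contains
by_class = texts('a[class~="mysis"]')                  # one of the class words

print(prefix)    # ['Elsie', 'Lacie', 'Tillie']
print(suffix)    # ['Lacie', 'Local']
print(contains)  # ['Elsie', 'Lacie']
```

The last selector shows the attribute syntax applied to class: `~=` matches one whitespace-separated word, so it behaves like `a.mysis` here.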
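# Remove a tag from the HTML; afterwards soup contains no script tags
for s in soup("script"):
    s.extract()
# To remove several kinds of tag at once, pass a list of tag names
# ("fram" in the original was presumably a typo for "frame")
for s in soup(["script", "frame"]):
    s.extract()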
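To see the effect of removing tags, here is a minimal sketch on a made-up page. It removes `script` and `style` tags; the choice of `style` and the page content are assumptions for illustration, not from the original article:

```python
from bs4 import BeautifulSoup

# Hypothetical page, made up for this illustration.
html = (
    "<html><head><script>var x = 1;</script><style>p {}</style></head>"
    "<body><p>text</p><script>var y = 2;</script></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Calling soup(...) is shorthand for soup.find_all(...).
# The result list is built before any tag is removed, so iterating is safe.
removed = [tag.extract() for tag in soup(["script", "style"])]

removed_names = [tag.name for tag in removed]
remaining_text = soup.get_text()

print(removed_names)        # ['script', 'style', 'script']
print(soup.find("script"))  # None
print(remaining_text)       # text
```

`extract()` returns the removed tag, so it can still be inspected afterwards. When the removed tags are not needed, `decompose()` is an alternative that destroys them instead of returning them.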

