Parsing HTML in Python with bs4

Reposted article, 2016-04-24 11:54, tag: python

Documentation (Chinese edition): https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/

Encoding issues in Python are a real pain.

decode: decoding, turns bytes into str
encode: encoding, turns str into bytes
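Here is a quick illustration of both directions. It uses only the built-in str/bytes methods, and the sample text is arbitrary. It also shows why the codec matters: the same characters produce different bytes in UTF-8 and GBK, and decoding with the wrong codec fails.

```python
text = "解析"                        # str: Unicode text
data = text.encode("utf-8")          # encode: str -> bytes
gbk_data = text.encode("gbk")        # same text, different codec, different bytes

back = data.decode("utf-8")          # decode: bytes -> str, using the matching codec
print(back == text)

# Decoding with a codec that cannot represent these bytes raises an error
try:
    data.decode("ascii")
except UnicodeDecodeError as e:
    print("decode failed:", e.reason)
```

Each of these CJK characters takes 3 bytes in UTF-8 but 2 in GBK. That size difference is why mixing up codecs produces errors or garbled text.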


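Put this line at the top of the file:

# -*- coding: utf-8 -*-
It tells Python that the source file itself is saved as UTF-8. It only affects how the source code is read, not how file or network data is decoded. In Python 3, UTF-8 is already the default source encoding.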
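This sketch shows what the declaration actually controls. It compiles a small piece of source code stored as Latin-1 bytes. The coding line at the top is what lets Python decode 'café' correctly. Latin-1 is chosen here only for illustration.

```python
# Source code as raw bytes, saved in Latin-1 rather than UTF-8
source = "# -*- coding: latin-1 -*-\ns = 'café'\n".encode("latin-1")

namespace = {}
# When compile() receives bytes, it honors the coding declaration on the first lines
exec(compile(source, "<demo>", "exec"), namespace)
print(namespace["s"])
```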

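# -*- coding: utf-8 -*-
__author__ = 'Administrator'

import sys

import requests
from bs4 import BeautifulSoup


def getHtml(url):
    # Fetch the page; time out instead of hanging, and fail on HTTP errors
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    # r.content is raw bytes; decode explicitly, assuming the page is UTF-8
    content = r.content.decode('utf-8')
    # print(content)
    # Name the parser explicitly so bs4 does not warn that none was specified
    soup = BeautifulSoup(content, 'html.parser')
    print(soup.find_all('h2'))
    print(soup.find_all('p'))


if __name__ == "__main__":
    # Always 'utf-8' on Python 3 (the default str/bytes codec, not the console encoding)
    print(sys.getdefaultencoding())
    print("start.......")
    url = "http://www.jiakaobaodian.com/mnks/exercise/0-c1-kemu1-chengdu.html?id=800000"
    getHtml(url)
    print("end.......")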
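You can try the parsing step without hitting the network. This minimal sketch feeds BeautifulSoup a made-up HTML snippet as bytes. Instead of decoding by hand, it passes the raw bytes with from_encoding. It then uses get_text(strip=True) to keep only the text of each tag rather than the whole tag. The sample markup is hypothetical and does not come from the site above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for r.content (raw bytes) from requests
html_bytes = """
<html><body>
  <h2>第1题</h2>
  <p>示例题目</p>
  <p>选项 A</p>
</body></html>
""".encode("utf-8")

# Pass the bytes directly and tell bs4 which encoding they use
soup = BeautifulSoup(html_bytes, "html.parser", from_encoding="utf-8")

# Extract just the text of each matching tag
headings = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
print(headings)
print(paragraphs)
```

With a live page, the same call can take r.content directly. You can drop from_encoding to let bs4 guess the encoding, at the cost of an occasional wrong guess.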

