Python: garbled gb2312 text when scraping web pages

Reposted article · 2018-04-06 08:19 · python

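While using Python to scrape the CET-4/CET-6 scores of everyone at my school, I found that the Chinese text in the fetched pages came out garbled.

So I googled it

and found one proposed solution: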

# -*- coding: utf-8 -*-

import urllib2

req = urllib2.Request("http://jwgl.hist.edu.cn/jwweb/jiaow/data46/search1.asp")

res = urllib2.urlopen(req)
html = res.read()
res.close()
# Transcode the response bytes: gb2312 -> unicode -> utf-8
html = unicode(html, "gb2312").encode("utf8")

print html

But this did not solve the problem.
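The post does not say why that attempt failed. One common cause, offered here only as an assumption, is that a page labelled gb2312 actually contains characters from the larger GBK or GB18030 sets, which a strict gb2312 decode rejects. A minimal Python 3 sketch of a more forgiving decode step follows; the helper name `decode_chinese` and the order of fallback codecs are illustrative choices, not something from the original post.

```python
# Python 3 sketch. decode_chinese and its codec order are hypothetical.
def decode_chinese(raw, declared="gb2312"):
    """Decode bytes from a page that declares `declared`, trying supersets next."""
    for codec in (declared, "gbk", "gb18030"):
        try:
            return raw.decode(codec)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going and mark undecodable bytes with U+FFFD.
    return raw.decode("gb18030", errors="replace")


# The character 镕 exists in GBK but not in GB2312, so a strict
# gb2312 decode of these bytes raises UnicodeDecodeError.
page = "朱镕基".encode("gbk")
text = decode_chinese(page)        # falls back to gbk
utf8_bytes = text.encode("utf-8")  # re-encode for storage or printing
```

Trying the declared codec first keeps the behaviour identical for pages that really are GB2312, and only falls back when the strict decode fails.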

So I kept experimenting.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2018-04-05 21:59
# @Author  : wqjhky@gmial.com
# @File    : Test2.py
# @Software: PyCharm
import urllib
import urllib2

# Search form endpoint
url = "http://jwgl.hist.edu.cn/jwweb/jiaow/data46/search1.asp"
key = raw_input("Enter student ID: ")
# "Submit" holds the percent-encoded GB2312 bytes of the button label;
# note that urlencode escapes its "%" signs again
formdata = {
    "ksh1": key,
    "Submit": "%C8%B7%B6%A8"
}
data = urllib.urlencode(formdata)
# Supplying data turns the request into a POST
request = urllib2.Request(url, data=data)
res = urllib2.urlopen(request).read()
# Transcode the response bytes from gb2312 to utf-8
res = res.decode('gb2312').encode('utf-8')
# Save the transcoded bytes, then print them
wfile = open('./1.html', 'wb')
wfile.write(res)
wfile.close()
print res
This worked.
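For readers on Python 3, where urllib2 no longer exists, the same flow can be sketched with urllib.request and urllib.parse. This is a rough port under stated assumptions, not the author's code: the function names `build_form` and `fetch_scores` are made up for illustration, the Submit value is assumed to be the button label 确定 (whose GB2312 percent-encoding is the %C8%B7%B6%A8 seen in the script), and gb18030 is assumed to be an acceptable decoder because it covers the GB2312 repertoire.

```python
# Python 3 sketch. Only the URL and the field names ksh1 / Submit come from
# the post; build_form and fetch_scores are hypothetical helper names.
from urllib.parse import urlencode
from urllib.request import Request, urlopen

URL = "http://jwgl.hist.edu.cn/jwweb/jiaow/data46/search1.asp"


def build_form(student_id):
    """Return the POST body as bytes, percent-encoding values as GB2312."""
    fields = {"ksh1": student_id, "Submit": "确定"}
    # encoding="gb2312" makes urlencode emit %C8%B7%B6%A8 for the label.
    return urlencode(fields, encoding="gb2312").encode("ascii")


def fetch_scores(student_id, timeout=10):
    """POST the form and return the response body decoded to str."""
    request = Request(URL, data=build_form(student_id))
    with urlopen(request, timeout=timeout) as response:
        raw = response.read()
    # Assumption: the page stays within the GB2312/GBK/GB18030 family.
    return raw.decode("gb18030", errors="replace")
```

The returned string can then be saved with `open("1.html", "w", encoding="utf-8")`, which takes the place of the manual `.encode('utf-8')` step in the Python 2 script.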
