Some ancient legacy code suddenly threw an error today:
Invalid control character at: line 1 column
# Fragment from the body of a loop over index (Python 2)
url = base_url.format(index)
# Build the request headers
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.59 Safari/537.36",
    "Connection": "keep-alive",
    "X-Requested-With": "XMLHttpRequest",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Accept-Language": "en-US,en;q=0.8",
    # "Cookie": cookie
}
try:
    # Fetch the URL
    response = do_get_no_proxy(url, header)
    if response == "Error":
        print("request 502: {}".format(index))
        continue
    # The body is gzip-compressed: decompress it, then parse it as JSON
    resdata = StringIO.StringIO(response)
    gzipper = gzip.GzipFile(fileobj=resdata).read()
    res_json = json.loads(gzipper)
    res = res_json['records']
except Exception as e:
    logging.error(e)
    continue
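Tracing it down, the error comes from the json.loads(gzipper) call.
The cause: the contents of gzipper do not pass JSON syntax checking, because they contain raw control characters such as \r\n inside string values.
The fix is to add one parameter: strict=False
After changing the line to res_json = json.loads(gzipper, strict=False), the error is gone.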
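A minimal sketch of the failure and the fix. The payload text is made up for illustration, and the snippet is written in Python 3 syntax rather than the Python 2 of the code above:

```python
import json

# Hypothetical payload: the string value contains a raw CR/LF pair,
# which strict JSON parsing does not allow inside a string.
payload = '{"records": "line one\r\nline two"}'

try:
    json.loads(payload)
    strict_error = None
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    strict_error = str(exc)

# With strict=False, control characters are accepted inside strings.
res_json = json.loads(payload, strict=False)
```

Here strict_error holds the "Invalid control character" message, while the second call returns the value with the line break preserved.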
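Also, a few related things I picked up along the way.
When the data contains binary content:
# Python 2 only: json.dumps no longer takes an encoding argument in Python 3
strdata = json.dumps(jsondata, encoding='latin1')
res_json = json.loads(strdata, encoding='latin1', strict=False)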
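On Python 3 the encoding argument is not available on json.dumps. A rough equivalent, assuming the binary data arrives as bytes, is to map each byte to one character via latin1 before dumping and to reverse the mapping after loading. The raw value below is a made-up example, and this is only one possible approach:

```python
import json

# Hypothetical binary blob, including control bytes and a high byte.
raw = bytes([0x00, 0x0D, 0x0A, 0xFF])

# latin1 maps every byte 0x00-0xFF to exactly one code point,
# so the decode/encode round trip is lossless.
jsondata = {"blob": raw.decode("latin1")}
strdata = json.dumps(jsondata)

res_json = json.loads(strdata, strict=False)
restored = res_json["blob"].encode("latin1")
```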
When the data is plain text:
strdata = json.dumps(jsondata, ensure_ascii=False)
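To see what ensure_ascii=False changes, here is a small sketch in Python 3 with a made-up jsondata value. By default json.dumps escapes every non-ASCII character as a \uXXXX sequence. With ensure_ascii=False the characters are written out as-is:

```python
import json

# Hypothetical data containing non-ASCII text.
jsondata = {"msg": "古董代碼"}

escaped = json.dumps(jsondata)                       # default: ensure_ascii=True
readable = json.dumps(jsondata, ensure_ascii=False)  # keeps the original characters

# Both forms decode back to the same object.
same = json.loads(escaped) == json.loads(readable)
```

The readable form is easier to inspect in logs and files, as long as whatever writes it out uses an encoding such as UTF-8 that can represent the characters.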