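Many websites now serve GZip-compressed responses by default. If the response is not decompressed, the downloaded HTML page shows garbled Chinese text when opened.
Code that produces the garbled output: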
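    using System;
    using System.IO;
    using System.Net.Http;
    using System.Text;

    string url = "http://quote.eastmoney.com/stocklist.html";
    using (var client = new HttpClient())
    {
        var response = client.GetAsync(url).Result;
        // The body is read as a string with no decompression and no explicit
        // character set, so the saved file comes out garbled.
        var content = response.Content.ReadAsStringAsync().Result;
        File.WriteAllText(@"C:\stock.html", content, Encoding.Default);
    }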
Garbled result:
Fixed code:
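    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Net.Http;
    using System.Text;

    string url = "http://quote.eastmoney.com/stocklist.html";
    using (var client = new HttpClient())
    {
        // Key step 1: tell the server that a GZip-compressed response is accepted.
        // Only gzip is advertised because only gzip is decompressed below.
        client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");
        var response = client.GetAsync(url).Result;
        var body = response.Content.ReadAsStreamAsync().Result;
        // Key step 2: decompress with GZip, but only if the server really compressed the body.
        Stream stream = response.Content.Headers.ContentEncoding.Contains("gzip")
            ? new GZipStream(body, CompressionMode.Decompress)
            : body;
        // Decode the Chinese text as GB2312.
        using (var reader = new StreamReader(stream, Encoding.GetEncoding("gb2312")))
        {
            File.WriteAllText(@"C:\stock.html", reader.ReadToEnd(), Encoding.Default);
        }
    }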
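As an alternative to unwrapping the stream by hand, HttpClientHandler has an AutomaticDecompression property that sends the Accept-Encoding header and decompresses the response for you. The sketch below is one possible variant, not the approach used above. The class GzipDownload and its methods CreateHandler and DownloadGb2312 are hypothetical names, and the code assumes the page is GB2312-encoded as in the example above. On .NET Core and later, Encoding.GetEncoding("gb2312") is assumed to need the code-pages provider registered first.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text;

static class GzipDownload
{
    // Hypothetical helper: a handler that advertises gzip/deflate and
    // transparently decompresses responses that use them.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
    }

    // Hypothetical helper: downloads a page and decodes it as GB2312.
    // Assumption: on .NET Core / .NET 5+, call
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) beforehand.
    public static string DownloadGb2312(string url)
    {
        using (var client = new HttpClient(CreateHandler()))
        {
            // Read raw bytes so the character set is chosen explicitly here.
            byte[] bytes = client.GetByteArrayAsync(url).Result;
            return Encoding.GetEncoding("gb2312").GetString(bytes);
        }
    }
}
```

A call such as File.WriteAllText(@"C:\stock.html", GzipDownload.DownloadGb2312(url), Encoding.Default); would then replace the manual GZipStream handling.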
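Result after the fix:
Garbled text cannot always be fixed just by changing the output encoding, as in File.WriteAllText(@"C:\stock.html", reader.ReadToEnd(), Encoding.GetEncoding("gb2312")); analyze each case on its own and consider other possible causes.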
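One way to analyze a particular case is to look at the raw bytes before picking an encoding. The following is a minimal sketch for illustration, not part of the original solution. It relies on the fact that a GZip stream starts with the bytes 0x1F 0x8B. The class BodyDecoder and its methods LooksLikeGzip and DecodeBody are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class BodyDecoder
{
    // True if the buffer starts with the two-byte GZip magic number.
    public static bool LooksLikeGzip(byte[] body)
    {
        return body != null && body.Length >= 2 && body[0] == 0x1F && body[1] == 0x8B;
    }

    // Decompresses the body only when it is GZip data, then decodes it
    // with the character set supplied by the caller.
    public static string DecodeBody(byte[] body, Encoding charset)
    {
        Stream stream = new MemoryStream(body);
        if (LooksLikeGzip(body))
        {
            stream = new GZipStream(stream, CompressionMode.Decompress);
        }
        // Disposing the reader also disposes the underlying stream(s).
        using (var reader = new StreamReader(stream, charset))
        {
            return reader.ReadToEnd();
        }
    }
}
```

If LooksLikeGzip returns true for the downloaded bytes, the body is still compressed and no choice of encoding on its own will make it readable. If it returns false and the text is still garbled, the character set is the more likely culprit, for example BodyDecoder.DecodeBody(bytes, Encoding.GetEncoding("gb2312")) versus Encoding.UTF8.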