Fixing garbled text when downloading GZip-compressed web pages


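Many websites now serve GZip-compressed responses by default. If the response body is not decompressed before it is saved, the downloaded HTML page shows garbled Chinese text when opened.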
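To see why skipping decompression produces garbage, the following minimal sketch reproduces the effect without any network access. It compresses a small HTML string in memory, then decodes the bytes both directly and after decompression. The class name `GzipDemo` and its helpers are illustrative assumptions, not part of the original post, and UTF-8 is used instead of GB2312 only to keep the sample self-contained.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipDemo
{
    // Compress a string into gzip bytes, standing in for a compressed HTTP body.
    public static byte[] Compress(string text, Encoding encoding)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] raw = encoding.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream writes the remaining compressed data
            return buffer.ToArray();
        }
    }

    // Decompress gzip bytes and decode the result with the given encoding.
    public static string Decompress(byte[] body, Encoding encoding)
    {
        using (var gzip = new GZipStream(new MemoryStream(body), CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Every gzip stream starts with the magic bytes 0x1F 0x8B.
    public static bool LooksLikeGzip(byte[] body)
    {
        return body.Length >= 2 && body[0] == 0x1F && body[1] == 0x8B;
    }

    public static void Main()
    {
        string html = "<html><body>股票列表</body></html>";
        byte[] body = Compress(html, Encoding.UTF8);

        string wrong = Encoding.UTF8.GetString(body);   // decode the compressed bytes directly
        string right = Decompress(body, Encoding.UTF8); // decompress first, then decode

        Console.WriteLine(LooksLikeGzip(body)); // is the payload gzip-compressed?
        Console.WriteLine(wrong == html);       // does the direct decode match the original?
        Console.WriteLine(right == html);       // does the decompressed text match the original?
    }
}
```

Checking the first two bytes of a saved file for `0x1F 0x8B` is a quick way to tell whether a garbled download is still compressed or whether the problem lies in the character encoding instead.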

Before (produces garbled text):

            string url = "http://quote.eastmoney.com/stocklist.html";
            using (var client = new HttpClient())
            {
                var response = client.GetAsync(url).Result;

                // The body is read straight into a string, with no decompression step.
                var content = response.Content.ReadAsStringAsync().Result;
                File.WriteAllText(@"C:\stock.html", content, Encoding.Default);
            }

Garbled result: [screenshot]

Fixed code:

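            // Requires: using System.IO; using System.IO.Compression;
            //           using System.Net.Http; using System.Text;
            string url = "http://quote.eastmoney.com/stocklist.html";
            using (var client = new HttpClient())
            {
                // Key step 1: tell the server we accept GZip. Only gzip is requested
                // here because the code below only decompresses gzip.
                client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");
                var response = client.GetAsync(url).Result;

                Stream body = response.Content.ReadAsStreamAsync().Result;

                // Key step 2: decompress the stream with GZip, but only if the
                // server actually compressed the response.
                if (response.Content.Headers.ContentEncoding.Contains("gzip"))
                    body = new GZipStream(body, CompressionMode.Decompress);

                // Handle the Chinese encoding: the page is read and saved as GB2312.
                var encoding = Encoding.GetEncoding("gb2312");
                using (var reader = new StreamReader(body, encoding))
                {
                    File.WriteAllText(@"C:\stock.html", reader.ReadToEnd(), encoding);
                }
            }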

Result after the fix: [screenshot]

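Garbled text cannot always be fixed just by changing the output encoding, as in File.WriteAllText(@"C:\stock.html", reader.ReadToEnd(), Encoding.GetEncoding("gb2312")). Analyze each case on its own and consider other causes, such as a compressed response body.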
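As an alternative to decompressing by hand, `HttpClientHandler` exposes an `AutomaticDecompression` property that makes the handler request compressed responses and decompress them transparently. The sketch below is one possible arrangement, not the original author's method. The class name `AutoDecompressDemo` and the `Download` helper are hypothetical, and GB2312 is assumed to be the page's encoding, as in the code above.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text;

public static class AutoDecompressDemo
{
    // Build a handler that requests and transparently decompresses gzip/deflate.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
    }

    // Hypothetical helper: download a page and decode it with the given encoding.
    public static string Download(string url, Encoding pageEncoding)
    {
        using (var client = new HttpClient(CreateHandler()))
        {
            // The bytes returned here are already decompressed by the handler.
            byte[] bytes = client.GetByteArrayAsync(url).Result;
            return pageEncoding.GetString(bytes);
        }
    }

    public static void Main()
    {
        // Assumption: "gb2312" is available. On .NET Core and later this requires
        // registering the code pages encoding provider before calling GetEncoding.
        Encoding gb2312 = Encoding.GetEncoding("gb2312");

        string html = Download("http://quote.eastmoney.com/stocklist.html", gb2312);
        File.WriteAllText(@"C:\stock.html", html, gb2312);
    }
}
```

With this approach there is no need to add the `Accept-Encoding` header manually or to wrap the response stream in a `GZipStream`, which leaves only the character encoding to get right.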

