The Encoding Problem with FileReader

Reposted from the original article. 2018-10-15 09:47. Tags: JDK, General Practice, Development Experience, J2SE, Java Experience Collection, Work Summary

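I had a text file encoded in UTF-8. I read it into a string with FileReader and then converted the charset: str=new String(str.getBytes(),"UTF-8"); Most of the Chinese text then displayed correctly, but some Chinese characters still showed up as question marks!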

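```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Problematic version: FileReader decodes with the platform default charset
// (GBK on the author's system), and each line is then "repaired" by
// re-encoding it as GBK and decoding the bytes as UTF-8.
public static List<String> getLines(String fileName)
{
    List<String> lines = new ArrayList<String>();
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader br = new BufferedReader(new FileReader(fileName)))
    {
        String line = null;
        while ((line = br.readLine()) != null)
            lines.add(new String(line.getBytes("GBK"), "UTF-8"));
    }
    catch (IOException e)
    {
        // FileNotFoundException is a subclass of IOException;
        // errors are silently ignored, as in the original
    }
    return lines;
}
```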

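When the file is read in, it is decoded with the OS default charset, which is GBK here. So I first encoded the string back with that same charset, str.getBytes("GBK"), which should restore the byte sequence in the file, and then decoded those bytes as UTF-8. In theory, the resulting string should be correct.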
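This assumption can be checked without a file. The sketch below (the class `GbkRoundTrip` and method `roundTrip` are hypothetical names) simulates the same round trip in memory: it encodes a string as UTF-8 to mimic the file contents, decodes those bytes as GBK as FileReader would on a GBK system, then applies `getBytes("GBK")` and decodes the result as UTF-8. It assumes the JVM provides the GBK charset, which full JDK builds include.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkRoundTrip {
    // Assumes the JVM provides GBK (it ships with full JDK builds).
    static final Charset GBK = Charset.forName("GBK");

    // Mimics the original code in memory:
    // UTF-8 "file" bytes -> decoded as GBK (what FileReader does on a GBK system)
    // -> getBytes("GBK") -> decoded as UTF-8.
    public static String roundTrip(String original) {
        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_8);
        String misread = new String(fileBytes, GBK);
        return new String(misread.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII bytes are identical in UTF-8 and GBK, so they survive.
        System.out.println(roundTrip("abc").equals("abc"));       // prints true
        // U+4E2D is 3 bytes in UTF-8. GBK consumes the first two bytes as a pair;
        // the lone third byte is incomplete and gets replaced, so it is lost.
        System.out.println(roundTrip("\u4e2d").equals("\u4e2d")); // prints false
    }
}
```

The Unicode escape is used instead of a literal Chinese character so the demo behaves the same regardless of the encoding `javac` assumes for the source file.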

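So why is part of the result still garbled?
The problem lies in how FileReader reads the file. FileReader extends InputStreamReader, but (before Java 11) it does not provide the parent class's constructors that take a charset parameter, so FileReader can only decode with the system default charset. The UTF-8 -> GBK -> UTF-8 round trip is lossy: when the UTF-8 bytes are decoded as GBK, any byte sequence that is not valid GBK (for example, a lone byte left over at the end of a line, since GBK consumes Chinese characters two bytes at a time while UTF-8 encodes them in three) is replaced with a substitute character. getBytes("GBK") therefore cannot reproduce the original bytes, and the original characters cannot be recovered.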
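To see exactly where the information is lost, a `CharsetDecoder` can be configured to report malformed input instead of silently replacing it, which is what FileReader and the `String(byte[], ...)` constructors do. A minimal sketch, with the hypothetical names `StrictGbkCheck` and `decodesCleanlyAsGbk`, again assuming the JVM provides GBK:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictGbkCheck {
    // Returns true only if the bytes are legal, mappable GBK.
    public static boolean decodesCleanlyAsGbk(byte[] bytes) {
        try {
            Charset.forName("GBK").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // A lenient decoder would substitute a replacement character here,
            // silently discarding the original bytes.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u4e2d".getBytes(StandardCharsets.UTF_8);      // 3 bytes
        byte[] gbk = "\u4e2d".getBytes(Charset.forName("GBK"));       // 2 bytes
        System.out.println(decodesCleanlyAsGbk(utf8)); // prints false
        System.out.println(decodesCleanlyAsGbk(gbk));  // prints true
    }
}
```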

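With the cause identified, the fix is to use InputStreamReader instead of FileReader: InputStreamReader isr=new InputStreamReader(new FileInputStream(fileName),"UTF-8"); The file is then decoded as UTF-8 directly, and no further charset conversion is needed.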

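```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Fixed version: InputStreamReader decodes the file bytes as UTF-8 directly,
// so no charset conversion is needed afterwards.
public static List<String> getLines(String fileName)
{
    List<String> lines = new ArrayList<String>();
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8)))
    {
        String line = null;
        while ((line = br.readLine()) != null)
            lines.add(line);
    }
    catch (IOException e)
    {
        // FileNotFoundException is a subclass of IOException;
        // errors are silently ignored, as in the original
    }
    return lines;
}
```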
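On newer JDKs the same fix can be written more compactly. The sketch below (the class `Utf8Lines` and its method names are hypothetical) shows two standard-library options that take an explicit charset: `Files.readAllLines` (Java 7+) and the `FileReader(String, Charset)` constructor added in Java 11. Unlike the `getLines` methods above, these variants propagate `IOException` to the caller instead of swallowing it. Since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms, but passing the charset explicitly keeps the behavior independent of JDK version and OS settings.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Utf8Lines {
    // Java 7+: reads all lines, decoding the file as UTF-8.
    public static List<String> readWithFiles(String fileName) throws IOException {
        return Files.readAllLines(Paths.get(fileName), StandardCharsets.UTF_8);
    }

    // Java 11+: FileReader accepts a Charset directly.
    public static List<String> readWithFileReader(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader(fileName, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file to a temporary location for the demo.
        Path tmp = Files.createTempFile("utf8-demo", ".txt");
        Files.write(tmp, "中文\n第二行\n".getBytes(StandardCharsets.UTF_8));
        // Both readers decode the same bytes the same way.
        System.out.println(readWithFiles(tmp.toString()).equals(readWithFileReader(tmp.toString()))); // prints true
        Files.delete(tmp);
    }
}
```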
