Reading TXT files in Java (brute-force encoding detection)

Reposted from the original article. 2021-08-16 18:12. Tag: java

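Preamble: I tried most of the "read a TXT file with automatic encoding detection" snippets that turn up on Baidu, and when the encoding is not known in advance none of them seemed to do much. I kept that logic anyway, just in case it helps [doge]. Maybe I searched the wrong way and missed the right answer. My current method is so brute-force that I don't think it is great either, so if there is a better way, please let me know. Thanks in advance.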

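How it works: the approach really is crude. I tallied the byte ranges covered by the UTF-8 encodings of nearly all Chinese characters. A common Chinese character takes 3 bytes in UTF-8, so the code checks whether the first Chinese character in the file fits that 3-byte UTF-8 pattern. The drawback is just as obvious: some characters will fall outside the ranges I counted. Don't ask why it only singles out UTF-8: UTF-8 is the only encoding whose byte layout is regular enough to test for.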
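To make the byte pattern concrete, here is a small sketch of the check described above. The class name `Utf8HanBytesDemo` and the helpers `looksLikeUtf8Han` and `hex` are made up for illustration; the lead-byte range 0xE4..0xE8 is the one the author's code uses, and the GBK bytes for 中 are hard-coded so the sketch does not depend on the GBK charset being installed.

```java
import java.nio.charset.StandardCharsets;

class Utf8HanBytesDemo {

    /** True if bytes[i..i+2] look like a 3-byte UTF-8 sequence whose lead byte is 0xE4..0xE8. */
    public static boolean looksLikeUtf8Han(byte[] bytes, int i) {
        if (i + 2 >= bytes.length) {
            return false; // not enough bytes left for a 3-byte sequence
        }
        int a = bytes[i] & 0xFF;     // lead byte
        int b = bytes[i + 1] & 0xFF; // first continuation byte
        int c = bytes[i + 2] & 0xFF; // second continuation byte
        return a >= 0xE4 && a <= 0xE8
                && b >= 0x80 && b <= 0xBF
                && c >= 0x80 && c <= 0xBF;
    }

    /** Formats bytes as space-separated upper-case hex. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            sb.append(String.format("%02X ", x & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] zhongUtf8 = "\u4E2D".getBytes(StandardCharsets.UTF_8); // 中
        byte[] zhongGbk = {(byte) 0xD6, (byte) 0xD0};                 // 中 in GBK
        byte[] daoUtf8 = "\u9053".getBytes(StandardCharsets.UTF_8);   // 道

        System.out.println(hex(zhongUtf8));                  // E4 B8 AD
        System.out.println(looksLikeUtf8Han(zhongUtf8, 0));  // true
        System.out.println(looksLikeUtf8Han(zhongGbk, 0));   // false
        System.out.println(hex(daoUtf8));                    // E9 81 93
        System.out.println(looksLikeUtf8Han(daoUtf8, 0));    // false: lead byte 0xE9 is outside the range
    }
}
```

The last line shows the kind of gap mentioned above: 道 is valid UTF-8, but its lead byte 0xE9 is not in the counted range, so a file that starts with it would fall back to GBK.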

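import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;

public static String readTxtFile(String filePath) {
    File file = new File(filePath);
    if (!file.isFile()) { // isFile() is also false when the path does not exist
        System.out.println("Cannot find the specified file");
        return null;
    }
    try {
        // Read the file once; the original opened it twice and never closed the first stream.
        byte[] text = Files.readAllBytes(file.toPath());
        String code = "GBK"; // fallback when none of the checks below match
        int skip = 0;        // number of leading BOM bytes to drop before decoding
        if (text.length >= 2 && (text[0] & 0xFF) == 0xFF && (text[1] & 0xFF) == 0xFE) {
            code = "UTF-16"; // FF FE: little-endian BOM, consumed by Java's UTF-16 decoder
        } else if (text.length >= 2 && (text[0] & 0xFF) == 0xFE && (text[1] & 0xFF) == 0xFF) {
            code = "UTF-16"; // FE FF: big-endian BOM ("Unicode" is an alias of UTF-16 in Java)
        } else if (text.length >= 3 && (text[0] & 0xFF) == 0xEF
                && (text[1] & 0xFF) == 0xBB && (text[2] & 0xFF) == 0xBF) {
            code = "UTF-8";
            skip = 3; // Java's UTF-8 decoder does not strip the BOM, so skip it here
        } else {
            for (int i = 0; i < text.length; i++) {
                int a = text[i] & 0xFF;
                if (a <= 0x7F) {
                    continue; // skip leading English letters or digits (ASCII)
                }
                // The first non-ASCII byte decides. The bounds check fixes the
                // text[i + 1] overrun the original had on the last byte.
                if (i + 2 < text.length) {
                    int b = text[i + 1] & 0xFF;
                    int c = text[i + 2] & 0xFF;
                    // Lead byte 0xE4..0xE8 (the original range) plus two continuation
                    // bytes 0x80..0xBF. Characters with lead byte 0xE9 are not covered.
                    if (a >= 0xE4 && a <= 0xE8 && b >= 0x80 && b <= 0xBF
                            && c >= 0x80 && c <= 0xBF) {
                        code = "UTF-8";
                    }
                }
                break;
            }
        }
        System.out.println(code);
        StringBuilder res = new StringBuilder();
        // Decode with the detected charset; try-with-resources closes the reader.
        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(text, skip, text.length - skip), code))) {
            String lineTxt;
            while ((lineTxt = bufferedReader.readLine()) != null) {
                res.append(lineTxt); // as in the original, line breaks are not kept
            }
        }
        return res.toString();
    } catch (IOException e) {
        System.out.println("Error reading file contents");
        e.printStackTrace();
        return null;
    }
}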
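One possibly stricter alternative to looking only at the first Chinese character is to ask the JDK's own UTF-8 decoder whether the whole file is well-formed UTF-8, and fall back to GBK otherwise. The sketch below is not the author's method; the class name `StrictUtf8Check` and the methods `isValidUtf8` and `guessCharset` are hypothetical names, and the GBK fallback simply mirrors the default in the listing above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Check {

    /** Returns true if the whole array decodes as UTF-8 without any error. */
    public static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Assumption for this sketch: anything that is not valid UTF-8 is treated as GBK. */
    public static String guessCharset(byte[] bytes) {
        return isValidUtf8(bytes) ? "UTF-8" : "GBK";
    }

    public static void main(String[] args) {
        byte[] utf8 = "abc\u9053".getBytes(StandardCharsets.UTF_8); // "abc道" in UTF-8
        byte[] gbk = {0x61, (byte) 0xD6, (byte) 0xD0};              // "a中" in GBK
        System.out.println(guessCharset(utf8)); // UTF-8
        System.out.println(guessCharset(gbk));  // GBK
    }
}
```

This covers every valid UTF-8 character rather than a counted range, at the cost of scanning the whole file. It is still a guess: a pure-ASCII file reports UTF-8 (harmless, since ASCII decodes the same way in both), and a short GBK text can occasionally happen to be valid UTF-8 as well.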

So if there is a better way, please do tell me.
