Truncating a String That Contains Chinese Characters by Byte Length in Java (UTF-8 Encoding)

Reposted article. 2020-07-30 23:09, Java

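Background: when text containing Chinese characters is written to a fixed-length byte field, the byte count and the character count differ. An ASCII character takes 1 byte, a Chinese character takes 2 bytes in GBK, and a common Chinese character takes 3 bytes in UTF-8 (rare characters outside the Basic Multilingual Plane take 4). To keep the output from exceeding the fixed length, the string has to be cut on a character boundary so that its encoded size fits the byte budget.

The code here targets UTF-8 output; the same approach applies to other encodings.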
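The byte counts quoted above can be checked directly with `String.getBytes`. The following is a minimal sketch; the class name `ByteLengthDemo` and the helper `byteLength` are illustrative names, not part of the original article. Non-ASCII characters are written as Unicode escapes so that the file compiles regardless of the source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Number of bytes s occupies once encoded with cs (hypothetical helper).
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String ascii = "a";
        String chinese = "\u4f60"; // U+4F60, first Chinese character of the article's sample string

        System.out.println(byteLength(ascii, StandardCharsets.UTF_8));   // 1
        System.out.println(byteLength(chinese, StandardCharsets.UTF_8)); // 3

        // GBK is not among the charsets every Java runtime is required to provide,
        // so check for it before use.
        if (Charset.isSupported("GBK")) {
            System.out.println(byteLength(chinese, Charset.forName("GBK"))); // 2
        }
    }
}
```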
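To illustrate how the approach might carry over to other encodings, here is a sketch that takes the target `Charset` as a parameter and uses the same shrink-and-check idea as the article's methods. The class `CharsetTruncate` and the method `truncateToBytes` are hypothetical names introduced for this example. It assumes the charset encodes every `char` to at least one byte, which holds for UTF-8, GBK, and the UTF-16 variants.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetTruncate {
    // Longest prefix of str whose encoding in cs fits in maxBytes (hypothetical helper).
    static String truncateToBytes(String str, int maxBytes, Charset cs) {
        if (maxBytes <= 0) {
            return "";
        }
        // Assumption: each char needs at least 1 byte, so no prefix longer
        // than maxBytes chars can fit.
        int end = Math.min(str.length(), maxBytes);
        // Shrink the candidate one char at a time until it fits the budget.
        while (end > 0 && str.substring(0, end).getBytes(cs).length > maxBytes) {
            end--;
        }
        // If the cut fell between the two halves of a surrogate pair, drop the first half too.
        if (end > 0 && end < str.length()
                && Character.isHighSurrogate(str.charAt(end - 1))
                && Character.isLowSurrogate(str.charAt(end))) {
            end--;
        }
        return str.substring(0, end);
    }

    public static void main(String[] args) {
        String sample = "abcd\u4f60\u597defgh"; // "abcd", two Chinese characters, "efgh"
        // UTF-8: 4 ASCII bytes + one 3-byte character fit in 8 bytes, so 5 chars are kept.
        System.out.println(truncateToBytes(sample, 8, StandardCharsets.UTF_8).length());
        // UTF-16BE: every BMP char takes 2 bytes, so 4 chars are kept.
        System.out.println(truncateToBytes(sample, 8, StandardCharsets.UTF_16BE).length());
    }
}
```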

    import java.nio.charset.StandardCharsets;

    public class SubStrUtf8 {

        // Method 1: starting at char index beginIndex, return the longest substring
        // whose UTF-8 encoding is at most (endIndex - beginIndex) bytes.
        // Like String.substring, it throws StringIndexOutOfBoundsException if
        // beginIndex is negative or greater than min(str.length(), endIndex).
        public static String subStrUtf8(String str, int beginIndex, int endIndex) {
            int maxBytes = endIndex - beginIndex;
            // Despite its name, this is a char index. Every char encodes to at
            // least 1 byte, so the result can never extend past endIndex.
            int byteEndIndex = Math.min(str.length(), endIndex);
            String subStr;
            do {
                // Post-decrement on purpose: byteEndIndex-- tries the current candidate
                // first and shrinks it by one char for the next pass, whereas
                // --byteEndIndex would drop a char before the first check.
                subStr = str.substring(beginIndex, byteEndIndex--);
                // Keep looping while the UTF-8 encoding is still over the byte budget.
            } while (subStr.getBytes(StandardCharsets.UTF_8).length > maxBytes);

            // Do not end on half of a surrogate pair (for example, an emoji cut in two).
            int end = byteEndIndex + 1; // the end index actually used by the last substring call
            if (!subStr.isEmpty() && end < str.length()
                    && Character.isHighSurrogate(str.charAt(end - 1))
                    && Character.isLowSurrogate(str.charAt(end))) {
                subStr = subStr.substring(0, subStr.length() - 1);
            }
            return subStr;
        }

        // Method 2: return the longest prefix of str whose UTF-8 encoding is at most
        // subLen bytes. This is Method 1 with beginIndex fixed at 0.
        public static String subStrUtf8(String str, int subLen) {
            return subStrUtf8(str, 0, subLen);
        }

        public static void main(String[] args) {
            String str = "abcd你好efgh謝謝";
            System.out.println(subStrUtf8(str, 0, 8));
            System.out.println(subStrUtf8(str, 8));
        }
    }

Output:

abcd你
abcd你
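Both calls return "abcd" plus one Chinese character: 4 + 3 = 7 bytes fit in the 8-byte budget, while adding the second Chinese character would bring the total to 10.

The methods above re-encode a candidate substring on every pass. For long strings, one possible alternative is a single pass that adds up the UTF-8 length of each code point and stops before the budget is exceeded. The sketch below is not from the original article; `Utf8PrefixSinglePass`, `utf8Length`, and `truncateUtf8` are hypothetical names. Non-ASCII characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
class Utf8PrefixSinglePass {
    // UTF-8 length of a single code point: 1 to 4 bytes.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Longest prefix of str whose UTF-8 encoding is at most maxBytes bytes.
    static String truncateUtf8(String str, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < str.length()) {
            // codePointAt reads a whole surrogate pair as one code point,
            // so a pair is either kept entirely or dropped entirely.
            int cp = str.codePointAt(i);
            int len = utf8Length(cp);
            if (bytes + len > maxBytes) {
                break;
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        // An unpaired surrogate is counted as 3 bytes here, although getBytes would
        // replace it with a 1-byte '?'. Overcounting keeps the result within budget.
        return str.substring(0, i);
    }

    public static void main(String[] args) {
        String str = "abcd\u4f60\u597defgh\u8b1d\u8b1d"; // the article's sample string
        // Keeps "abcd" plus the first Chinese character: 5 chars, 7 bytes.
        System.out.println(truncateUtf8(str, 8).length());
    }
}
```

This variant only covers UTF-8, since the per-code-point byte lengths are hard-coded.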
