Pitfalls of Java's String.split method


Let's start with a few simple lines of Java code:

System.out.println("1,2".split(",").length);
System.out.println("1,2,".split(",").length);
System.out.println("".split(",").length);
System.out.println(",".split(",").length);

Before reading on, try to guess what each line prints. Here are the actual results:

2
2
1
0
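To check these results yourself, you can wrap the four lines into a small self-contained program. This is only a sketch. The class name SplitDemo and the helper splitLengths are illustrative names and do not come from the original snippet.

```java
class SplitDemo {
    // The four inputs from the snippet above, each split on ","
    static final String[] INPUTS = {"1,2", "1,2,", "", ","};

    // Returns s.split(",").length for every input, in order
    static int[] splitLengths() {
        int[] lengths = new int[INPUTS.length];
        for (int i = 0; i < INPUTS.length; i++) {
            lengths[i] = INPUTS[i].split(",").length;
        }
        return lengths;
    }

    public static void main(String[] args) {
        for (int n : splitLengths()) {
            System.out.println(n); // prints 2, 2, 1, 0 on separate lines
        }
    }
}
```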

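Here is the relevant JDK source, String.split(String regex, int limit). This excerpt appears to come from JDK 8. The one-argument split(regex) used above simply calls split(regex, 0), so all four examples run with limit = 0. Each output is analyzed after the listing: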

    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
         (1)one-char String and this character is not one of the
            RegEx's meta characters ".$|()[{^?*+\\", or
         (2)two-char String and the first char is the backslash and
            the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        if (((regex.value.length == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[]{this};

            // Add remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, value.length));

            // Construct result
            int resultSize = list.size();
            if (limit == 0) {
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }
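The excerpt above does not compile on its own, because value is a private field of String. The sketch below is a simplified, standalone re-implementation of just the single-character fast path, so you can run the logic and compare it against the real method. FastPathSketch and fastSplit are hypothetical names, not a JDK API. The sketch also skips the regex path (Pattern.compile(regex).split(...)) entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FastPathSketch {
    // Hypothetical stand-in for the JDK fast path: split s on one literal
    // character sep, honoring limit the way String.split(regex, limit) does.
    static String[] fastSplit(String s, char sep, int limit) {
        int off = 0;
        int next;
        boolean limited = limit > 0;
        List<String> list = new ArrayList<>();
        while ((next = s.indexOf(sep, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(s.substring(off, next));
                off = next + 1;
            } else {
                // Last allowed piece: keep the rest of the string unsplit
                list.add(s.substring(off));
                off = s.length();
                break;
            }
        }
        // No separator at all: return the input itself, before any trimming
        if (off == 0) {
            return new String[]{s};
        }
        // Tail after the last separator (may be "")
        if (!limited || list.size() < limit) {
            list.add(s.substring(off));
        }
        // Only limit == 0 strips trailing empty strings
        int size = list.size();
        if (limit == 0) {
            while (size > 0 && list.get(size - 1).isEmpty()) {
                size--;
            }
        }
        return list.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] inputs = {"1,2", "1,2,", "", ",", ",,a", "a,b,c"};
        int[] limits = {0, -1, 1, 2};
        for (String s : inputs) {
            for (int limit : limits) {
                String[] expected = s.split(",", limit);
                String[] actual = fastSplit(s, ',', limit);
                if (!Arrays.equals(expected, actual)) {
                    throw new AssertionError("\"" + s + "\", limit " + limit);
                }
            }
        }
        System.out.println("fastSplit agrees with String.split on all cases");
    }
}
```

If the sketch ever disagrees with String.split for the listed inputs and limits, main throws an AssertionError. Otherwise it prints a confirmation line.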

1. The first line holds no surprises. Splitting "1,2" on "," intuitively gives ["1", "2"], so length = 2.

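2. For the second line, some readers may expect length = 3, reasoning that splitting "1,2," on "," should give ["1", "2", ""]. That is not what happens. The JDK does first build the list ["1", "2", ""]. Because limit == 0, however, it then walks backward from the end and drops every trailing element that is an empty string (length 0). The final array is ["1", "2"], so length = 2. The check is: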

while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
    resultSize--;
}

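3. The third line prints 1, because the result is the array [""]. Unlike the other three cases, the empty string "" contains no "," at all. No match is found, off stays 0, and the method returns a one-element array holding the original string. This early return happens before the trailing-empty-string trimming, so the lone "" is not removed: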

// If no match was found, return this
if (off == 0)
    return new String[]{this};

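4. For the fourth line, some readers may likewise expect length = 2, since splitting "," on "," seems to give ["", ""]. Again, that is not what happens. By the same rule as line 2, every element of the intermediate list ["", ""] is a trailing empty string. All of them are trimmed, leaving the empty array [], so length = 0.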
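The four cases come down to two rules in the source. First, trailing empty strings are trimmed only when limit == 0. Second, a string with no match is returned as-is, before any trimming happens. The sketch below makes both rules visible. It also shows that a negative limit, which the String.split documentation describes as applying the pattern as many times as possible, keeps trailing empty strings. The class name SplitCasesDemo and the helper show are illustrative. show simply quotes each element so that an array holding "" can be told apart from an empty array.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SplitCasesDemo {
    // Quote each piece so empty strings stay visible ("" vs. no element at all)
    static String show(String[] parts) {
        return Arrays.stream(parts)
                .map(p -> "\"" + p + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        // Cases 2 and 4: limit 0 (what split(",") uses) trims trailing "" pieces
        System.out.println(show("1,2,".split(",")));      // ["1", "2"]
        System.out.println(show(",".split(",")));         // []
        // A negative limit disables the trimming
        System.out.println(show("1,2,".split(",", -1)));  // ["1", "2", ""]
        System.out.println(show(",".split(",", -1)));     // ["", ""]
        // Only trailing empties are trimmed; leading ones survive
        System.out.println(show(",,a".split(",")));       // ["", "", "a"]
        // Case 3: no separator found, so the input itself comes back
        // (the early return runs before any trimming)
        System.out.println(show("".split(",")));          // [""]
        System.out.println(show("abc".split(",")));       // ["abc"]
    }
}
```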

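That is the split pitfall this post set out to show. One more thing to keep in mind: split's argument is a regular expression, not a plain string. Characters with special meaning in regex (such as . | * + ? and \) must be escaped when you want to split on them literally.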
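As a concrete illustration of the regex point, the sketch below splits "1.2" on a dot. "." is one of the metacharacters listed in the fast-path comment of the source above. Passed unescaped, it matches every character, so every piece is empty and the limit-0 trimming removes them all. The class name RegexSplitDemo is illustrative.

```java
import java.util.regex.Pattern;

class RegexSplitDemo {
    public static void main(String[] args) {
        String version = "1.2";
        // "." matches any character: all pieces are "", and limit 0 trims them all
        System.out.println(version.split(".").length);                // 0
        // Escaped dot is a literal; the two-char "\\." form also takes the fast path
        System.out.println(version.split("\\.").length);              // 2
        // Pattern.quote turns any string into a literal pattern
        System.out.println(version.split(Pattern.quote(".")).length); // 2
    }
}
```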

