About the split function

The Java docs describe split as follows:

When there is a positive-width match at the beginning of this string then an empty leading substring is included at the beginning of the resulting array. A zero-width match at the beginning however never produces such empty leading substring.

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
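To make the first quoted paragraph concrete, here is a small sketch comparing a positive-width match and a zero-width match at the start of the string. The class name LeadingMatchDemo and the helper splitDefault are made up for illustration, and the expected results assume Java 8 or later, where the zero-width rule quoted above applies.

```java
import java.util.Arrays;

class LeadingMatchDemo {
    // Hypothetical helper: split with the default limit of 0.
    static String[] splitDefault(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        // "-" matches one character at index 0 (positive width),
        // so the result starts with an empty string.
        String[] positive = splitDefault("-a-b", "-");
        System.out.println(Arrays.toString(positive) + ", length=" + positive.length);

        // "" matches zero characters at index 0 (zero width),
        // so no empty leading string is produced.
        String[] zeroWidth = splitDefault("abc", "");
        System.out.println(Arrays.toString(zeroWidth) + ", length=" + zeroWidth.length);
    }
}
```

The first call should yield three elements (an empty string, "a", "b") and the second should yield "a", "b", "c".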

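The way split works can be roughly divided into the following steps:

1. Scan for each match of regex and add the text between the previous position and the match to a list. This is the core of split.
2. If no match is found, return a one-element array containing the string itself.
3. Decide whether to add the remaining content to the list.
4. Decide whether to remove trailing empty strings from the list.
5. Convert the list to an array and return it.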

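The limit argument of split can fall into one of the following cases:

1. limit < 0, e.g. limit = -1
2. limit = 0, which is also the default when limit is not passed
3. limit > 0, e.g. limit = 3
4. limit > size (larger than the number of pieces), e.g. limit = 20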

How split works

We use the following example to analyze how split works.

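import java.text.MessageFormat;
import java.util.Arrays;

public class SplitLimitTest {
    public static void main(String[] args) {
        new SplitLimitTest().test();
    }

    // Split the same string once for each of the four limit cases.
    public void test() {
        String string = "linux---abc-linux-";
        splitStringWithLimit(string, -1);
        splitStringWithLimit(string, 0);
        splitStringWithLimit(string, 3);
        splitStringWithLimit(string, 20);
    }

    // Split on "-" with the given limit and print the pieces and their count.
    public void splitStringWithLimit(String string, int limit) {
        String[] arrays = string.split("-", limit);
        String result = MessageFormat.format("arrays={0}, length={1}", Arrays.toString(arrays), arrays.length);
        System.out.println(result);
    }
}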

// arrays=[linux, , , abc, linux, ], length=6
// arrays=[linux, , , abc, linux], length=5
// arrays=[linux, , -abc-linux-], length=3
// arrays=[linux, , , abc, linux, ], length=6

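I. Step one has two branches.

1. regex is a single character that is not one of the regex metacharacters ".$|()[{^?*+\\", or regex is a two-character string that starts with a backslash and whose second character is not in 0-9, a-z, or A-Z. In both cases the character must also not be a surrogate.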

if (((regex.value.length == 1 &&
        ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
        (regex.length() == 2 &&
        regex.charAt(0) == '\\' &&
        (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
        ((ch-'a')|('z'-ch)) < 0 &&
        ((ch-'A')|('Z'-ch)) < 0)) &&
    (ch < Character.MIN_HIGH_SURROGATE ||
        ch > Character.MAX_LOW_SURROGATE))
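The bit tricks in this condition are hard to read: an expression such as ((ch-'0')|('9'-ch)) < 0 is true exactly when ch lies outside the range '0' to '9'. The sketch below restates the check with plain comparisons. The class FastPathCheck and the method usesFastPath are hypothetical names, not part of the JDK, and the code is only meant to mirror the condition shown above.

```java
class FastPathCheck {
    // The regex metacharacters listed in the condition above.
    private static final String META = ".$|()[{^?*+\\";

    // Hypothetical helper: true when the condition above would hold for regex.
    static boolean usesFastPath(String regex) {
        char ch;
        if (regex.length() == 1) {
            ch = regex.charAt(0);
            // A single metacharacter needs the regex engine.
            if (META.indexOf(ch) != -1) {
                return false;
            }
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            ch = regex.charAt(1);
            // A backslash followed by an ASCII letter or digit (e.g. \d) is not a plain literal.
            boolean alphanumeric = (ch >= '0' && ch <= '9')
                    || (ch >= 'a' && ch <= 'z')
                    || (ch >= 'A' && ch <= 'Z');
            if (alphanumeric) {
                return false;
            }
        } else {
            return false;
        }
        // Surrogate characters are excluded as well.
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }

    public static void main(String[] args) {
        System.out.println(usesFastPath("-"));    // true: plain single character
        System.out.println(usesFastPath("."));    // false: metacharacter
        System.out.println(usesFastPath("\\."));  // true: escaped metacharacter
        System.out.println(usesFastPath("\\d"));  // false: backslash plus a letter
        System.out.println(usesFastPath("-+"));   // false: two characters without a leading backslash
    }
}
```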

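On this fast path, split uses indexOf to find each occurrence of the delimiter and maintains two index variables: off is the position where the previous search ended (0 for the first search), and next is the position of the current match. After each match, the text between off and next is added to the list, and off is then updated to next + 1 for the next search.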

{
    int off = 0;
    int next = 0;
    boolean limited = limit > 0;
    ArrayList<String> list = new ArrayList<>();
    while ((next = indexOf(ch, off)) != -1) {
        if (!limited || list.size() < limit - 1) {
            list.add(substring(off, next));
            off = next + 1;
        } else {    // last one
            //assert (list.size() == limit - 1);
            list.add(substring(off, value.length));
            off = value.length;
            break;
        }
    }
    // If no match was found, return this
    if (off == 0)
        return new String[]{this};

    // Add remaining segment
    if (!limited || list.size() < limit)
        list.add(substring(off, value.length));

    // Construct result
    int resultSize = list.size();
    if (limit == 0) {
        while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
            resultSize--;
        }
    }
    String[] result = new String[resultSize];
    return list.subList(0, resultSize).toArray(result);
}

2. regex does not satisfy the check above, for example because it is a single metacharacter or is longer than two characters.

return Pattern.compile(regex).split(this, limit);

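This path uses the regular expression Matcher to locate each match of regex. It maintains an index variable, which plays the role of off above, while the m.start() reported by the matcher plays the role of next. After each match, the text between index and m.start() is added to the list, and index is then updated to m.end() for the next search.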

public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<>();
    Matcher m = matcher(input);

    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            if (index == 0 && index == m.start() && m.start() == m.end()) {
                // no empty leading substring included for zero-width match
                // at the beginning of the input char sequence.
                continue;
            }
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}
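As a rough illustration of this second branch, the sketch below splits the example string from earlier with patterns that fail the fast-path check, so the work is delegated to Pattern. The class RegexPathDemo and the helper splitViaPattern are names invented here; the helper calls Pattern.compile(regex).split directly, which is the same delegation String.split performs in the source shown above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class RegexPathDemo {
    // Hypothetical helper: the same delegation String.split performs on this branch.
    static String[] splitViaPattern(String input, String regex, int limit) {
        return Pattern.compile(regex).split(input, limit);
    }

    public static void main(String[] args) {
        String s = "linux---abc-linux-";

        // "-+" treats each run of dashes as one delimiter; limit -1 keeps the trailing empty string.
        System.out.println(Arrays.toString(splitViaPattern(s, "-+", -1)));

        // Same pattern with limit 0: the trailing empty string is dropped.
        System.out.println(Arrays.toString(splitViaPattern(s, "-+", 0)));

        // "--" matches only once, inside the run of three dashes.
        System.out.println(Arrays.toString(splitViaPattern(s, "--", -1)));

        // String.split gives the same result as calling Pattern directly.
        System.out.println(Arrays.equals(s.split("-+", -1), splitViaPattern(s, "-+", -1)));
    }
}
```

With "-+" and limit -1 the result should have four elements (linux, abc, linux, and an empty string). With "--" it should have two (linux and -abc-linux-).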

II. Step two:

If off is 0, no match of regex was found, so a one-element array containing the string itself is returned directly.
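One consequence of this step that is easy to overlook: splitting an empty string also finds no match, so the result is an array holding one empty string, not an empty array. The following sketch shows both cases; the class NoMatchDemo and the helper splitOnDash are illustrative names only.

```java
class NoMatchDemo {
    // Hypothetical helper: split on "-" with the default limit.
    static String[] splitOnDash(String input) {
        return input.split("-");
    }

    public static void main(String[] args) {
        // No "-" in the input: the array holds the original string.
        String[] unchanged = splitOnDash("abc");
        System.out.println(unchanged.length + " element: \"" + unchanged[0] + "\"");

        // An empty input has no match either, so the array holds one empty string.
        String[] empty = splitOnDash("");
        System.out.println(empty.length + " element: \"" + empty[0] + "\"");
    }
}
```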

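III. Step three:

If limit <= 0, or the list has not yet reached the limit we set, the remaining content (from just after the last matched delimiter to the end of the string) is added to the list.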

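IV. Step four:

This step applies only when limit is 0. In that case the list is traversed from back to front and trailing empty strings are removed. Empty strings in the middle are not removed.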
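A short sketch of this trimming behavior, using a comma-separated input chosen for illustration (the class TrailingEmptyDemo and the helper splitOnComma are hypothetical names):

```java
import java.util.Arrays;

class TrailingEmptyDemo {
    // Hypothetical helper: split on "," with an explicit limit.
    static String[] splitOnComma(String input, int limit) {
        return input.split(",", limit);
    }

    public static void main(String[] args) {
        // limit 0: the two trailing empty strings are removed, the middle one is kept.
        String[] trimmed = splitOnComma("a,,b,,", 0);
        System.out.println(Arrays.toString(trimmed) + ", length=" + trimmed.length);

        // limit -1: nothing is removed.
        String[] full = splitOnComma("a,,b,,", -1);
        System.out.println(Arrays.toString(full) + ", length=" + full.length);

        // When every piece is empty, limit 0 leaves an empty array.
        String[] none = splitOnComma(",,", 0);
        System.out.println(Arrays.toString(none) + ", length=" + none.length);
    }
}
```

The lengths should come out as 3, 5, and 0 respectively.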

V. Step five:

The toArray method of the List is called on the first resultSize elements, and the resulting array is returned.
