Java's String split has plenty of pitfalls, so use it with care!

Reposted article, 2014-12-29 16:20

  System.out.println(":ab:cd:ef::".split(":").length); // trailing empty strings are all dropped
  System.out.println(":ab:cd:ef::".split(":", -1).length); // no empty string is dropped
  System.out.println(StringUtils.split(":ab:cd:ef::", ":").length); // Apache Commons: empty tokens are dropped everywhere, including at the start and the end
  System.out.println(StringUtils.splitPreserveAllTokens(":ab:cd:ef::", ":").length); // Apache Commons: no separator is ignored
Output:
4
6
3
6

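I took a look at public String[] split(String regex, int limit) in the JDK's String class, a method I rarely use. With a nonzero limit, a separator that matches at the very end of the input leaves an empty string after it: "o".split("o", 5) and "o".split("o", -2) both return two empty strings, "" and "" (this is the case highlighted by the red box in the original post's figure). That is why split(String regex) is the one normally used. It is equivalent to split(regex, 0), which discards trailing empty strings, so "o".split("o") returns an empty array.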
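The effect of the limit argument can be checked with a few lines. This is a minimal sketch that uses only the JDK; the class name SplitLimitDemo and the helper show are made up for illustration.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: prints the length first so empty strings stay visible.
    static String show(String[] parts) {
        return parts.length + " " + Arrays.toString(parts);
    }

    public static void main(String[] args) {
        // limit == 0 (same as split(regex)): trailing empty strings are removed.
        System.out.println(show("o".split("o")));               // 0 []
        // limit > 0: at most `limit` elements, trailing empty strings are kept.
        System.out.println(show("o".split("o", 5)));            // 2 [, ]
        // limit < 0: no cap on the element count, trailing empty strings are kept.
        System.out.println(show("o".split("o", -2)));           // 2 [, ]
        // A positive limit also caps the array: the last element is the unsplit rest.
        System.out.println(show(":ab:cd:ef::".split(":", 3)));  // 3 [, ab, cd:ef::]
    }
}
```

Note that a leading empty string is never removed by limit 0; only trailing ones are.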

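String.split takes a regular expression, which is powerful but easy to get wrong, and String offers no simpler literal variant. split in org.apache.commons.lang.StringUtils changes this: its separator argument is read literally rather than as a regex, with every character of the argument acting as a separator. As for StringTokenizer, the JDK class with similar functionality, a comment inside the internal splitWorker method reads "Direct code is quicker than StringTokenizer.", so by that comment StringUtils is the faster tool.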
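One common way the regex argument goes wrong is a separator that happens to be a regex metacharacter. The sketch below uses only the JDK; SplitRegexDemo and splitLiteral are hypothetical names, and Pattern.quote is shown as one possible way to get literal splitting without Commons Lang.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitRegexDemo {
    // Hypothetical helper: treats `sep` as a literal string, not as a regex.
    static String[] splitLiteral(String s, String sep) {
        return s.split(Pattern.quote(sep));
    }

    public static void main(String[] args) {
        // "." matches any character, so every token is empty and gets dropped.
        System.out.println("a.b.c".split(".").length);                 // 0
        // Escaping the dot splits on the literal character.
        System.out.println(Arrays.toString("a.b.c".split("\\.")));     // [a, b, c]
        // Quoting works for any separator, including "|".
        System.out.println(Arrays.toString(splitLiteral("a.b.c", "."))); // [a, b, c]
        System.out.println(Arrays.toString(splitLiteral("a|b", "|")));   // [a, b]
    }
}
```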

In StringUtils, both split and splitPreserveAllTokens delegate to splitWorker.
The two private splitWorker overloads are walked through below:

private static String[] splitWorker(String str, char separatorChar, boolean preserveAllTokens)
{
        // Performance tuned for 2.0 (JDK1.4)

        if (str == null) {
            return null;
        }
        int len = str.length();
        if (len == 0) {
            return ArrayUtils.EMPTY_STRING_ARRAY;
        }
        List list = new ArrayList();
        int i = 0, start = 0;
        boolean match = false;
        boolean lastMatch = false;
        while (i < len) {
            if (str.charAt(i) == separatorChar) {
                if (match || preserveAllTokens) {
                    list.add(str.substring(start, i));
                    match = false;
                    lastMatch = true;
                }
                start = ++i;
                continue;
            }
            lastMatch = false;
            match = true;
            i++;
        }
        if (match || (preserveAllTokens && lastMatch)) {
            list.add(str.substring(start, i));
        }
        return (String[]) list.toArray(new String[list.size()]);
    }

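This is a core method for splitting a string. The char separatorChar is the separator, and the boolean preserveAllTokens controls what happens to empty tokens. When it is true, every separator ends a token, so a leading separator, a trailing separator, or two adjacent separators in the middle each produce a "" in the result; when it is false, empty tokens are dropped everywhere. The method works as follows:
If the string is null, it returns null.
If the string is "", it returns an empty array (ArrayUtils.EMPTY_STRING_ARRAY), not "".
i is the cursor that walks the string. match records that at least one non-separator character has been seen since the last token was emitted; lastMatch is set when a separator has just ended a token and is cleared by the next non-separator character.
If the very first character is separatorChar, the outcome depends on preserveAllTokens: when it is true, a "" is stored in the result array. For any non-separator character, match is set to true and lastMatch to false, meaning there is content waiting to be split off.
Whenever separatorChar is met and match or preserveAllTokens is true, the substring from start up to i is added to the result array, match is set to false and lastMatch to true. In either case start moves to just past i.
When the loop ends, the remaining part is emitted if match is true (the string does not end with separatorChar), or if lastMatch and preserveAllTokens are both true (the last character is separatorChar, in which case the emitted part is "").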
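To see the effect of the flag at the start, in the middle and at the end of a string without pulling in Commons Lang, the logic walked through above can be re-implemented in a few lines. This is an illustrative sketch, not the library source: CharSplitSketch is a made-up name, and the loop is restructured slightly while keeping the same match / lastMatch bookkeeping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CharSplitSketch {
    static String[] split(String str, char sep, boolean preserveAllTokens) {
        if (str == null) {
            return null;
        }
        List<String> out = new ArrayList<>();
        int start = 0;
        boolean match = false;     // non-separator content seen since the last token
        boolean lastMatch = false; // the latest separator has just ended a token
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) == sep) {
                if (match || preserveAllTokens) {
                    out.add(str.substring(start, i));
                    match = false;
                    lastMatch = true;
                }
                start = i + 1;
            } else {
                lastMatch = false;
                match = true;
            }
        }
        if (match || (preserveAllTokens && lastMatch)) {
            out.add(str.substring(start));
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Flag off: empty tokens vanish at the start, in the middle and at the end.
        System.out.println(Arrays.toString(split(":a::b:", ':', false))); // [a, b]
        // Flag on: each of those positions yields an empty string.
        System.out.println(Arrays.toString(split(":a::b:", ':', true)));  // [, a, , b, ]
    }
}
```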

private static String[] splitWorker(String str, String separatorChars, int max, boolean preserveAllTokens)
{
        // Performance tuned for 2.0 (JDK1.4)
        // Direct code is quicker than StringTokenizer.
        // Also, StringTokenizer uses isSpace() not isWhitespace()

        if (str == null) {
            return null;
        }
        int len = str.length();
        if (len == 0) {
            return ArrayUtils.EMPTY_STRING_ARRAY;
        }
        List list = new ArrayList();
        int sizePlus1 = 1;
        int i = 0, start = 0;
        boolean match = false;
        boolean lastMatch = false;
        if (separatorChars == null) {
            // Null separator means use whitespace
            while (i < len) {
                if (Character.isWhitespace(str.charAt(i))) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else if (separatorChars.length() == 1) {
            // Optimise 1 character case
            char sep = separatorChars.charAt(0);
            while (i < len) {
                if (str.charAt(i) == sep) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else {
            // standard case
            while (i < len) {
                if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        }
        if (match || (preserveAllTokens && lastMatch)) {
            list.add(str.substring(start, i));
        }
        return (String[]) list.toArray(new String[list.size()]);
    }

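This is also a core splitting method. It differs from the previous one in two ways: the separators are given as a string that stands for a set of characters, and an extra max parameter caps the length of the returned array (a zero or negative max never matches the counter, so it means no limit). As in the char version, preserveAllTokens affects the start, the end and the middle of the string; when the separator string has more than one character, the result contains even more "" entries, because each separator character in the input ends a token of its own. The method works as follows:
If the string is null, it returns null.
If the string is "", it returns an empty array (ArrayUtils.EMPTY_STRING_ARRAY), not "".
What follows splits into three cases: the separator string is null, which means splitting on whitespace; the separator string has length 1; the separator string is longer. The three cases differ only in how the current character is tested:
    1. Character.isWhitespace decides whether each character is whitespace.
    2. The separator string is first converted to a single char, and the loop then mirrors the previous splitWorker method.
    3. indexOf checks whether the current character occurs in the separator string, and the loop then mirrors the previous splitWorker method.
    Note that when the token about to be added is the max-th one, the cursor i is moved straight to the end of the string, so that token takes the whole remainder of the input and the loop exits at its next check. Because lastMatch and match are both false at that point, no "" is appended afterwards.
   When the loop ends, the remaining part is emitted if match is true (the string does not end with a separator), or if lastMatch and preserveAllTokens are both true (the last character is in the separator string, in which case the emitted part is "").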
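The max handling and the separator-set behaviour are easier to follow on concrete input. The sketch below folds the three cases described above into one loop; it is an illustration under that simplification, not the Commons Lang source, and MaxSplitSketch is a made-up name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MaxSplitSketch {
    static String[] split(String str, String separatorChars, int max, boolean preserveAllTokens) {
        if (str == null) {
            return null;
        }
        int len = str.length();
        List<String> out = new ArrayList<>();
        int sizePlus1 = 1;
        int i = 0, start = 0;
        boolean match = false, lastMatch = false;
        while (i < len) {
            char c = str.charAt(i);
            // null separator set means "split on whitespace"
            boolean isSep = (separatorChars == null)
                    ? Character.isWhitespace(c)
                    : separatorChars.indexOf(c) >= 0;
            if (isSep) {
                if (match || preserveAllTokens) {
                    lastMatch = true;
                    if (sizePlus1++ == max) {
                        i = len;           // the max-th token takes the rest of the input
                        lastMatch = false;
                    }
                    out.add(str.substring(start, i));
                    match = false;
                }
                start = ++i;
                continue;
            }
            lastMatch = false;
            match = true;
            i++;
        }
        if (match || (preserveAllTokens && lastMatch)) {
            out.add(str.substring(start, i));
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // max = 2: the second token keeps the remaining separators.
        System.out.println(Arrays.toString(split("a:b:c:d", ":", 2, false))); // [a, b:c:d]
        // Two separator characters in a row give an extra "" when tokens are preserved.
        System.out.println(Arrays.toString(split("a, b", ", ", -1, true)));   // [a, , b]
        System.out.println(Arrays.toString(split("a, b", ", ", -1, false)));  // [a, b]
        // null separator: any whitespace splits.
        System.out.println(Arrays.toString(split("a  b\tc", null, -1, false))); // [a, b, c]
    }
}
```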

Reposted from http://yinny.iteye.com/blog/1750210
