I recently needed to parse a string containing a JSONArray:
[{"key":"姓名","value":"XX"},{"key":"資質","value":"從事貴金屬投資行業10年
國家期貨二級分析師
上金所榮譽長老"},{"key":"其他","value":""}]
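The value for the key 資質 (qualifications) holds three credentials, each on its own line. That is where the trap lies. A raw newline (\n) inside a string value is not valid JSON, because control characters must be escaped. So as soon as the JSON parser hits one, it throws an exception. What now?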
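Fortunately, I came up with a workaround. First, use Java's built-in String.replaceAll to turn each newline character into the visible two-character escape sequence \n (written "\\n" in Java source). The JSON parser can then decode that sequence back into a newline.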
System.out.println(array.replaceAll("\n","\\n"));
The result looked wrong. This is what came out:
[{"key":"姓名","value":"XX"},{"key":"資質","value":"從事貴金屬投資行業10年n國家期貨二級分析師n上金所榮譽長老"},{"key":"其他","value":""}]
Whoa, that's bizarre! Why is only a bare n left where each newline used to be?
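A quick way to see the mismatch is to print what each Java string literal actually holds at runtime. The sketch below is a minimal, self-contained illustration. The class name ReplaceAllPitfall and the helper naiveEscape are hypothetical and exist only for this demo. The helper performs the same call as the snippet above.

```java
public class ReplaceAllPitfall {

    // Hypothetical helper: the same call as in the article.
    // The literal "\\n" is the two-character runtime string \n (backslash + 'n').
    static String naiveEscape(String s) {
        return s.replaceAll("\n", "\\n");
    }

    public static void main(String[] args) {
        // Java source literal -> number of characters at runtime
        System.out.println("\n".length());     // 1: a single newline character
        System.out.println("\\n".length());    // 2: backslash + 'n'
        System.out.println("\\\\n".length());  // 3: backslash + backslash + 'n'

        // Matcher treats the replacement's backslash as an escape,
        // so only 'n' is inserted in place of each newline.
        System.out.println(naiveEscape("a\nb")); // prints anb
    }
}
```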
To figure out what's going on, should we search Baidu or Google? No, let's read the source.
First, the source of String.replaceAll:
public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
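Since String.replaceAll only delegates, spelling out the delegation by hand should give the same result. This is a small sketch to confirm that. The class name ReplaceAllEquivalence and the helper viaPattern are hypothetical.

```java
import java.util.regex.Pattern;

public class ReplaceAllEquivalence {

    // Hypothetical helper: the delegation String.replaceAll performs, written out explicitly.
    static String viaPattern(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "line1\nline2\nline3";
        String a = input.replaceAll("\n", "\\\\n");
        String b = viaPattern(input, "\n", "\\\\n");
        System.out.println(a);           // prints line1\nline2\nline3 (literal backslashes)
        System.out.println(a.equals(b)); // prints true
    }
}
```

Because the regex is recompiled on every call, code that runs this in a hot loop usually compiles the Pattern once and reuses it.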
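The source of String.replaceAll shows that it relies on regular expressions. So how does the matching and replacing actually work? Let's look at Matcher.replaceAll: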
public String replaceAll(String replacement) {
    reset();
    boolean result = find();
    if (result) {
        StringBuffer sb = new StringBuffer();
        do {
            appendReplacement(sb, replacement);
            result = find();
        } while (result);
        appendTail(sb);
        return sb.toString();
    }
    return text.toString();
}
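The loop in Matcher.replaceAll can be reproduced with the public Matcher API: find, appendReplacement and appendTail. The sketch below does so, assuming the StringBuffer overload of appendReplacement, which has existed since Java 1.4. The class name ManualReplaceLoop and the helper replaceEach are hypothetical. One difference from the listing: when nothing matches, this sketch still goes through appendTail instead of returning the text directly, but the resulting string is the same.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ManualReplaceLoop {

    // Hypothetical helper mirroring Matcher.replaceAll:
    // for each match, append the text before it plus the processed replacement,
    // then append whatever follows the last match.
    static String replaceEach(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "a\nb\nc";
        System.out.println(replaceEach(input, "\n", "\\n"));   // prints anbnc
        System.out.println(replaceEach(input, "\n", "\\\\n")); // prints a\nb\nc
    }
}
```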
Matcher.replaceAll keeps looping until find() returns false. Each time find() matches a newline (the regex we passed to String.replaceAll was "\n"), it calls appendReplacement. So what does that method actually do?
public Matcher appendReplacement(StringBuffer sb, String replacement) {

    // If no match, return error
    if (first < 0)
        throw new IllegalStateException("No match available");

    // Process substitution string to replace group references with groups
    int cursor = 0;
    StringBuilder result = new StringBuilder();

    while (cursor < replacement.length()) {
        char nextChar = replacement.charAt(cursor);
        if (nextChar == '\\') {
            cursor++;
            nextChar = replacement.charAt(cursor);
            result.append(nextChar);
            cursor++;
        } else if (nextChar == '$') {
            // Skip past $
            cursor++;
            // A StringIndexOutOfBoundsException is thrown if
            // this "$" is the last character in replacement
            // string in current implementation, a IAE might be
            // more appropriate.
            nextChar = replacement.charAt(cursor);
            int refNum = -1;
            if (nextChar == '{') {
                cursor++;
                StringBuilder gsb = new StringBuilder();
                while (cursor < replacement.length()) {
                    nextChar = replacement.charAt(cursor);
                    if (ASCII.isLower(nextChar) ||
                        ASCII.isUpper(nextChar) ||
                        ASCII.isDigit(nextChar)) {
                        gsb.append(nextChar);
                        cursor++;
                    } else {
                        break;
                    }
                }
                if (gsb.length() == 0)
                    throw new IllegalArgumentException(
                        "named capturing group has 0 length name");
                if (nextChar != '}')
                    throw new IllegalArgumentException(
                        "named capturing group is missing trailing '}'");
                String gname = gsb.toString();
                if (ASCII.isDigit(gname.charAt(0)))
                    throw new IllegalArgumentException(
                        "capturing group name {" + gname +
                        "} starts with digit character");
                if (!parentPattern.namedGroups().containsKey(gname))
                    throw new IllegalArgumentException(
                        "No group with name {" + gname + "}");
                refNum = parentPattern.namedGroups().get(gname);
                cursor++;
            } else {
                // The first number is always a group
                refNum = (int)nextChar - '0';
                if ((refNum < 0)||(refNum > 9))
                    throw new IllegalArgumentException(
                        "Illegal group reference");
                cursor++;
                // Capture the largest legal group string
                boolean done = false;
                while (!done) {
                    if (cursor >= replacement.length()) {
                        break;
                    }
                    int nextDigit = replacement.charAt(cursor) - '0';
                    if ((nextDigit < 0)||(nextDigit > 9)) { // not a number
                        break;
                    }
                    int newRefNum = (refNum * 10) + nextDigit;
                    if (groupCount() < newRefNum) {
                        done = true;
                    } else {
                        refNum = newRefNum;
                        cursor++;
                    }
                }
            }
            // Append group
            if (start(refNum) != -1 && end(refNum) != -1)
                result.append(text, start(refNum), end(refNum));
        } else {
            result.append(nextChar);
            cursor++;
        }
    }
    // Append the intervening text
    sb.append(text, lastAppendPosition, first);
    // Append the match substitution
    sb.append(result);

    lastAppendPosition = last;
    return this;
}
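The listing shows that the replacement string is a small language of its own. A backslash makes the next character literal, $n inserts numbered group n, and ${name} inserts a named group. The sketch below exercises each branch through String.replaceAll. The class name, the helper methods and the sample date string are made up for illustration.

```java
public class ReplacementSyntaxDemo {

    // $n: insert numbered capturing group n
    static String numbered(String date) {
        return date.replaceAll("(\\d+)-(\\d+)-(\\d+)", "$3/$2/$1");
    }

    // ${name}: insert a named capturing group
    static String named(String date) {
        return date.replaceAll("(?<y>\\d+)-(?<m>\\d+)-(?<d>\\d+)", "${d}.${m}.${y}");
    }

    // \$ in the runtime replacement yields a literal '$'; $0 is the whole match
    static String literalDollar(String s) {
        return s.replaceAll("\\d+", "\\$$0");
    }

    public static void main(String[] args) {
        System.out.println(numbered("2017-06-01"));    // prints 01/06/2017
        System.out.println(named("2017-06-01"));       // prints 01.06.2017
        System.out.println(literalDollar("price: 5")); // prints price: $5
    }
}
```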
Looking at the implementation of appendReplacement, the first line of the while loop executes
char nextChar = replacement.charAt(cursor);
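This reads the first character of the replacement string. We passed the Java literal "\\n", which at runtime is the two-character string \n (a backslash followed by n), so the first character is '\'. Now look at the first if statement: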
if (nextChar == '\\') {
    cursor++;
    nextChar = replacement.charAt(cursor);
    result.append(nextChar);
    cursor++;
}
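When the character is '\', cursor is incremented and the following character is read and appended to result as-is. The backslash itself is never appended. In other words, a backslash in the replacement string escapes whatever character follows it. For our replacement \n, the backslash is consumed and only 'n' is appended, which is exactly the output we saw. To put a literal backslash into the output, the runtime replacement must contain two consecutive backslashes (\\n), which this branch collapses into one (\n). That explains the problem.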
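To isolate the behavior described above, here is a hypothetical re-implementation of just the backslash branch. It is not the JDK code: '$' group references are deliberately ignored. The class name BackslashBranchSketch and the method processEscapes are made up for this illustration.

```java
public class BackslashBranchSketch {

    // Hypothetical helper: mimics only the '\\' branch of appendReplacement.
    // '$' is treated as an ordinary character here (unlike the real method).
    // As in the listing, a trailing lone backslash makes charAt throw
    // StringIndexOutOfBoundsException.
    static String processEscapes(String replacement) {
        StringBuilder result = new StringBuilder();
        int cursor = 0;
        while (cursor < replacement.length()) {
            char nextChar = replacement.charAt(cursor);
            if (nextChar == '\\') {
                cursor++;                                  // drop the escaping backslash
                result.append(replacement.charAt(cursor)); // keep the next char verbatim
                cursor++;
            } else {
                result.append(nextChar);
                cursor++;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(processEscapes("\\n"));   // runtime \n  -> prints n
        System.out.println(processEscapes("\\\\n")); // runtime \\n -> prints \n
    }
}
```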
In Java source, each of those two backslashes must itself be escaped, so the final call is
System.out.println(array.replaceAll("\n","\\\\n"));
Output:
[{"key":"姓名","value":"XX"},{"key":"資質","value":"從事貴金屬投資行業10年\n國家期貨二級分析師\n上金所榮譽長老"},{"key":"其他","value":""}]
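Doubling the backslashes by hand works. The JDK also offers two ways to avoid reasoning about replacement escapes at all:

* String.replace(CharSequence, CharSequence) treats both arguments literally.
* Matcher.quoteReplacement escapes a replacement string for use with replaceAll.

The sketch below compares the three approaches. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;

public class EscapeNewlineAlternatives {

    // Option 1: literal, non-regex replacement; "\\n" is inserted as backslash + 'n'.
    static String withReplace(String s) {
        return s.replace("\n", "\\n");
    }

    // Option 2: keep replaceAll, but let quoteReplacement escape '\' and '$'.
    static String withQuoteReplacement(String s) {
        return s.replaceAll("\n", Matcher.quoteReplacement("\\n"));
    }

    // Option 3: the article's fix, escaping the backslash by hand.
    static String withManualEscape(String s) {
        return s.replaceAll("\n", "\\\\n");
    }

    public static void main(String[] args) {
        String value = "line1\nline2\nline3";
        System.out.println(withReplace(value));          // prints line1\nline2\nline3
        System.out.println(withQuoteReplacement(value)); // prints line1\nline2\nline3
        System.out.println(withManualEscape(value));     // prints line1\nline2\nline3
    }
}
```

Two caveats apply to all three approaches. First, they handle only \n. A valid JSON string also needs other control characters (such as \r and \t), double quotes and backslashes escaped. Second, they rewrite every newline in the whole text. That is only safe when newlines appear exclusively inside string values; in pretty-printed JSON, the newlines between tokens would be turned into stray \n sequences.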
