Recently I needed to parse a string containing a JSONArray:
[{"key":"姓名","value":"XX"},{"key":"资质","value":"从事贵金属投资行业10年
国家期货二级分析师
上金所荣誉长老"},{"key":"其他","value":""}]
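The value for the key "资质" (qualifications) contains three items, each on its own line, and that is where the trap lies: when the JSON parser hits a raw line break (\n) inside a string value, it throws an exception. So what can we do?
Luckily I thought of a workaround: use Java's built-in String.replaceAll method to first convert each line break (\n) into the visible two-character text \n (written "\\n" in Java source).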
System.out.println(array.replaceAll("\n","\\n"));
But something looked off. The output was this:
[{"key":"姓名","value":"XX"},{"key":"资质","value":"从事贵金属投资行业10年n国家期货二级分析师n上金所荣誉长老"},{"key":"其他","value":""}]
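Whoa, what is going on? Why is there only a bare n left?
To get to the bottom of it: Baidu? Google? No, let's read the source.
First, the source of String.replaceAll: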
public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
The source shows that this method uses regular-expression matching. So how exactly does the replacement logic work? Let's look at Matcher.replaceAll next:
public String replaceAll(String replacement) {
    reset();
    boolean result = find();
    if (result) {
        StringBuffer sb = new StringBuffer();
        do {
            appendReplacement(sb, replacement);
            result = find();
        } while (result);
        appendTail(sb);
        return sb.toString();
    }
    return text.toString();
}
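The same loop can be reproduced from the outside with the public Matcher API, which makes it easy to experiment with different replacement strings. This is only a rough sketch: the class name ManualReplaceAll and the helper replaceAllByHand are made up for illustration, and the code mirrors the structure of the JDK method rather than copying it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualReplaceAll {

    // Hypothetical helper: drives find / appendReplacement / appendTail by hand,
    // following the same structure as Matcher.replaceAll.
    static String replaceAllByHand(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Copies the text before the match, then the processed replacement.
            m.appendReplacement(sb, replacement);
        }
        // Copies whatever is left after the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "a\nb\nc";
        // A replacement without special characters behaves as expected.
        System.out.println(replaceAllByHand(input, "\n", "-"));   // a-b-c
        // The Java literal "\\n" reproduces the surprising result from the article.
        System.out.println(replaceAllByHand(input, "\n", "\\n")); // anbnc
    }
}
```

Since the helper goes through appendReplacement just like String.replaceAll does, it should give the same result for the same arguments.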
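From Matcher.replaceAll we can see that it keeps looping until find() returns false. Every time find() matches a line break (the regex we passed to String.replaceAll is "\n"), appendReplacement is called. So what does that method actually do?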
public Matcher appendReplacement(StringBuffer sb, String replacement) {

    // If no match, return error
    if (first < 0)
        throw new IllegalStateException("No match available");

    // Process substitution string to replace group references with groups
    int cursor = 0;
    StringBuilder result = new StringBuilder();

    while (cursor < replacement.length()) {
        char nextChar = replacement.charAt(cursor);
        if (nextChar == '\\') {
            cursor++;
            nextChar = replacement.charAt(cursor);
            result.append(nextChar);
            cursor++;
        } else if (nextChar == '$') {
            // Skip past $
            cursor++;
            // A StringIndexOutOfBoundsException is thrown if
            // this "$" is the last character in replacement
            // string in current implementation, a IAE might be
            // more appropriate.
            nextChar = replacement.charAt(cursor);
            int refNum = -1;
            if (nextChar == '{') {
                cursor++;
                StringBuilder gsb = new StringBuilder();
                while (cursor < replacement.length()) {
                    nextChar = replacement.charAt(cursor);
                    if (ASCII.isLower(nextChar) ||
                        ASCII.isUpper(nextChar) ||
                        ASCII.isDigit(nextChar)) {
                        gsb.append(nextChar);
                        cursor++;
                    } else {
                        break;
                    }
                }
                if (gsb.length() == 0)
                    throw new IllegalArgumentException(
                        "named capturing group has 0 length name");
                if (nextChar != '}')
                    throw new IllegalArgumentException(
                        "named capturing group is missing trailing '}'");
                String gname = gsb.toString();
                if (ASCII.isDigit(gname.charAt(0)))
                    throw new IllegalArgumentException(
                        "capturing group name {" + gname +
                        "} starts with digit character");
                if (!parentPattern.namedGroups().containsKey(gname))
                    throw new IllegalArgumentException(
                        "No group with name {" + gname + "}");
                refNum = parentPattern.namedGroups().get(gname);
                cursor++;
            } else {
                // The first number is always a group
                refNum = (int)nextChar - '0';
                if ((refNum < 0)||(refNum > 9))
                    throw new IllegalArgumentException(
                        "Illegal group reference");
                cursor++;
                // Capture the largest legal group string
                boolean done = false;
                while (!done) {
                    if (cursor >= replacement.length()) {
                        break;
                    }
                    int nextDigit = replacement.charAt(cursor) - '0';
                    if ((nextDigit < 0)||(nextDigit > 9)) { // not a number
                        break;
                    }
                    int newRefNum = (refNum * 10) + nextDigit;
                    if (groupCount() < newRefNum) {
                        done = true;
                    } else {
                        refNum = newRefNum;
                        cursor++;
                    }
                }
            }
            // Append group
            if (start(refNum) != -1 && end(refNum) != -1)
                result.append(text, start(refNum), end(refNum));
        } else {
            result.append(nextChar);
            cursor++;
        }
    }
    // Append the intervening text
    sb.append(text, lastAppendPosition, first);
    // Append the match substitution
    sb.append(result);

    lastAppendPosition = last;
    return this;
}
Reading through the implementation, we find that the first line of the while loop is:
char nextChar = replacement.charAt(cursor);
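This fetches the next character of the replacement string. Ours is written "\\n" in Java source, which at run time is just the two characters \ and n, so the first character is '\'. Now look at the first if statement: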
if (nextChar == '\\') {
    cursor++;
    nextChar = replacement.charAt(cursor);
    result.append(nextChar);
    cursor++;
}
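When the character is '\', cursor is incremented by 1, the following character is fetched (in our case 'n') and appended to result as a literal, while the backslash itself is dropped. This is the crux: inside a replacement string the backslash is an escape character, so it takes two consecutive backslashes at run time ("\\\\" in Java source) to produce a single backslash in the output. With that, the mystery seems solved.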
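To make the escape handling concrete, here is a stripped-down re-creation of just the backslash branch. It is a sketch for illustration only: the class ReplacementEscapes and the method unescape are hypothetical names, and the '$' group-reference branch of the real method is deliberately left out.

```java
class ReplacementEscapes {

    // Hypothetical, simplified model of how appendReplacement treats backslashes.
    // It ignores '$' group references entirely. Like the JDK code quoted above,
    // it fails with an exception if the string ends in a lone backslash.
    static String unescape(String replacement) {
        StringBuilder result = new StringBuilder();
        int cursor = 0;
        while (cursor < replacement.length()) {
            char nextChar = replacement.charAt(cursor);
            if (nextChar == '\\') {
                // Drop the backslash and copy the character after it literally.
                cursor++;
                nextChar = replacement.charAt(cursor);
                result.append(nextChar);
                cursor++;
            } else {
                result.append(nextChar);
                cursor++;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String twoChars = "\\n";     // at run time: backslash, n
        String threeChars = "\\\\n"; // at run time: backslash, backslash, n
        System.out.println(twoChars.length() + " -> " + unescape(twoChars));     // 2 -> n
        System.out.println(threeChars.length() + " -> " + unescape(threeChars)); // 3 -> \n
    }
}
```

There are two layers of escaping at work: the Java compiler first turns each "\\" in the source into one backslash, and the replacement logic then consumes one more backslash per escaped character.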
So the call we actually want is:
System.out.println(array.replaceAll("\n","\\\\n"));
Output:
[{"key":"姓名","value":"XX"},{"key":"资质","value":"从事贵金属投资行业10年\n国家期货二级分析师\n上金所荣誉长老"},{"key":"其他","value":""}]
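If counting backslashes feels error-prone, the standard library offers two other routes that should give the same result. Matcher.quoteReplacement escapes a replacement string for us, and String.replace(CharSequence, CharSequence) treats both arguments literally, with no regex and no escape processing. The sketch below compares them; the class and method names are made up for illustration, and a short sample string stands in for the JSON text.

```java
import java.util.regex.Matcher;

class LiteralNewlineEscape {

    // The approach from the article: escape the backslash by hand.
    static String viaDoubleEscape(String s) {
        return s.replaceAll("\n", "\\\\n");
    }

    // Let the JDK escape the replacement string.
    static String viaQuoteReplacement(String s) {
        return s.replaceAll("\n", Matcher.quoteReplacement("\\n"));
    }

    // Literal replacement: neither argument is interpreted as a regex or a template.
    static String viaReplace(String s) {
        return s.replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String sample = "a\nb";
        System.out.println(viaDoubleEscape(sample));     // a\nb
        System.out.println(viaQuoteReplacement(sample)); // a\nb
        System.out.println(viaReplace(sample));          // a\nb
    }
}
```

For a fixed, literal search string such as a line break, String.replace is probably the simplest choice, since no regular expression is needed at all.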
