Pattern and Matcher in Detail (String Matching and Bytecode) (reposted from http://blog.csdn.net/u010700335/article/details/44616451)

Reposted article, 2016-12-21 11:05, tag: java

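I: Motivation

(1) For string manipulation in Java, the first things that come to mind are the methods of the String class: replace(), replaceAll(), split(), matches() and so on (StringBuilder offers only an index-based replace()). In fact, the regex-based String methods delegate to java.util.regex.Pattern. For example, public String[] split(String regex, int limit) ends with the following call (newer JDKs add a fast path for simple single-character delimiters before reaching it). The source is: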

return Pattern.compile(regex).split(this, limit);

(2) public String replaceAll(String regex, String replacement) in the String class works the same way. The source is:

public String replaceAll(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}

(3) public boolean matches(String regex) in the String class follows the same approach. The source is:

public boolean matches(String regex) {
return Pattern.matches(regex, this);
}
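
Because these String methods generally compile the regex again on every call, code that applies the same expression many times can compile the Pattern once and reuse it. The following is a minimal sketch of that idea, not code from the original article: the class name PrecompiledRegex, the method names and the sample inputs are all made up for illustration. It only shows that the precompiled form gives the same results as the String shortcuts.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample data are illustrative only.
final class PrecompiledRegex {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    static final Pattern COMMA = Pattern.compile("\\s*,\\s*");
    static final Pattern DIGITS = Pattern.compile("\\d+");

    static String[] splitFields(String line) {
        // Same result as line.split("\\s*,\\s*")
        return COMMA.split(line);
    }

    static String maskDigits(String text) {
        // Same result as text.replaceAll("\\d+", "#")
        return DIGITS.matcher(text).replaceAll("#");
    }

    static boolean isNumber(String text) {
        // Same result as text.matches("\\d+")
        return DIGITS.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFields("a , b,c")));
        System.out.println(maskDigits("room 101, floor 3"));
        System.out.println(isNumber("2223"));
    }
}
```

A Matcher, unlike a Pattern, holds match state, so each call above creates a fresh one from the shared Pattern.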

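(4) Compared with Java, C++ is less convenient here: std::string has no split() or trim(), although both can be built from other std::string member functions such as find_first_of() and find_first_not_of().

(5) Java regular expressions are implemented by the Pattern and Matcher classes in the java.util.regex package. (It helps to keep the Java API documentation open while reading and to look up each method as it is introduced.)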

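II: Details

(1) The Pattern class represents a compiled regular expression, in other words a matching pattern. Its constructor is private, so it cannot be instantiated directly; instead, the static factory method Pattern.compile(String regex) creates one. This is where the Matcher class comes in: Pattern.matcher(CharSequence input) returns a Matcher object. Matcher has no public constructor either, so the only way to obtain an instance is through Pattern.matcher(CharSequence input).

(2) matches() requires the entire input to match, lookingAt() requires a match at the start of the input, and find() looks for a match anywhere in the input. After a successful matches(), lookingAt() or find(), three methods give more detail about the match: Matcher.start(), Matcher.end() and Matcher.group().
start() returns the index of the first character of the matched substring.
end() returns the index just after the last character of the matched substring.
group() returns the matched substring.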

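Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("aaa2223bb");
m.find();   // matches "2223"
m.start();  // returns 3
m.end();    // returns 7, the index just after "2223"
m.group();  // returns "2223"
Matcher m2 = p.matcher("2223bb");
m2.lookingAt(); // matches "2223"
m2.start();     // returns 0; lookingAt() only matches at the beginning of the input, so start() is always 0 after it succeeds
m2.end();       // returns 4
m2.group();     // returns "2223"
Matcher m3 = p.matcher("2223"); // with p.matcher("2223bb"), matches() would return false and the calls below would throw IllegalStateException
m3.matches();   // matches the entire input
m3.start();     // returns 0
m3.end();       // returns 4, the length of the input, because matches() must consume the whole string
m3.group();     // returns "2223"

Now for how regular-expression groups are used in Java.
start(), end() and group() each have an overload, start(int i), end(int i) and group(int i), that works on a specific capturing group. Matcher also has groupCount(), which returns the number of capturing groups in the pattern.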
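
The group overloads can be sketched as follows. This is an illustrative example rather than code from the original article: the class name GroupDemo, the pattern and the describe() helper are assumptions, and the input reuses the "aaa2223bb" string from the listing above. Group 0 always stands for the whole match; groups 1 and up are numbered by their opening parenthesis.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern and helper are made up for illustration.
final class GroupDemo {
    // Group 1 captures the letters, group 2 the digits; group 0 is the whole match.
    static final Pattern P = Pattern.compile("([a-z]+)(\\d+)");

    // Lists every group of every match as index:text[start,end).
    static String describe(String input) {
        Matcher m = P.matcher(input);
        StringBuilder sb = new StringBuilder();
        // groupCount() depends only on the pattern, so it can be read before matching.
        sb.append("groups=").append(m.groupCount());
        while (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                sb.append(' ').append(i).append(':').append(m.group(i))
                  .append('[').append(m.start(i)).append(',').append(m.end(i)).append(')');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: groups=2 0:aaa2223[0,7) 1:aaa[0,3) 2:2223[3,7)
        System.out.println(describe("aaa2223bb"));
    }
}
```

Note that groupCount() does not include group 0, which is why the loop runs from 0 up to and including groupCount().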

III: Worked Examples

(1) A simple exercise

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is illustrative; the original shows only the main method.
public class PatternMatcherDemo {
    public static void main(String[] args) {
        String phones1 = "MKY's mobile number: 0939-100391"
                + "XL's mobile number: 0939-666888aaaa"
                + "LJ's mobile number: 0952-600391"
                + "XQZ's mobile number: 0939-550391";
        String regex = ".*0939-\\d{6}";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(phones1);
        while (matcher.find()) {
            System.out.println(matcher.group() + "&&&&&");
            System.out.println("start: " + matcher.start());
            System.out.println("end: " + matcher.end());
        } // Only one result: the greedy .* stretches the match to the last 0939 number instead of yielding one match per number
        String phones2 = "LJ's mobile number: 0952-600391\r\n"
                + "XQZ's mobile number: 0939-550391";
        // Reuse the pattern
        matcher = pattern.matcher(phones2);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
        // A different pattern
        String text = "abcdebcadxbc";
        regex = ".bc";
        Pattern pattern2 = Pattern.compile(regex);
        matcher = pattern2.matcher(text);
        while (matcher.find()) {
            System.out.println(matcher.group() + "****");
            System.out.println("start: " + matcher.start());
            System.out.println("end: " + matcher.end());
        } // This time there are several results
        System.out.println("*************");
        // The next two idioms are particularly useful
        // Strip tags: the reluctant .+? stops at the first >
        pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
        matcher = pattern.matcher("<a href=\"index.html\">Home</a>");
        String string = matcher.replaceAll("");
        System.out.println(string);
        // Extract an attribute value through capturing group 1
        pattern = Pattern.compile("href=\"(.+?)\"");
        matcher = pattern.matcher("<a href=\"index.html\">Home</a>");
        if (matcher.find())
            System.out.println(matcher.group(1));
    }
}

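(2) Java regular expressions (this part is reposted)

The experiments below illustrate the matching rules of regular expressions. These are the greedy forms:
. any character (line terminators only when Pattern.DOTALL is set)
a? a, once or not at all
a* a, zero or more times
a+ a, one or more times
a{n} a, exactly n times
a{n,} a, at least n times
a{n,m} a, at least n but not more than m times
Appending ? to any of these quantifiers (for example a{n,}?) gives its reluctant form.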

// First look at . * + ?
// p(...) is assumed to be a small print helper, for example a wrapper around System.out.println
p("a".matches("."));//true
p("aa".matches("aa"));//true
p("aaaa".matches("a*"));//true
p("aaaa".matches("a+"));//true
p("".matches("a*"));//true
p("aaaa".matches("a?"));//false
p("".matches("a?"));//true
p("a".matches("a?"));//true
p("1232435463685899".matches("\\d{3,100}"));//true
p("192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"));//false
p("192".matches("[0-2][0-9][0-9]"));//true

Character classes
[abc] a, b, or c (simple class)
[^abc] any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
// Ranges
p("a".matches("[abc]"));//true
p("a".matches("[^abc]"));//false
p("A".matches("[a-zA-Z]"));//true
p("A".matches("[a-z]|[A-Z]"));//true
p("A".matches("[a-z[A-Z]]"));//true
p("R".matches("[A-Z&&[RFG]]"));//true
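Predefined character classes
\d a digit: [0-9]
\D a non-digit: [^0-9]
\s a whitespace character: [ \t\n\x0B\f\r]
\S a non-whitespace character: [^\s]
\w a word character: [a-zA-Z_0-9]
\W a non-word character: [^\w]
// Getting to know \s \w \d and \
p("\n\r\t".matches("\\s{4}"));//false: the input has only three whitespace characters
p(" ".matches("\\S"));//false
p("a_8 ".matches("\\w{3}"));//false: the trailing space is not a word character
p("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+"));//true
p("\\".matches("\\\\"));//true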

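Boundary matchers
^ the beginning of a line
$ the end of a line
\b a word boundary
\B a non-word boundary
\A the beginning of the input
\G the end of the previous match
\Z the end of the input but for the final terminator, if any
\z the end of the input
// Boundary matching
p("hello sir".matches("^h.*"));//true
p("hello sir".matches(".*ir$"));//true
p("hello sir".matches("^h[a-z]{1,3}o\\b.*"));//true
p("hellosir".matches("^h[a-z]{1,3}o\\b.*"));//false
// Blank line: zero or more whitespace characters that are not newlines, followed by a newline
p(" \n".matches("^[\\s&&[^\\n]]*\\n$"));//true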
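
Since String.matches() already requires the whole input to match, ^ and $ change little in the calls above; the boundary matchers matter most with find(). The following is a minimal sketch of that difference, with an assumed class name BoundaryDemo, an assumed helper findAll() and a made-up two-line input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the helper and the input are illustrative only.
final class BoundaryDemo {
    // Collects every match of the pattern in the input, in order.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "hello sir\nhi there";
        // \b requires the h to start a word, so the h inside "there" is skipped.
        System.out.println(findAll(Pattern.compile("\\bh\\w*"), text)); // [hello, hi]
        // Without MULTILINE, ^ matches only at the start of the whole input.
        System.out.println(findAll(Pattern.compile("^h\\w*"), text)); // [hello]
        // With MULTILINE, ^ also matches right after each line terminator.
        System.out.println(findAll(Pattern.compile("^h\\w*", Pattern.MULTILINE), text)); // [hello, hi]
    }
}
```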

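Key points
The pattern <(.)+?> means one or more arbitrary characters between < and >. The parentheses form a capturing group, whose content can be extracted separately after a match, and the ? makes the quantifier reluctant (shortest match). For the input <asdf>>>, the pattern with ? matches <asdf>, while without ? it matches the whole <asdf>>>.

str.replaceAll("<(.)+?>","") replaces every < > pair that encloses at least one character with the empty string. For example,
asdf<1234>jkl<> becomes asdfjkl<>
Similarly, str_line.replaceAll("&(.)+?;"," ") replaces each shortest run that starts with &, contains at least one arbitrary character and ends with ; by a single space.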
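
The difference between the reluctant and greedy forms can be checked directly. This is a small sketch under stated assumptions: the class name GreedyVsReluctant and the helper firstMatch() are made up, and the entity string in the last line is a sample input that does not appear in the original.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; firstMatch() is an illustrative helper.
final class GreedyVsReluctant {
    // Returns the first match of regex in input, or null when there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Reluctant: stops at the first > that completes a match.
        System.out.println(firstMatch("<(.)+?>", "<asdf>>>")); // <asdf>
        // Greedy: consumes as much as possible, then backtracks to the last >.
        System.out.println(firstMatch("<(.)+>", "<asdf>>>")); // <asdf>>>
        // The empty pair <> survives because at least one character is required.
        System.out.println("asdf<1234>jkl<>".replaceAll("<(.)+?>", "")); // asdfjkl<>
        // Each entity is replaced by a single space (sample input, not from the article).
        System.out.println("a&nbsp;b&amp;c".replaceAll("&(.)+?;", " ")); // a b c
    }
}
```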

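IV: Java bytecode

(1)

ldc push an int, float or String constant from the constant pool onto the stack
iload push the specified int local variable onto the stack
lload push the specified long local variable onto the stack
(2)
fstore store the float on top of the stack into the specified local variable
dstore store the double on top of the stack into the specified local variable
astore store the reference on top of the stack into the specified local variable
istore_1 store the int on top of the stack into the second local variable (index 1)
astore_1 store the reference on top of the stack into the second local variable (index 1)
astore_2 store the reference on top of the stack into the third local variable (index 2)

pop pop the top value off the stack (the value must not be a long or double)
dup duplicate the top value and push the copy onto the stack

iadd add the two ints on top of the stack and push the result
ladd add the two longs on top of the stack and push the result
isub subtract the two ints on top of the stack and push the result
imul multiply the two ints on top of the stack and push the result
(3)
ireturn return an int from the current method
areturn return an object reference from the current method
return return void from the current method

getstatic get a static field of the specified class and push its value onto the stack
putstatic assign a value to a static field of the specified class
getfield get an instance field of the specified class and push its value onto the stack
putfield assign a value to an instance field of the specified class

invokevirtual invoke an instance method
invokespecial invoke a superclass method, an instance initialization method (constructor), or a private method
invokestatic invoke a static method
invokeinterface invoke an interface method
(4)
new create an object and push its reference onto the stack
newarray create an array of the specified primitive type (int, float, char, ...) and push its reference onto the stack

if_icmpgt compare the two ints on top of the stack; jump if value1 > value2 (value2 being the top of the stack)
if_icmple compare the two ints on top of the stack; jump if value1 <= value2 (value2 being the top of the stack)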
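
To see a few of these instructions in context, here is a small sketch. The class BytecodeDemo and its methods are hypothetical; the bytecode in the comments is what javac typically emits for such methods and can be checked with javap -c BytecodeDemo, though the exact output may vary between compiler versions.

```java
// Hypothetical demo class; the bytecode in the comments is the typical javac
// output for these methods, not something quoted from the original article.
final class BytecodeDemo {
    // iload_0      push the first int argument
    // iload_1      push the second int argument
    // iadd         pop both, push their sum
    // ireturn      return the int on top of the stack
    static int add(int a, int b) {
        return a + b;
    }

    // iload_0
    // iload_1
    // if_icmple L  jump to L when a <= b (javac inverts the source condition)
    // iload_0
    // ireturn
    // L: iload_1
    // ireturn
    static int max(int a, int b) {
        if (a > b) {
            return a;
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // 5
        System.out.println(max(2, 3)); // 3
    }
}
```

Because both methods are static, local variable 0 holds the first argument; in an instance method, index 0 would hold this and the arguments would start at index 1.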

V: Source analysis of the String class

The listings below come from an older JDK, in which String still has the offset and count fields.

(1) substring() --- public String substring(int beginIndex, int endIndex)
/**
* Returns a new string that is a substring of this string. The
* substring begins at the specified <code>beginIndex</code> and
* extends to the character at index <code>endIndex - 1</code>.
* Thus the length of the substring is <code>endIndex-beginIndex</code>.
* <p>
* Examples:
* <blockquote><pre>
* "hamburger".substring(4, 8) returns "urge"
* "smiles".substring(1, 5) returns "mile"
* </pre></blockquote>
*
* @param beginIndex the beginning index, inclusive.
* @param endIndex the ending index, exclusive.
* @return the specified substring.
* @exception IndexOutOfBoundsException if the
* <code>beginIndex</code> is negative, or
* <code>endIndex</code> is larger than the length of
* this <code>String</code> object, or
* <code>beginIndex</code> is larger than
* <code>endIndex</code>.
*/
public String substring(int beginIndex, int endIndex) {
if (beginIndex < 0) {
throw new StringIndexOutOfBoundsException(beginIndex);
}
if (endIndex > count) {
throw new StringIndexOutOfBoundsException(endIndex);
}
if (beginIndex > endIndex) {
throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
}
return ((beginIndex == 0) && (endIndex == count)) ? this :
new String(offset + beginIndex, endIndex - beginIndex, value);
}
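
The three range checks in the source above can be exercised with a short sketch. The class name SubstringDemo and the helper safeSubstring() are assumptions for illustration; the first two inputs are the examples from the Javadoc comment.

```java
// Hypothetical demo class; safeSubstring() is an illustrative helper.
final class SubstringDemo {
    // Returns s.substring(begin, end), or a fixed message when the indices are invalid.
    static String safeSubstring(String s, int begin, int end) {
        try {
            return s.substring(begin, end);
        } catch (StringIndexOutOfBoundsException e) {
            return "out of bounds";
        }
    }

    public static void main(String[] args) {
        System.out.println(safeSubstring("hamburger", 4, 8)); // urge
        System.out.println(safeSubstring("smiles", 1, 5));    // mile
        System.out.println(safeSubstring("abc", -1, 2));      // beginIndex < 0
        System.out.println(safeSubstring("abc", 0, 4));       // endIndex > length
        System.out.println(safeSubstring("abc", 2, 1));       // beginIndex > endIndex
    }
}
```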
(2) indexOf(int ch) and indexOf(String str) --- the String overload returns the start position only when the whole of str matches
/**
* Code shared by String and StringBuffer to do searches. The
* source is the character array being searched, and the target
* is the string being searched for.
*
* @param source the characters being searched.
* @param sourceOffset offset of the source string.
* @param sourceCount count of the source string.
* @param target the characters being searched for.
* @param targetOffset offset of the target string.
* @param targetCount count of the target string.
* @param fromIndex the index to begin searching from.
*/
static int indexOf(char[] source, int sourceOffset, int sourceCount,
char[] target, int targetOffset, int targetCount,
int fromIndex) {
if (fromIndex >= sourceCount) {
return (targetCount == 0 ? sourceCount : -1);
}
if (fromIndex < 0) {
fromIndex = 0;
}
if (targetCount == 0) {
return fromIndex;
}
char first = target[targetOffset];
int max = sourceOffset + (sourceCount - targetCount);
for (int i = sourceOffset + fromIndex; i <= max; i++) {
/* Look for first character. */
if (source[i] != first) {
while (++i <= max && source[i] != first);
}
/* Found first character, now look at the rest of v2 */
if (i <= max) {
int j = i + 1;
int end = j + targetCount - 1;
for (int k = targetOffset + 1; j < end && source[j] ==
target[k]; j++, k++);
if (j == end) {
/* Found whole string. */
return i - sourceOffset;
}
}
}
return -1;
}

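(3) In short, find_first_of() in the C++ string class is not the same as indexOf() in Java: find_first_of() returns the position of the first character that equals any one of the characters in its argument, whereas indexOf(String) requires the whole argument to match (the closer C++ counterpart is find()). C++ has no member named substring(); the equivalent is substr(pos, len), which takes a length rather than an end index.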
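
The contrast can be illustrated in Java itself. In this sketch, findFirstOf() is a hypothetical helper that imitates the behaviour of C++ find_first_of(); it is not part of the JDK, and the class name SearchDemo is likewise made up. The input reuses the "abcdebcadxbc" string from the earlier example.

```java
// Hypothetical demo class; findFirstOf() imitates C++ find_first_of and is not a JDK method.
final class SearchDemo {
    // Index of the first character of text that occurs anywhere in chars, or -1 if none does.
    static int findFirstOf(String text, String chars) {
        for (int i = 0; i < text.length(); i++) {
            if (chars.indexOf(text.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "abcdebcadxbc";
        System.out.println(text.indexOf("xbc"));      // 9: the whole string "xbc" must match
        System.out.println(findFirstOf(text, "xbc")); // 1: the first of x, b or c is the b at index 1
        System.out.println(text.indexOf("cb"));       // -1: "cb" never occurs as a unit
        System.out.println(findFirstOf(text, "cb"));  // 1: order of the characters is irrelevant
    }
}
```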
