Java Regular Expressions Tutorial with Examples

Reposted from the original article, 2017-06-19 22:37. Tags: Java, regular expressions, tutorial, examples

This article was translated from journaldev by an ImportNew reader for ImportNew.

[Thanks to @CuGBabyBeaR for the translation.]

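When I started my Java career, regular expressions were a nightmare for me. This tutorial aims to help you master Java regular expressions, and it also helps me review them.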

What is a regular expression?

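A regular expression defines a pattern for strings. Regular expressions can be used to search, edit or manipulate text. They are not tied to any one language, but each language's flavor differs slightly. Java's regular expressions are most similar to Perl's.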

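Java's regular expression classes are in the java.util.regex package, which contains three classes: Pattern, Matcher and PatternSyntaxException.

Pattern: a Pattern object is the compiled version of a regular expression. It has no public constructor; we create one by passing a regular expression to the public static method compile.
Matcher: a Matcher is the regex engine object that matches an input string against a compiled Pattern. It has no public constructor either; we get one by calling the Pattern object's matcher method with the input string as the argument. Its matches method then returns a boolean telling us whether the whole input string matches the regex.
PatternSyntaxException: thrown if the syntax of the regular expression is invalid.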
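Matcher.matches() is only one of three ways to test a match, and the difference matters throughout this tutorial. The following is a small sketch of our own (the class MatchesVsFind and its compare helper are hypothetical names, not from the original article). It contrasts matches(), which requires the entire input to match, with lookingAt(), which only requires a match starting at the beginning of the input, and find(), which looks for the pattern anywhere in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    // Returns {matches, lookingAt, find} for the same pattern and input
    static boolean[] compare(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean whole = m.matches();     // the entire input must match
        boolean prefix = m.lookingAt();  // must match starting at index 0, may stop early
        m.reset();                       // make find() start again from the beginning
        boolean anywhere = m.find();     // any matching subsequence is enough
        return new boolean[] {whole, prefix, anywhere};
    }

    public static void main(String[] args) {
        boolean[] r = compare("ab", "abc");
        // prints: matches=false lookingAt=true find=true
        System.out.println("matches=" + r[0] + " lookingAt=" + r[1] + " find=" + r[2]);
    }
}
```

The sketch calls reset() before find() so that the search is guaranteed to begin at the start of the input rather than depending on the matcher's previous state.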

Let's see how these classes are used in a simple example.

 
    package com.journaldev.util;

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class PatternExample {

        public static void main(String[] args) {
            Pattern pattern = Pattern.compile(".xx.");
            Matcher matcher = pattern.matcher("MxxY");
            System.out.println("Input String matches regex - " + matcher.matches());
            // bad regular expression: the quantifier * has nothing before it to repeat
            pattern = Pattern.compile("*xx*");
        }
    }

The output of the above program is:

    Input String matches regex - true
    Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
    *xx*
    ^
        at java.util.regex.Pattern.error(Pattern.java:1924)
        at java.util.regex.Pattern.sequence(Pattern.java:2090)
        at java.util.regex.Pattern.expr(Pattern.java:1964)
        at java.util.regex.Pattern.compile(Pattern.java:1665)
        at java.util.regex.Pattern.<init>(Pattern.java:1337)
        at java.util.regex.Pattern.compile(Pattern.java:1022)
        at com.journaldev.util.PatternExample.main(PatternExample.java:13)
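The stack trace above appears because the program never catches the exception. PatternSyntaxException is unchecked (it extends IllegalArgumentException), so when a regex comes from user input you may want to catch it explicitly. A minimal sketch, assuming you just want to report the problem and carry on (SafeCompile and tryCompile are illustrative names of our own):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SafeCompile {
    // Returns the compiled pattern, or null if the regex is invalid
    static Pattern tryCompile(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            // These getters describe what went wrong and where
            System.out.println("Bad regex: " + e.getPattern());
            System.out.println("Reason:    " + e.getDescription());
            System.out.println("Index:     " + e.getIndex());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCompile(".xx.") != null); // true
        System.out.println(tryCompile("*xx*") != null); // false, after printing the error details
    }
}
```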

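Since regular expressions are always about strings, Java 1.4 extended the String class with a matches method for pattern matching. Internally it uses the Pattern and Matcher classes, but it clearly reduces the number of lines of code: str.matches(regex) gives the same result as Pattern.matches(regex, str).

The Pattern class also has a static matches method that takes a regex and an input string as arguments and returns a boolean indicating whether they match.

The following code matches an input string against a regular expression.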

 
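    import java.util.regex.Pattern;

    public class MatchesExample {
        public static void main(String[] args) {
            String str = "bbb";
            System.out.println("Using String matches method: " + str.matches(".bb"));         // true
            System.out.println("Using Pattern matches method: " + Pattern.matches(".bb", str)); // true
        }
    }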

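So if all you need is to check whether an input string matches a pattern, you can save time by calling String's matches method. You only need the Pattern and Matcher classes when you have to manipulate the input string or reuse the pattern, because String.matches compiles the regex again on every call.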
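When the same regex is checked against many inputs, a common approach is to compile it once into a constant and create a cheap Matcher per input. A sketch under that assumption (ReusePattern, DIGITS and countNumeric are hypothetical names). The Pattern Javadoc states that Pattern instances are immutable and safe for concurrent use while Matcher instances are not, which is why the sketch shares the Pattern but not the Matcher.

```java
import java.util.regex.Pattern;

class ReusePattern {
    // Compiled once; Pattern instances are immutable and can be shared
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Counts inputs made up entirely of digits
    static int countNumeric(String[] inputs) {
        int n = 0;
        for (String s : inputs) {
            // A new Matcher per input is cheap; the compile step is not repeated
            if (DIGITS.matcher(s).matches()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] inputs = {"123", "abc", "42", "4x2"};
        System.out.println(countNumeric(inputs)); // 2
    }
}
```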

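Note that a pattern is applied to the input from left to right, and once a source character has been consumed by a match, it is not used again.

For example, the regex "121" matches the string "31212142121" only twice, as in "_121____121".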
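The following sketch (the class NonOverlapping and its matchStarts helper are our own names) lists the start index of every match that Matcher.find() reports for this example. It confirms that the overlapping occurrence of "121" starting at index 3 is skipped, because its first "1" was already consumed by the match at index 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlapping {
    // Collects the start index of every match that find() reports
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("121", "31212142121")); // [1, 8]
    }
}
```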

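Common regular expression matching symbols

In the examples, each pair is (regex, input), and the result is what Pattern.matches(regex, input) returns, so the whole input must match.

Regex	Description	Examples
.	Matches any single character except a line terminator (unless the DOTALL flag is set)	("..", "a%") – true; ("..", ".a") – true; ("..", "a") – false
^xxx	Matches regex xxx at the start of the input	("^a.c.", "abcd") – true; ("^a", "a") – true; ("^a", "ac") – false
xxx$	Matches regex xxx at the end of the input	("..cd$", "abcd") – true; ("a$", "a") – true; ("a$", "aca") – false
[abc]	Matches a, b or c. [] is called a character class.	("^[abc]d.", "ad9") – true; ("[ab].d$", "bad") – true; ("[ab]x", "cx") – false
[abc][12]	Matches a, b or c followed by 1 or 2	("[ab][12].", "a2#") – true; ("[ab]..[12]", "acd2") – true; ("[ab][12]", "c2") – false
[^abc]	When ^ is the first character inside [], it negates the class: matches any character except a, b or c	("[^ab][^12].", "c3#") – true; ("[^ab]..[^12]", "xcd3") – true; ("[^ab][^12]", "c2") – false
[a-e1-8]	Matches a character in the range a to e or 1 to 8	("[a-e1-3].", "d#") – true; ("[a-e1-3]", "2") – true; ("[a-e1-3]", "f2") – false
xx|yy	Matches regex xx or regex yy	("x.|y", "xa") – true; ("x.|y", "y") – true; ("x.|y", "yz") – false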
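A quick way to check rows of the table yourself is to run them through Pattern.matches. This sketch (the CommonSymbols class is our own) replays a selection of the examples and also demonstrates the line-terminator caveat for ".", using the standard Pattern.DOTALL flag.

```java
import java.util.regex.Pattern;

class CommonSymbols {
    // Each row: regex, input, expected result of Pattern.matches from the table
    static final String[][] CASES = {
        {"..", "a%", "true"},
        {"..", "a", "false"},
        {"^a", "ac", "false"},
        {"[ab].d$", "bad", "true"},
        {"[^ab][^12].", "c3#", "true"},
        {"[a-e1-3]", "f2", "false"},
        {"x.|y", "y", "true"},
        {"x.|y", "yz", "false"},
    };

    public static void main(String[] args) {
        for (String[] c : CASES) {
            boolean actual = Pattern.matches(c[0], c[1]);
            System.out.printf("%-12s %-5s -> %b%n", c[0], c[1], actual);
        }
        // "." excludes line terminators unless DOTALL is set
        System.out.println(Pattern.matches(".", "\n"));                                   // false
        System.out.println(Pattern.compile(".", Pattern.DOTALL).matcher("\n").matches()); // true
    }
}
```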

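Java regex metacharacters

Regex	Description
\d	Any digit, same as [0-9]
\D	Any non-digit, same as [^0-9]
\s	Any whitespace character, same as [ \t\n\x0B\f\r]
\S	Any non-whitespace character, same as [^\s]
\w	Any word character, same as [a-zA-Z_0-9]
\W	Any non-word character, same as [^\w]
\b	A word boundary
\B	A non-word boundary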
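The sketch below (the PredefinedClasses class and its findAll helper are hypothetical names, written for this illustration) collects every substring that find() locates, showing \d, \w and \S, and the difference between \b and \B. The doubled backslashes are Java string-literal escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClasses {
    // Returns every substring of input that the regex finds, in order
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "order 66 shipped_to room 101";
        System.out.println(findAll("\\d+", text));     // [66, 101]
        System.out.println(findAll("\\w+", text));     // [order, 66, shipped_to, room, 101]
        System.out.println(findAll("\\bo\\w*", text)); // [order]: "o" at the start of a word
        System.out.println(findAll("\\Bo\\w*", text)); // [o, oom]: "o" inside a word
        System.out.println(findAll("\\S+", "a \t b")); // [a, b]
    }
}
```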

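There are two ways to use a metacharacter as an ordinary character in a regular expression (inside a Java string literal the backslash itself must also be escaped, so \. is written "\\."):

Precede the metacharacter with a backslash (\).
Place the metacharacter between \Q (start of quote) and \E (end of quote).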
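Both techniques, plus the standard Pattern.quote method (which produces the \Q...\E form for you), appear in this sketch. It uses String.split with a "." delimiter, a classic case where forgetting to escape gives a surprising result; splitLiteral is a hypothetical helper name.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Splits input on a delimiter taken literally, even if it contains metacharacters
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String version = "1.8.0";
        // Unescaped "." matches every character, so all pieces are empty and get dropped
        System.out.println(version.split(".").length);         // 0
        // Backslash escape, doubled inside the Java string literal
        System.out.println(version.split("\\.").length);       // 3
        // \Q...\E quoting, again with doubled backslashes in the literal
        System.out.println(version.split("\\Q.\\E").length);   // 3
        // Pattern.quote builds the \Q...\E form automatically
        System.out.println(splitLiteral(version, ".").length); // 3
    }
}
```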

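Regular expression quantifiers

Quantifiers specify how many times the preceding element must occur.

Regex	Description
X?	X occurs once or not at all
X*	X occurs zero or more times
X+	X occurs one or more times
X{n}	X occurs exactly n times
X{n,}	X occurs n or more times
X{n,m}	X occurs at least n times but not more than m times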

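Quantifiers can also be applied to character classes and capturing groups.

For example, [abc]+ means a, b or c occurring one or more times, in any mix.

(abc)+ means the capturing group "abc" occurring one or more times. Capturing groups are discussed next.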
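The sketch below (the QuantifierDemo class and its cases are our own) runs each quantifier from the table, plus the [abc]+ and (abc)+ examples, through Pattern.matches, so every result refers to the whole input.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Each row: regex, input, expected result of Pattern.matches
    static final String[][] CASES = {
        {"colou?r", "color", "true"},   // ?: u zero times
        {"ab*c", "ac", "true"},         // *: b zero times
        {"ab+c", "ac", "false"},        // +: b needed at least once
        {"\\d{3}", "123", "true"},      // {n}: exactly 3 digits
        {"\\d{2,}", "12345", "true"},   // {n,}: 2 or more digits
        {"\\d{2,4}", "12345", "false"}, // {n,m}: at most 4 digits
        {"[abc]+", "cabba", "true"},    // class repeated: any mix of a, b, c
        {"(abc)+", "abcabc", "true"},   // group repeated as a whole
        {"(abc)+", "abcab", "false"},   // trailing "ab" is not a full "abc"
    };

    public static void main(String[] args) {
        for (String[] c : CASES) {
            System.out.println(c[0] + " vs \"" + c[1] + "\" -> " + Pattern.matches(c[0], c[1]));
        }
    }
}
```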

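Regular expression capturing groups

A capturing group treats several characters as a single unit. You create a group by enclosing characters in parentheses (). The part of the input string that matches a capturing group is saved in memory and can be recalled later with a backreference.

You can use the Matcher's groupCount() method to find out how many capturing groups a pattern has; group 0, the entire match, is not counted. For example, ((a)(bc)) contains 3 capturing groups: ((a)(bc)), (a) and (bc).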
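To see the numbering for ((a)(bc)), the following sketch (GroupsDemo and its groups helper are hypothetical names) prints groupCount() and each group's captured text. Groups are numbered by the position of their opening parenthesis, and group 0 is always the whole match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupsDemo {
    // Returns group(0)..group(groupCount()) if the whole input matches, otherwise null
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // groupCount() comes from the pattern, so it is available even before matching
        System.out.println(Pattern.compile("((a)(bc))").matcher("abc").groupCount()); // 3
        String[] g = groups("((a)(bc))", "abc");
        for (int i = 0; i < g.length; i++) {
            // prints group(0) = abc, group(1) = abc, group(2) = a, group(3) = bc
            System.out.println("group(" + i + ") = " + g[i]);
        }
    }
}
```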

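A backreference can be used inside the regular expression itself: a backslash (\) followed by the number of the group to recall, for example \1 (written "\\1" in a Java string literal).

Capturing groups and backreferences can be confusing, so let's work through an example.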

 
    import java.util.regex.Pattern;

    public class BackreferenceExample {
        public static void main(String[] args) {
            System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2"));          // true
            System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2"));          // false
            System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); // true
            System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); // false
        }
    }

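In the first example, the first capturing group is (\w\d). When matching the input "a2a2", it captures "a2" and saves it in memory, so \1 refers to "a2" and the whole input matches: true. For the same reason the second line prints false: the group captures "a2", but the remaining input is "b2".

Try working out the third and fourth lines yourself. :)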
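A common practical use of a backreference is spotting accidentally doubled words such as "is is". The regex \b(\w+)\s+\1\b and the class DoubledWord below are our own illustration, not part of the original article; the \b anchors stop it from matching across word fragments, as in "the theory".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubledWord {
    // \1 must repeat exactly what group 1 captured; \b keeps both copies whole words
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns the first doubled word, or null if there is none
    static String firstDoubledWord(String text) {
        Matcher m = DOUBLED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledWord("this is is a test")); // is
        System.out.println(firstDoubledWord("the theory"));        // null
    }
}
```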

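Now let's look at some important methods of the Pattern and Matcher classes.

We can create a Pattern object with flags; for example, Pattern.CASE_INSENSITIVE enables case-insensitive matching. The Pattern class also provides a split(CharSequence) method similar to String's split(String).

The Pattern class's toString() method returns the regular expression string from which the pattern was compiled.

The Matcher class has the index methods start() and end(), which show exactly where a match was found in the input string: start() is the index of the first matched character, and end() is the index just past the last one.

The Matcher class also provides the string-manipulation methods replaceAll(String replacement) and replaceFirst(String replacement).

Now let's see how these methods are used in a simple Java class.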

 
    package com.journaldev.util;

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexExamples {

        public static void main(String[] args) {
            // using pattern with flags
            Pattern pattern = Pattern.compile("ab", Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher("ABcabdAb");
            // using Matcher find(), group(), start() and end() methods
            while (matcher.find()) {
                System.out.println("Found the text \"" + matcher.group()
                        + "\" starting at " + matcher.start()
                        + " index and ending at index " + matcher.end());
            }

            // using Pattern split() method
            pattern = Pattern.compile("\\W");
            String[] words = pattern.split("one@two#three:four$five");
            for (String s : words) {
                System.out.println("Split using Pattern.split(): " + s);
            }

            // using Matcher.replaceFirst() and replaceAll() methods
            pattern = Pattern.compile("1*2");
            matcher = pattern.matcher("11234512678");
            System.out.println("Using replaceAll: " + matcher.replaceAll("_"));
            System.out.println("Using replaceFirst: " + matcher.replaceFirst("_"));
        }
    }

The output of the above program:

 
    Found the text "AB" starting at 0 index and ending at index 2
    Found the text "ab" starting at 3 index and ending at index 5
    Found the text "Ab" starting at 6 index and ending at index 8
    Split using Pattern.split(): one
    Split using Pattern.split(): two
    Split using Pattern.split(): three
    Split using Pattern.split(): four
    Split using Pattern.split(): five
    Using replaceAll: _345_678
    Using replaceFirst: _34512678
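The class above does not exercise toString(), and it does not make explicit that end() is exclusive. This sketch (MatcherIndexDemo and slices are hypothetical names) reuses the same pattern and input to show both: input.substring(start(), end()) reproduces exactly the text returned by group().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherIndexDemo {
    // Collects input.substring(start(), end()) for each match
    static List<String> slices(Pattern pattern, String input) {
        List<String> out = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            // end() is exclusive, so this slice equals matcher.group()
            out.add(input.substring(matcher.start(), matcher.end()));
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("ab", Pattern.CASE_INSENSITIVE);
        // toString() returns only the source regex; the flags are not part of it
        System.out.println(pattern.toString());          // ab
        System.out.println(slices(pattern, "ABcabdAb")); // [AB, ab, Ab]
    }
}
```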

Original article: journaldev. Translation: ImportNew.com - ImportNew reader
Translation link: http://www.importnew.com/6810.html
