A Detailed Look at Java's Pattern Class

This article is a repost. Posted 2016-12-06 10:26.

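Java added support for regular expressions in JDK 1.4.

Java's regular-expression tools live mainly in the java.util.regex package, which has two main classes: Pattern and Matcher.

Pattern

Declaration: public final class Pattern implements java.io.Serializable

The class is declared final, so it cannot be subclassed.

Meaning: the pattern class, a compiled representation of a regular expression.

Note: instances of this class are immutable and are safe for use by multiple concurrent threads.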
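Because a Pattern is immutable, one compiled instance can be shared freely between threads. A Matcher, by contrast, is documented as not safe for concurrent use, so each task should create its own. The following is a minimal sketch of that arrangement; the class name SharedPatternDemo, the helper names, and the sample inputs are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SharedPatternDemo {
    // One immutable Pattern, shared by every thread.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Each call creates its own Matcher, so no matcher state is shared.
    static int countDigitRuns(String input) {
        Matcher m = DIGITS.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Processes the inputs on the common fork-join pool; result order follows input order.
    static List<Integer> countAll(List<String> inputs) {
        return inputs.parallelStream()
                .map(SharedPatternDemo::countDigitRuns)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("a1b22c333", "no digits", "42");
        System.out.println(countAll(inputs)); // [3, 0, 1]
    }
}
```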

Fields:

    /**
     * Enables Unix lines mode.
     */
    public static final int UNIX_LINES = 0x01;

    /**
     * Enables case-insensitive matching.
     */
    public static final int CASE_INSENSITIVE = 0x02;

    /**
     * Permits whitespace and comments in the pattern.
     */
    public static final int COMMENTS = 0x04;

    /**
     * Enables multiline mode.
     */
    public static final int MULTILINE = 0x08;

    /**
     * Enables literal parsing of the pattern.
     */
    public static final int LITERAL = 0x10;

    /**
     * Enables dotall mode.
     */
    public static final int DOTALL = 0x20;

    /**
     * Enables Unicode-aware case folding.
     */
    public static final int UNICODE_CASE = 0x40;

    /**
     * Enables canonical equivalence.
     */
    public static final int CANON_EQ = 0x80;
    private static final long serialVersionUID = 5073258162644648461L;

    /**
     * The original regular-expression pattern string.
     */
    private String pattern;

    /**
     * The original pattern flags.
     */
    private int flags;

    /**
     * Boolean indicating this Pattern is compiled; this is necessary in order
     * to lazily compile deserialized Patterns.
     */
    private transient volatile boolean compiled = false;

    /**
     * The normalized pattern string.
     */
    private transient String normalizedPattern;

    /**
     * The starting point of state machine for the find operation.  This allows
     * a match to start anywhere in the input.
     */
    transient Node root;

    /**
     * The root of object tree for a match operation.  The pattern is matched
     * at the beginning.  This may include a find that uses BnM or a First
     * node.
     */
    transient Node matchRoot;

    /**
     * Temporary storage used by parsing pattern slice.
     */
    transient int[] buffer;

    /**
     * Temporary storage used while parsing group references.
     */
    transient GroupHead[] groupNodes;

    /**
     * Temporary null terminated code point array used by pattern compiling.
     */
    private transient int[] temp;

    /**
     * The number of capturing groups in this Pattern. Used by matchers to
     * allocate storage needed to perform a match.
     */
    transient int capturingGroupCount;

    /**
     * The local variable count used by parsing tree. Used by matchers to
     * allocate storage needed to perform a match.
     */
    transient int localCount;

    /**
     * Index into the pattern string that keeps track of how much has been
     * parsed.
     */
    private transient int cursor;

    /**
     * Holds the length of the pattern string.
     */
    private transient int patternLength;
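To make the public flag constants listed above more concrete, here is a small sketch of how a few of them change matching behavior. The class name FlagDemo, its two helper methods, and the sample strings are illustrative assumptions and do not come from the JDK source.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // True if the whole input matches the regex compiled with the given flags.
    static boolean matchesWith(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // True if the regex compiled with the given flags matches anywhere in the input.
    static boolean findsWith(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE: upper-case input matches a lower-case pattern.
        System.out.println(matchesWith("java", Pattern.CASE_INSENSITIVE, "JAVA")); // true

        // MULTILINE: ^ and $ also match at line boundaries inside the input.
        System.out.println(findsWith("^b$", Pattern.MULTILINE, "a\nb\nc")); // true
        System.out.println(findsWith("^b$", 0, "a\nb\nc"));                 // false

        // DOTALL: '.' also matches line terminators.
        System.out.println(matchesWith("a.b", Pattern.DOTALL, "a\nb")); // true
        System.out.println(matchesWith("a.b", 0, "a\nb"));              // false

        // LITERAL: metacharacters such as '.' are treated as ordinary characters.
        System.out.println(matchesWith("a.b", Pattern.LITERAL, "axb")); // false
        System.out.println(matchesWith("a.b", Pattern.LITERAL, "a.b")); // true

        // COMMENTS: whitespace and #-comments inside the pattern are ignored.
        System.out.println(matchesWith("a b  # two letters", Pattern.COMMENTS, "ab")); // true
    }
}
```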

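Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right.

In the expression ((A)(B(C))) there are four such groups. The last column shows what each group captures when the expression is matched against the input "ABC":

Group	Subexpression	Captured text
1	((A)(B(C)))	ABC
2	(A)	A
3	(B(C))	BC
4	(C)	C

Group zero always stands for the entire expression.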
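The numbering can be checked with Matcher.group(int) and Matcher.groupCount(). This is a minimal sketch; the class name GroupDemo and the helper groups are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns groups 0..groupCount() for a whole-input match, or an empty array if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() does not include group zero, hence the + 1.
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + "\t" + g[i]);
        }
        // Prints:
        // 0	ABC
        // 1	ABC
        // 2	A
        // 3	BC
        // 4	C
    }
}
```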

Constructor:

    private Pattern(String p, int f) {
        pattern = p;
        flags = f;

        // Reset group index count
        capturingGroupCount = 1;
        localCount = 0;

        if (pattern.length() > 0) {
            compile();
        } else {
            root = new Start(lastAccept);
            matchRoot = lastAccept;
        }
    }

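The constructor is private, so a Pattern object cannot be created with new.

How, then, do we obtain a Pattern instance?

Looking through the class's methods, we find: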

    public static Pattern compile(String regex) {
        return new Pattern(regex, 0);
    }

    public static Pattern compile(String regex, int flags) {
        return new Pattern(regex, flags);
    }

So a Pattern instance is obtained by calling the static method Pattern.compile.
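Since compilation happens inside the private constructor, a syntactically invalid expression is rejected at the compile call, which throws the unchecked PatternSyntaxException. Here is a small sketch of calling the factory method defensively; CompileDemo and tryCompile are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CompileDemo {
    // Returns the compiled pattern, or null if the regex is syntactically invalid.
    static Pattern tryCompile(String regex, int flags) {
        try {
            return Pattern.compile(regex, flags);
        } catch (PatternSyntaxException e) {
            System.out.println("Invalid regex '" + e.getPattern() + "': " + e.getDescription());
            return null;
        }
    }

    public static void main(String[] args) {
        Pattern ok = tryCompile("a*b", 0);
        System.out.println(ok.pattern());                // a*b
        System.out.println(tryCompile("(a", 0) == null); // true, the group is never closed
    }
}
```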

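Some of the other methods:

1. public Matcher matcher(CharSequence input)

Creates a matcher that will match the given input against this pattern; each call returns a new matcher. The compiled check covers patterns restored by deserialization, which are compiled lazily on first use.

    public Matcher matcher(CharSequence input) {
        if (!compiled) {
            synchronized(this) {
                if (!compiled)
                    compile();
            }
        }
        Matcher m = new Matcher(this, input);
        return m;
    }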
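As a usage sketch for matcher: because the parameter is a CharSequence, the same Pattern can produce matchers over a String, a StringBuilder, and so on, and each matcher keeps its own position. The class name MatcherDemo, the WORD pattern, and the inputs below are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    // Collects every substring of the input that matches WORD, left to right.
    static List<String> findAll(CharSequence input) {
        Matcher m = WORD.matcher(input); // a fresh Matcher for each input
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("boo:and:foo"));               // [boo, and, foo]
        System.out.println(findAll(new StringBuilder("x1y22z"))); // [x, y, z]
    }
}
```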

2. public static boolean matches(String regex, CharSequence input)

Compiles the given regular expression and attempts to match the given input against it.

    public static boolean matches(String regex, CharSequence input) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        return m.matches();
    }

Test:

Code 1 (based on the example in the JDK 1.6 API documentation):

        Pattern p = Pattern.compile("a*b");
        Matcher m = p.matcher("aaaaab");
        boolean b = m.matches();
        System.out.println(b);// true

Code 2:

        System.out.println(Pattern.matches("a*b", "aaaaab"));// true

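The source of matcher and matches shows that the static matches method performs the compile and matcher steps itself, so code 2 is simply a shorthand for code 1 and the two are equivalent.

If a pattern is going to be used many times, compiling it once and reusing it is more efficient than calling the static matches method each time, because that method recompiles the expression on every call.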
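A sketch of the reuse advice: the pattern is compiled once into a constant and applied to many inputs, and the result is compared with the one-shot Pattern.matches call. Note that matches() succeeds only if the entire input matches. The names ReuseDemo, A_STAR_B, and countMatches, and the sample inputs, are hypothetical.

```java
import java.util.regex.Pattern;

class ReuseDemo {
    // Compiled once, then reused for every input.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Counts the inputs that match the pattern in their entirety.
    static int countMatches(String[] inputs) {
        int count = 0;
        for (String s : inputs) {
            if (A_STAR_B.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"aaaaab", "b", "ab ", "ba"};
        System.out.println(countMatches(inputs)); // 2 ("ab " and "ba" do not match as a whole)

        // Same answer per input, but this call recompiles "a*b" each time it runs.
        System.out.println(Pattern.matches("a*b", "aaaaab")); // true
    }
}
```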

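3. public String[] split(CharSequence input) and public String[] split(CharSequence input, int limit)

input: the character sequence to be split;

limit: the result threshold;

Splits the given input sequence around matches of this pattern.

What the limit parameter does:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. Call the limit n.

If n is greater than zero, the pattern is applied at most n - 1 times, the array's length is no greater than n, and the array's last entry contains all input beyond the last matched delimiter.

If n is non-positive, the pattern is applied as many times as possible and the array can have any length (for negative n, trailing empty strings are kept).

If n is zero, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded.

Looking at the source of split(CharSequence input):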

    public String[] split(CharSequence input) {
        return split(input, 0);
    }

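So split(CharSequence input) simply calls split(CharSequence input, int limit) with a limit of 0; only split(CharSequence input, int limit) is discussed below.

Assume:

With input = "boo:and:foo" and the delimiter pattern "o", the pattern can be applied at most 4 times (once per "o"), so the array can have at most 5 elements.

1. limit = -2: the pattern is applied as many times as possible and the array can have any length. Prediction: the pattern is applied 4 times and the array has 5 elements, {"b","",":and:f","",""}.

2. limit = 2: the pattern is applied at most once, the array has at most 2 elements, and the second element contains all input after the first matched delimiter. Prediction: the pattern is applied once and the array has 2 elements, {"b","o:and:foo"}.

3. limit = 7: the pattern is applied at most 6 times and the array has at most 7 elements. Prediction: the pattern is applied 4 times and the array has 5 elements, {"b","",":and:f","",""}.

4. limit = 0: the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded. Prediction: the pattern is applied 4 times and the array has 3 elements, {"b","",":and:f"}.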

Verification in code:

    import java.util.regex.Pattern;

    // SplitTest is an arbitrary wrapper class name so that the snippet compiles on its own.
    public class SplitTest {
        public static void main(String[] args) {
            String[] arr;
            CharSequence input = "boo:and:foo";
            Pattern p = Pattern.compile("o");
            arr = p.split(input, -2);
            System.out.println(printArr(arr)); // {"b","",":and:f","",""}, 5 elements
            arr = p.split(input, 2);
            System.out.println(printArr(arr)); // {"b","o:and:foo"}, 2 elements
            arr = p.split(input, 7);
            System.out.println(printArr(arr)); // {"b","",":and:f","",""}, 5 elements
            arr = p.split(input, 0);
            System.out.println(printArr(arr)); // {"b","",":and:f"}, 3 elements
        }

        // Formats a String array as {"x","y",...} followed by its element count.
        public static String printArr(String[] arr) {
            int length = arr.length;
            StringBuilder sb = new StringBuilder("{");
            for (int i = 0; i < length; i++) {
                sb.append('"').append(arr[i]).append('"');
                if (i != length - 1) {
                    sb.append(',');
                }
            }
            sb.append("}, ").append(length).append(" elements");
            return sb.toString();
        }
    }

The output agrees with the predictions above.

4. toString() and pattern()

The two methods have identical bodies; both return the regular-expression string from which this pattern was compiled.

    public String toString() {
        return pattern;
    }

    public String pattern() {
        return pattern;
    }

Test:

    Pattern p = Pattern.compile("\\d+");
    System.out.println(p.toString()); // prints \d+
    System.out.println(p.pattern());  // prints \d+

5. public int flags()

Returns this pattern's match flags.

    public int flags() {
        return flags;
    }

Test:

    Pattern p = Pattern.compile("a+", Pattern.CASE_INSENSITIVE);
    System.out.println(p.flags()); // 2

From the Pattern source:

    public static final int CASE_INSENSITIVE = 0x02;

So CASE_INSENSITIVE = 2, which is why the test prints 2.
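Each flag constant occupies its own bit, so several flags can be combined with bitwise OR, and flags() then returns the combined value. The sketch below shows this along with a bit test for an individual flag; FlagsValueDemo and hasFlag are hypothetical names for this example.

```java
import java.util.regex.Pattern;

class FlagsValueDemo {
    // True if the given flag bit is set on the pattern.
    static boolean hasFlag(Pattern p, int flag) {
        return (p.flags() & flag) != 0;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("a+", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        System.out.println(p.flags());                            // 10, that is 0x02 | 0x08
        System.out.println(hasFlag(p, Pattern.CASE_INSENSITIVE)); // true
        System.out.println(hasFlag(p, Pattern.DOTALL));           // false
    }
}
```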

