Regex matching performance test in Java

Reposted article. 2013-12-30 22:08 Java

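At work I often need to search each line of a text for some pattern. I just tested three approaches and found that the actual performance differs from what I expected.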

Approach 1:

Call the string's matches method directly: string.matches("\\d+")
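One likely reason this approach is costly: according to the JDK documentation, str.matches(regex) yields the same result as Pattern.matches(regex, str), which in turn behaves like Pattern.compile(regex).matcher(str).matches(). The regex is therefore compiled again on every call. A minimal sketch of that equivalence follows; the class and method names are illustrative and not part of the original test.

```java
import java.util.regex.Pattern;

// Illustrative only: the class and method names are hypothetical.
final class StringMatchesDemo {

    // The convenience form used in approach 1.
    static boolean viaStringMatches(String line) {
        return line.matches("\\d+");
    }

    // The documented equivalent: a fresh Pattern is compiled on each call,
    // then the whole input must match.
    static boolean viaCompileEachTime(String line) {
        return Pattern.compile("\\d+").matcher(line).matches();
    }

    public static void main(String[] args) {
        for (String line : new String[] {"123", "abc", "12a"}) {
            System.out.println(line + " -> " + viaStringMatches(line)
                    + " / " + viaCompileEachTime(line));
        }
    }
}
```

Both methods should always agree; the difference from approach 2 is only where the compilation cost is paid.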

Approach 2:

First build a pattern for single-line matching, then use that pattern to match each line

Pattern p1=Pattern.compile("\\d+");

Matcher m=p1.matcher(sar[i]);
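As a variation on this approach, the Matcher can also be reused across lines with reset(CharSequence) instead of creating a new one per line. Whether this makes a measurable difference is not something the original test covers; the sketch below only shows the shape of the code, with hypothetical class and method names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: the class and method names are hypothetical.
final class PrecompiledDemo {

    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the lines that consist entirely of digits.
    static List<String> digitLines(String[] lines) {
        List<String> out = new ArrayList<>();
        Matcher m = DIGITS.matcher(""); // one Matcher, re-pointed at each line
        for (String line : lines) {
            if (m.reset(line).matches()) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(digitLines(new String[] {"123", "abc", "7"}));
    }
}
```

Note that a Matcher, unlike a Pattern, is not safe to share between threads.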

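Approach 3:

Build a pattern with the DOTALL flag, then call find over the whole text. (DOTALL only makes the "." metacharacter match line terminators, so it has no effect on \d+; find simply scans across the newlines.)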

Pattern p2=Pattern.compile("\\d+",Pattern.DOTALL );
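Two details of this approach are worth checking. Pattern.DOTALL changes only what "." matches, so for \d+ the flag should make no difference. And find() reports every run of digits, whereas matches() requires the whole input to match, so the two approaches agree only because every test line is either all letters or all digits. A small sketch under those observations (class and method names are hypothetical; the sample text is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: the class and method names are hypothetical.
final class FindVsMatchesDemo {

    // Collects every substring that find() locates in the text.
    static List<String> findAll(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "abc\n123\na45b\n";

        // DOTALL only changes '.', so both patterns find the same digit runs.
        System.out.println(findAll(Pattern.compile("\\d+"), text));
        System.out.println(findAll(Pattern.compile("\\d+", Pattern.DOTALL), text));

        // find() also reports "45" from "a45b", which a per-line matches() rejects.
        // Anchoring with MULTILINE restricts the hits to whole lines.
        System.out.println(findAll(Pattern.compile("^\\d+$", Pattern.MULTILINE), text));

        // Where DOTALL does matter: '.' against a newline.
        System.out.println(Pattern.compile("a.b").matcher("a\nb").matches());
        System.out.println(Pattern.compile("a.b", Pattern.DOTALL).matcher("a\nb").matches());
    }
}
```

If the input could contain mixed lines such as "a45b", the anchored MULTILINE pattern is the variant that matches the per-line semantics of approaches 1 and 2.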

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class TestRe {

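    /**
     * Builds 10000 random lines ("abc" or "123") and times three ways
     * of collecting the lines that consist only of digits.
     */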
    public static void main(String[] args) {
        String s1="abc";
        String s2="123";
        //Build a multi-line string: each line is randomly "abc" or "123"
        StringBuilder stb=new StringBuilder();
        Random rnd=new Random();
        for(int i=0;i<10000;i++){
            int k=rnd.nextInt(2);
            if(k==0){
                stb.append(s1+"\n");
            }
            else{
                stb.append(s2+"\n");
            }
        }
        
        Pattern p2=Pattern.compile("\\d+",Pattern.DOTALL );
        Pattern p1=Pattern.compile("\\d+");
        String ts=stb.toString();
        String[] sar=ts.split("\n");
        test1(sar);
        test2(ts,p2);
        test3(sar,p1);
        
        
    
    }

    //Approach 1: String.matches on each line
    public static void test1(String[] sar){
        long st=System.nanoTime();
        List<String> l=new ArrayList<String>();
        for(int i=0;i<sar.length;i++){
            if(sar[i].matches("\\d+")){
                l.add(sar[i]);
            }
        }
        System.out.println("1Size"+l.size());
        long et=System.nanoTime();
        System.out.println("test1:"+(et-st)+" ns");
    }
    
    
    //Approach 2 (note the name test3): precompiled pattern, matches() on each line
    public static void test3(String[] sar,Pattern p1){
        long st=System.nanoTime();
        List<String> l=new ArrayList<String>();
        for(int i=0;i<sar.length;i++){
            Matcher m=p1.matcher(sar[i]);
            if(m.matches()){
                l.add(sar[i]);
            }
        }
        System.out.println("3Size"+l.size());
        long et=System.nanoTime();
        System.out.println("test3:"+(et-st)+" ns");
    }
    
    //Approach 3 (note the name test2): precompiled pattern, find() over the whole text
    public static void test2(String s,Pattern p){
        long st=System.nanoTime();
        List<String> l=new ArrayList<String>();
        Matcher m=p.matcher(s);
        while(m.find()){
            l.add(m.group());
        }
        System.out.println("2Size"+l.size());
        long et=System.nanoTime();
        System.out.println("test2:"+(et-st)+" ns");
    }
    
}

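Below are the results. Approach 1 (test1) turned out to be the slowest. For very simple regular expressions, or when no replacement or sub-match extraction is needed, it is the one I use most, so I did not expect it to perform worst.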

Run 1:

1Size4999
test1:53153038 ns
2Size4999
test2:13393716 ns
3Size4999
test3:4527045 ns

Run 2:

1Size4941
test1:38807545 ns
2Size4941
test2:6826025 ns
3Size4941
test3:3127127 ns

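It looks as if test3 (matches() per line with a precompiled pattern) beats test2 (find() over the whole text), but that is not really the case: if the call order is swapped, test2 becomes faster than test3. My guess is that the underlying calls may involve some singleton object, so that whatever the first-called pattern sets up gets reused by the later one. I have not investigated this.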
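An alternative explanation for the order dependence, offered here as an assumption rather than something verified against the original setup, is JVM warm-up: whichever test runs first pays for class loading and interpreted execution of the regex code before the JIT compiler kicks in. A rough way to reduce that effect is to run each test several times untimed before measuring. The sketch below assumes Java 8 or later; all class and method names are hypothetical, and a dedicated harness such as JMH would be the more reliable choice for serious measurements.

```java
import java.util.regex.Pattern;

// Illustrative only: the class and method names are hypothetical.
final class WarmupTiming {

    // Counts the lines that fully match, using String.matches (approach 1).
    static int countStringMatches(String[] lines) {
        int n = 0;
        for (String line : lines) {
            if (line.matches("\\d+")) n++;
        }
        return n;
    }

    // Counts the lines that fully match, using a precompiled pattern (approach 2).
    static int countPrecompiled(String[] lines, Pattern p) {
        int n = 0;
        for (String line : lines) {
            if (p.matcher(line).matches()) n++;
        }
        return n;
    }

    // Runs the task `warmup` times untimed, then returns the average
    // duration in nanoseconds over `rounds` timed runs.
    static long averageNanos(Runnable task, int warmup, int rounds) {
        for (int i = 0; i < warmup; i++) task.run();
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) task.run();
        return (System.nanoTime() - start) / rounds;
    }

    public static void main(String[] args) {
        String[] lines = new String[10000];
        for (int i = 0; i < lines.length; i++) {
            lines[i] = (i % 2 == 0) ? "abc" : "123";
        }
        Pattern p = Pattern.compile("\\d+");
        int[] sink = new int[1]; // accumulates results so the work is not optimized away

        long t1 = averageNanos(() -> sink[0] += countStringMatches(lines), 20, 20);
        long t2 = averageNanos(() -> sink[0] += countPrecompiled(lines, p), 20, 20);

        System.out.println("String.matches: " + t1 + " ns");
        System.out.println("precompiled:    " + t2 + " ns");
        System.out.println("checksum: " + sink[0]);
    }
}
```

With warm-up in place, swapping the order of the two measurements should change the numbers far less than in the single-shot runs.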

Results after swapping the order:

Run 1:

2Size5093
test2:12792156 ns
3Size5093
test3:13665544 ns
1Size5093
test1:56139648 ns

Run 2:

2Size4952
test2:12030397 ns
3Size4952
test3:20046193 ns
1Size4952
test1:40124475 ns
