[hyperscan] Example walkthrough: simplegrep

Reposted article, dated 2015-10-23 13:35. Filed under DPDK / hyperscan

Example location: <hyperscan source>/examples/simplegrep.c
Reference: http://01org.github.io/hyperscan/dev-reference/api_files.html

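1. Overview

This example implements a simplified version of grep: given a regular expression and a file, it prints the offset of each match in order.

It is deliberately minimal: it does not read from stdin, and it supports none of grep's many command-line options.

simplegrep demonstrates the following hyperscan concepts:

Compiling a single pattern
It uses hs_compile, the simplest compile interface, which accepts exactly one regular expression. The API for compiling several expressions together is hs_compile_multi.
Block-mode matching
The search runs over a single block of data. Stream matching is more involved: it can find matches that span multiple blocks.
Allocating and using temporary data (scratch)
hyperscan needs a region of temporary data (call it D) while matching. The caller must ensure that at most one hs_scan call uses a given D at any moment, but consecutive hs_scan calls are not required to use the same D. Because allocating D is expensive, it is best to allocate it before the performance-critical path and reuse it at run time.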

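2. Source walkthrough

The example is very simple, so only the pattern compilation and matching code is covered here; the code that reads the data file and so on is skipped.

2.1 Compiling the regular expression (compile)

Before matching, the regular expression must be compiled into an hs_database_t.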

    hs_database_t *database;
    hs_compile_error_t *compile_err;
    if (hs_compile(pattern, HS_FLAG_DOTALL, HS_MODE_BLOCK, NULL, &database, &compile_err) != HS_SUCCESS) {
        fprintf(stderr, "ERROR: Unable to compile pattern \"%s\": %s\n",
                pattern, compile_err->message);
        hs_free_compile_error(compile_err);
        return -1;
    }

The prototype of hs_compile is

    hs_error_t hs_compile(const char * expression,
                          unsigned int flags,
                          unsigned int mode,
                          const hs_platform_info_t * platform,
                          hs_database_t ** db,
                          hs_compile_error_t ** error)

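Here, expression is the regular expression string; flags controls the behaviour of the pattern, for example case-insensitive matching (HS_FLAG_CASELESS) or letting . match newlines (HS_FLAG_DOTALL); mode determines the format of the generated database, one of block (HS_MODE_BLOCK), streaming (HS_MODE_STREAM) or vectored (HS_MODE_VECTORED), and a database compiled for one mode can only be used with the corresponding scan interface; platform specifies the target platform for the database (mainly CPU features), with NULL meaning the target is the same as the current platform; db receives the compiled database; error receives error information if compilation fails.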

2.2 Matching (scan)

First allocate the temporary data (scratch) needed for each scan.

    hs_scratch_t *scratch = NULL;
    if (hs_alloc_scratch(database, &scratch) != HS_SUCCESS) {
        fprintf(stderr, "ERROR: Unable to allocate scratch space. Exiting.\n");
        free(inputData);
        hs_free_database(database);
        return -1;
    }

Next, run the scan.

    if (hs_scan(database, inputData, length, 0, scratch, eventHandler, pattern) != HS_SUCCESS) {
        fprintf(stderr, "ERROR: Unable to scan input buffer. Exiting.\n");
        hs_free_scratch(scratch);
        free(inputData);
        hs_free_database(database);
        return -1;
    }

The prototype of hs_scan is

    hs_error_t hs_scan(const hs_database_t * db,
                       const char * data,
                       unsigned int length,
                       unsigned int flags,
                       hs_scratch_t * scratch,
                       match_event_handler onEvent,
                       void * context)

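Here, db is the database compiled in the previous step; data and length are the data to scan and its length in bytes; flags is reserved for controlling the function's behaviour in future versions and is currently unused; scratch is the temporary data used during matching, allocated earlier; onEvent is the key parameter, a user-supplied callback invoked on each match; context is a user-defined pointer that is passed through to the callback.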

The prototype of the match callback is

    typedef int (*match_event_handler)(unsigned int id,
                                       unsigned long long from,
                                       unsigned long long to,
                                       unsigned int flags,
                                       void *context);

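Here, id is the ID of the expression that matched; for the single expression compiled with hs_compile this is always 0. from is the start offset of the match, but only if start-of-match reporting was enabled for the pattern at compile time (the HS_FLAG_SOM_LEFTMOST flag in the flags parameter of hs_compile; streaming mode additionally needs one of the HS_MODE_SOM_HORIZON_* values in mode); otherwise it is 0. to is the offset of the byte immediately after the matched data; flags is currently unused; context is the user-defined pointer.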

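A non-zero return value stops the scan; zero lets it continue. During a scan, the callback is invoked synchronously on every match until the scan finishes.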

The callback in this example is

static int eventHandler(unsigned int id, unsigned long long from,
                        unsigned long long to, unsigned int flags, void *ctx) {
    printf("Match for pattern \"%s\" at offset %llu\n", (char *)ctx, to);
    return 0;
}

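It prints the regular expression and the position of the match (the offset within the data of the byte immediately after the match).

2.3 Cleaning up

Before the program exits, release the resources it allocated.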

    hs_free_scratch(scratch);
    free(inputData);
    hs_free_database(database);

3. Building and running

Before building, I had already installed the hyperscan headers and static library under /usr/local with make install.

    gcc -o simplegrep simplegrep.c -lhs -lstdc++ -lm

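Note the extra stdc++ and math libraries (-lstdc++ -lm). They are not needed when linking against the shared library.

Run it, matching /[f|F]ile/ against another example source file, pcapscan.cc. (Inside a character class, | is a literal character, so this class also accepts '|'; [fF]ile expresses the intent more precisely. grep treats the class the same way, so the comparison below is still valid.)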

    ./simplegrep '[f|F]ile' pcapscan.cc
    Scanning 22859 bytes with Hyperscan
    Match for pattern "[f|F]ile" at offset 1692
    ..... (omitted; 45 matches in total)

Verify the result with grep:

    grep -o '[f|F]ile' pcapscan.cc | wc -l
    45

OK, grep also reports 45 matches.
