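An automaton that matches a very simple regular-expression syntax with only four kinds of elements (lowercase letters, '.', '+' and '*'). Note that it only covers patterns that map directly onto a DFA; NFAs are not handled:
I wrote this a long time ago, and I remember there was a bug I never managed to fix...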
#include <iostream>
#include <memory>   // std::unique_ptr, std::make_unique
#include <string>
#include <vector>

class DFA
{
    // Builds the automaton for `regex`, one node per letter or '.'.
    // Assumes a well-formed pattern: it must not start with '*' or '+'.
    void construction(const std::string& regex)
    {
        // Nodes that can be skipped (x*) and still need an edge to the next node.
        std::vector<AM*> worker;
        nodes.clear();
        Match = std::make_unique<AM>();
        h = std::make_unique<AM>(toRange('H'), nullptr);   // sentinel head node
        AM* h_ptr = h.get();

        // Link every pending node to `target`, then forget the pending nodes.
        auto linkPending = [&worker](AM* target) {
            for (AM* pending : worker)
                pending->next[target->ch] = target;
            worker.clear();
        };

        for (auto iter = regex.begin(); iter != regex.end(); ++iter)
        {
            switch (*iter)
            {
            case '*':
                h_ptr->next[h_ptr->ch] = h_ptr;   // self-loop on the previous token
                for (AM* pending : worker)
                    pending->next[h_ptr->ch] = h_ptr;
                // x* may match nothing, so its predecessor must also reach what follows.
                if (h_ptr->prev != nullptr)
                    worker.push_back(h_ptr->prev);
                worker.push_back(h_ptr);
                break;
            case '+':
                h_ptr->next[h_ptr->ch] = h_ptr;   // self-loop; one occurrence is still required
                linkPending(h_ptr);
                break;
            default:   // a letter or '.'
            {
                // The node is owned by `nodes`, so nothing is leaked.
                nodes.push_back(std::make_unique<AM>(toRange(*iter), h_ptr));
                AM* temp = nodes.back().get();
                h_ptr->next[temp->ch] = temp;
                h_ptr = temp;
                // If a '*' follows, the '*' case links the pending nodes instead.
                if (iter + 1 != regex.end() && *(iter + 1) != '*')
                    linkPending(h_ptr);
                break;
            }
            }
        }

        // Nodes still pending when the pattern ends.
        for (AM* pending : worker)
        {
            if (h_ptr->next[h_ptr->ch] == h_ptr)
                pending->next[0] = Match.get();   // pattern ends in a repeated token
            else
                pending->next[h_ptr->ch] = h_ptr;
        }
        h_ptr->next[0] = Match.get();   // edge 0 means "the match may end here"
    }

    // Maps 'a'..'z' to 1..26 and '.' to 27; index 0 is reserved for the edge to Match.
    char toRange(char c) const
    {
        if (c == '.')
            return 27;
        return c - 'a' + 1;
    }

public:
    bool isMatch(const std::string& s, const std::string& regex)
    {
        construction(regex);
        AM* am = h.get();   // borrow the head node; release() here would leak it
        for (auto i : s)
        {
            char c = toRange(i);
            // Dead state, or a character that would index outside next[].
            if (am == nullptr || c < 1 || c > 27)
                return false;
            if (am->next[c] != nullptr)
                am = am->next[c];
            else
                am = am->next[27];   // fall back to the '.' edge; nullptr if there is none
        }
        return am != nullptr && am->next[0] == Match.get();
    }

private:
    struct AM {
        char ch;               // toRange() index of the token this node matches
        AM *prev, *next[28];   // next[0]: accept, next[1..26]: letters, next[27]: '.'
        AM() : ch(), prev(), next() {}
        AM(char v, AM* prev) : ch(v), prev(prev), next() {}
    };
    std::unique_ptr<AM> Match, h;
    std::vector<std::unique_ptr<AM>> nodes;   // owns every node except h and Match
};

int main(int argc, char const *argv[])
{
    DFA s;
    std::cout << (s.isMatch("abc", "aa*b*c+p*") ? "true" : "false");
    return 0;
}
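Since the bug mentioned above was never tracked down, one way to hunt for it is to compare the hand-written matcher with the standard library on the same inputs. The sketch below shows only the reference side, using std::regex_match; the helper name referenceMatch is an assumption made for illustration and is not part of the original code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whole-string match using the standard library.
// Letters, '.', '+' and '*' as used in this post have the same meaning
// in std::regex's default ECMAScript grammar.
bool referenceMatch(const std::string& s, const std::string& pattern)
{
    return std::regex_match(s, std::regex(pattern));
}
```

Any input where referenceMatch(s, pattern) and DFA::isMatch(s, pattern) disagree is a candidate for the unresolved bug.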
Example 1: a*b*c+d*
The DFA for this regular expression is shown in the figure below.

Example 2: (a|b)*a

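This is an NFA. My code does not implement NFA-to-DFA conversion, so matching fails.

Addendum (2020-04-05 21:00:39): this is also why a*a fails on LeetCode, since it is an NFA as well. It can, however, be rewritten as the regular expression a+ and matched that way. Its construction diagram:

But the DFA for a+ can also be drawn like this: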
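In code rather than as a diagram, the link between a*a and a+ can be sketched by simulating the NFA for a*a while tracking the set of active states, which is the subset construction done on the fly. The state numbering, the bitmask encoding and the function names are assumptions of this sketch; it is not necessarily the construction the original figures showed.

```cpp
#include <cstdint>
#include <string>

// NFA for a*a (state numbering is an assumption of this sketch):
//   q0 --a--> q0   (the a* loop)
//   q0 --a--> q1   (the final a)
//   q1 is the only accepting state
// A set of NFA states is stored as a bitmask: bit 0 = q0, bit 1 = q1.
std::uint8_t step(std::uint8_t active, char c)
{
    std::uint8_t next = 0;
    if (c == 'a' && (active & 0b01))
        next |= 0b11;   // from q0, 'a' leads to both q0 and q1
    // q1 has no outgoing edges, and no other character is allowed.
    return next;
}

bool matchAStarA(const std::string& s)
{
    std::uint8_t active = 0b01;   // start in {q0}
    for (char c : s)
        active = step(active, c);
    return (active & 0b10) != 0;  // accept if q1 is in the active set
}
```

The only non-empty sets this simulation ever reaches are {q0} and {q0, q1}, with {q0, q1} accepting and looping on 'a'. Those two sets behave exactly like the two states of a DFA for a+.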

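In practice, writing out the state transition table and looking transitions up in it directly would be more efficient, but I personally find simulating the matching process more interesting.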
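As a rough illustration of the table-lookup approach, here is a hand-written transition table for Example 1 (a*b*c+d*). The state numbering, the restriction to the alphabet a..d and all names are assumptions of this sketch; the table is written by hand rather than generated from the pattern.

```cpp
#include <string>

// Transition table for a*b*c+d* over the letters a..d.
// State meanings (an assumption of this sketch):
//   0: only a's so far, 1: inside b*, 2: inside c+, 3: inside d*
// kDead marks a missing transition. Columns are a, b, c, d.
constexpr int kDead = -1;
constexpr int kTable[4][4] = {
    /* state 0 */ {0,     1,     2,     kDead},
    /* state 1 */ {kDead, 1,     2,     kDead},
    /* state 2 */ {kDead, kDead, 2,     3},
    /* state 3 */ {kDead, kDead, kDead, 3},
};
constexpr bool kAccepting[4] = {false, false, true, true};

bool matchTable(const std::string& s)
{
    int state = 0;
    for (char c : s)
    {
        if (c < 'a' || c > 'd')
            return false;              // character outside the table's alphabet
        state = kTable[state][c - 'a'];
        if (state == kDead)
            return false;              // no transition: reject immediately
    }
    return kAccepting[state];          // at least one 'c' must have been seen
}
```

Each input character costs a single array lookup, with no pointer chasing and no fallback check for '.', which is where the efficiency gain over the linked-node simulation would come from.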
