Building Nondeterministic Finite Automata (Part 1): Definition and Implementation of the NFA

Reposted article, dated 2013-07-17 21:21.

Copyright reserved. Please credit the source when reposting (http://blog.csdn.net/panjunbiao).

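A nondeterministic finite automaton (NFA) consists of the following elements:

A finite set of states S
A set of input symbols Sigma, where the empty string epsilon is assumed not to belong to Sigma
A state transition function that, for each state and each symbol in Sigma or {epsilon}, yields the set of next states
A state s0 in S that serves as the start state (initial state)
A subset F of S that serves as the set of accepting states (final states)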

For example, suppose we are given:

S = {s0, s1, s2, s3, s4}
Sigma = {a, b}
A state transition function T with T(s0, a) = {s1}, T(s1, a) = {s2}, T(s2, b) = {s3}, T(s3, b) = {s4}
s0 as the start state
{s4} as the set of accepting states

This gives us a very simple NFA, which can be drawn as a graph, as shown in Figure 1:

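An NFA is a recognizer. Take the NFA in Figure 1: we start in state s0 and feed it the input aabb one symbol at a time. After the first a the state moves from s0 to s1, after the second a it moves to s2, after the first b it moves to s3, and after the second b it moves to s4. Since s4 is an accepting state, the NFA answers "yes" to the string aabb, meaning that it recognizes the input.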
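The walk described above can be sketched as a plain table lookup. This is only an illustration, not the author's implementation: the class name `Figure1Walk` and its helper methods are hypothetical, and the table stores a single next state per entry because every target set in the Figure 1 example contains exactly one state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of the original project.
class Figure1Walk {
    // T(state, symbol) -> next state, taken from the example definition of T.
    private static final Map<String, Map<Character, String>> T = new HashMap<>();

    private static void add(String from, char symbol, String to) {
        T.computeIfAbsent(from, k -> new HashMap<>()).put(symbol, to);
    }

    static {
        add("s0", 'a', "s1");
        add("s1", 'a', "s2");
        add("s2", 'b', "s3");
        add("s3", 'b', "s4");
    }

    // Returns true if the walk from s0 consumes the whole input and ends in s4.
    static boolean accepts(String input) {
        String state = "s0";
        for (char c : input.toCharArray()) {
            Map<Character, String> row = T.get(state);
            String next = (row == null) ? null : row.get(c);
            if (next == null) {
                return false; // no transition defined for this symbol: reject
            }
            state = next;
        }
        return state.equals("s4");
    }
}
```

With this table, `Figure1Walk.accepts("aabb")` returns true, while a shorter input such as `"aab"` stops in s3 and is rejected.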

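"Nondeterministic" means that, in a given state, the same input symbol may lead to more than one next state. In Figure 2, for example, reading the symbol a in s0 can move the automaton to s1 or to s3; more precisely, the state moves to the set {s1, s3}. The strings accepted by the NFA in Figure 2 therefore include aa and ab.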
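One common way to handle this nondeterminism is to track the set of all states the automaton could currently be in. The sketch below does that for an automaton shaped like Figure 2. Since the figure itself is not reproduced here, the transitions other than T(s0, a) = {s1, s3} are assumptions chosen to match the text: s1 moves to s2 on a, s3 moves to s4 on b, and s2 and s4 are accepting. The class name `Figure2Sim` is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class for illustration; not part of the original project.
class Figure2Sim {
    // T(state, symbol) -> set of next states.
    private static final Map<String, Map<Character, Set<String>>> T = new HashMap<>();
    // Assumed accepting states (the figure is not available in the text).
    private static final Set<String> ACCEPTING = new HashSet<>(Arrays.asList("s2", "s4"));

    private static void add(String from, char symbol, String to) {
        T.computeIfAbsent(from, k -> new HashMap<>())
         .computeIfAbsent(symbol, k -> new HashSet<>())
         .add(to);
    }

    static {
        add("s0", 'a', "s1"); // stated in the text
        add("s0", 'a', "s3"); // stated in the text
        add("s1", 'a', "s2"); // assumption
        add("s3", 'b', "s4"); // assumption
    }

    static boolean accepts(String input) {
        Set<String> current = new HashSet<>(Collections.singleton("s0"));
        for (char c : input.toCharArray()) {
            Set<String> next = new HashSet<>();
            for (String s : current) {
                Map<Character, Set<String>> row = T.get(s);
                if (row == null) continue;
                Set<String> targets = row.get(c);
                if (targets != null) next.addAll(targets);
            }
            current = next; // an empty set means every possible path has died
        }
        // Accept if at least one possible current state is accepting.
        return !Collections.disjoint(current, ACCEPTING);
    }
}
```

After reading the first a the current set is {s1, s3}, so both `"aa"` and `"ab"` end in an accepting state under these assumptions.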

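Another characteristic of an NFA is that it can change state on the empty symbol. In Figure 3, for example, s0 can move to s1 without consuming any input, so the NFA in Figure 3 recognizes the language a*b*, that is, zero or more a's followed by zero or more b's.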
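Epsilon moves are usually handled by computing an epsilon closure: the set of states reachable from a given set using only epsilon transitions. The sketch below applies this to an automaton shaped like Figure 3. The figure is not reproduced here, so the layout is an assumption consistent with the text: s0 loops on a, s0 has an epsilon transition to s1, s1 loops on b, and s1 is the accepting state. The class name `Figure3Sim` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class for illustration; not part of the original project.
class Figure3Sim {
    private static final Map<String, Map<Character, Set<String>>> T = new HashMap<>();
    private static final Map<String, Set<String>> EPSILON = new HashMap<>();
    private static final Set<String> ACCEPTING = Collections.singleton("s1"); // assumption

    private static void add(String from, char symbol, String to) {
        T.computeIfAbsent(from, k -> new HashMap<>())
         .computeIfAbsent(symbol, k -> new HashSet<>())
         .add(to);
    }

    static {
        add("s0", 'a', "s0");                                          // assumption
        EPSILON.computeIfAbsent("s0", k -> new HashSet<>()).add("s1"); // stated in the text
        add("s1", 'b', "s1");                                          // assumption
    }

    // All states reachable from 'states' through zero or more epsilon moves.
    static Set<String> closure(Set<String> states) {
        Set<String> result = new HashSet<>(states);
        Deque<String> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            String s = work.pop();
            for (String t : EPSILON.getOrDefault(s, Collections.<String>emptySet())) {
                if (result.add(t)) work.push(t); // visit each state only once
            }
        }
        return result;
    }

    static boolean accepts(String input) {
        Set<String> current = closure(Collections.singleton("s0"));
        for (char c : input.toCharArray()) {
            Set<String> moved = new HashSet<>();
            for (String s : current) {
                Map<Character, Set<String>> row = T.get(s);
                if (row != null && row.containsKey(c)) moved.addAll(row.get(c));
            }
            current = closure(moved);
        }
        return !Collections.disjoint(current, ACCEPTING);
    }
}
```

Under these assumptions the empty string is accepted, because the closure of {s0} already contains s1, while a string such as `"ba"` is rejected, because no a can follow a b.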

The languages that NFAs can recognize are exactly the languages that regular expressions can describe; see http://en.wikipedia.org/wiki/Nondeterministic_finite_automaton

So how is an NFA implemented? Let's first look at one implementation of an NFA state node:

/*
    This file is one of the components of a Context-free Grammar Parser Generator,
    which accepts a piece of text as the input, and generates a parser
    for the inputted context-free grammar.
    Copyright (C) 2013, Junbiao Pan (Email: panjunbiao@gmail.com)

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

package automata;

import java.util.*;

public class NFAState implements Comparable<NFAState> {
    private static int COUNT = 0;

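    //State identifier: every NFA state node has a unique numeric id
    private int id;

    public int getId() { return this.id; }

    //Required by Comparable<NFAState>: order states by their unique id
    @Override
    public int compareTo(NFAState other) {
        return Integer.compare(this.id, other.id);
    }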

    //When an NFA state object is created, a static counter supplies its unique id
    public NFAState() {
        this.id = COUNT ++;
    }

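    //Transition function. A transition function takes two inputs, the current state and
    //an input symbol. Inside a state object it always applies to this object, so only
    //the input symbol is needed. It is implemented here with a Map.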
    protected Map<Integer, Set<NFAState>> transition = new HashMap<Integer, Set<NFAState>>();
    public Map<Integer, Set<NFAState>> getTransition() { return this.transition; }

    //Epsilon transitions: the states reachable from this node without consuming input
    protected Set<NFAState> epsilonTransition = new HashSet<NFAState>();
    public Set<NFAState> getEpsilonTransition() { return this.epsilonTransition; }

    //Add a mapping to the transition function; a new next state is created
    public NFAState addTransit(int input) {
        return addTransit(input, new NFAState());
    }

    //Add a mapping to the transition function, with the next state given
    public NFAState addTransit(int input, NFAState next) {
        Set<NFAState> states = this.transition.get(input);
        if (states == null) {
            states = new HashSet<NFAState>();
            this.transition.put(input, states);
        }
        states.add(next);
        return next;
    }

    //Add a mapping for a char input; a new next state is created
    public NFAState addTransit(char input) {
        return addTransit(input, new NFAState());
    }

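    //Add a mapping for a char input, with the next state given.
    //Our context-free grammar is assumed to be case-insensitive, so when the input
    //is a letter, mappings are created for both its uppercase and lowercase forms.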
    public NFAState addTransit(char input, NFAState next) {
        if (Character.isLetter(input)) {
            this.addTransit((int) (Character.toUpperCase(input)), next);
            this.addTransit((int)(Character.toLowerCase(input)), next);
            return next;
        }
        this.addTransit((int)input, next);
        return next;
    }

    //Add an epsilon transition to the given state
    public NFAState addTransit(NFAState next) {
        this.epsilonTransition.add(next);
        return next;
    }

    //Return the set of next states for the given input (null if no mapping exists)
    public Set<NFAState> getTransition(int input) {
        return this.transition.get(input);
    }

}

Now let's look at the implementation of the NFA itself:

/*
    This file is one of the components of a Context-free Grammar Parser Generator,
    which accepts a piece of text as the input, and generates a parser
    for the inputted context-free grammar.
    Copyright (C) 2013, Junbiao Pan (Email: panjunbiao@gmail.com)

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

package automata;

import java.util.*;

import abnf.CharVal;
import abnf.NumVal;
import abnf.AbnfParser;
import abnf.RangedNumVal;
import abnf.Repeat;
import abnf.Repetition;
import abnf.Rule;
import abnf.RuleName;

public class NFA {
    //Start state
    private NFAState startState = null;
    public NFAState getStartState() { return startState; }

    //Set of accepting states
    private Set<NFAState> acceptingStates = new HashSet<NFAState>();
    public Set<NFAState> getAcceptingStates() { return acceptingStates; }
    public boolean accept(NFAState state) {
        return this.acceptingStates.contains(state);
    }
    public void addAcceptingState(NFAState state) {
        this.acceptingStates.add(state);
    }

    public NFA() {
        this(new NFAState(), new NFAState());
    }

    public NFA(NFAState startState) {
        this(startState, new NFAState());
    }

    public NFA(NFAState startState, NFAState acceptingState) {
        this.startState = startState;
        this.addAcceptingState(acceptingState);
    }

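    //In the NFAState class above, new state nodes are created while transition
    //mappings are being added, and the NFA is not involved in that process.
    //The NFA class therefore cannot read the members of the state set S directly;
    //it has to start from startState and keep traversing until all states are found.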
    protected void getStateSet(NFAState current, Set<NFAState> states) {
        if (states.contains(current)) return;
        states.add(current);

        //Follow the transitions on input symbols (every target set in the map)
        for (Set<NFAState> targets : current.getTransition().values()) {
            for (NFAState next : targets) {
                this.getStateSet(next, states);
            }
        }

        //Follow the epsilon transitions
        for (NFAState next : current.getEpsilonTransition()) {
            this.getStateSet(next, states);
        }

    }

    public Set<NFAState> getStateSet() {
        Set<NFAState> states = new HashSet<NFAState>();
        this.getStateSet(this.getStartState(), states);
        return states;
    }

}

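In this way, the NFA class gives us the start state startState and the set of accepting states acceptingStates, each NFAState node gives us the state transition function for that state, and the full state set can be collected with getStateSet(). Every element in the definition of an NFA is therefore implemented.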
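To see how these pieces fit together, the sketch below builds the five-state automaton from the first example by chaining addTransit calls and then collects all reachable states with an explicit worklist rather than recursion. So that it compiles on its own, it uses `SketchState`, a hypothetical trimmed-down stand-in for NFAState (no ids, no case folding); `ReachableStates` and its methods are likewise names invented for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for NFAState.
class SketchState {
    final Map<Integer, Set<SketchState>> transition = new HashMap<>();
    final Set<SketchState> epsilonTransition = new HashSet<>();

    // Add a transition on an input symbol and return the target, so calls can be chained.
    SketchState addTransit(int input, SketchState next) {
        transition.computeIfAbsent(input, k -> new HashSet<>()).add(next);
        return next;
    }

    // Add an epsilon transition and return the target.
    SketchState addTransit(SketchState next) {
        epsilonTransition.add(next);
        return next;
    }
}

class ReachableStates {
    // Worklist traversal: each state is visited once, so cycles terminate.
    static Set<SketchState> collect(SketchState start) {
        Set<SketchState> seen = new HashSet<>();
        Deque<SketchState> work = new ArrayDeque<>();
        seen.add(start);
        work.push(start);
        while (!work.isEmpty()) {
            SketchState s = work.pop();
            List<SketchState> successors = new ArrayList<>(s.epsilonTransition);
            for (Set<SketchState> targets : s.transition.values()) {
                successors.addAll(targets);
            }
            for (SketchState t : successors) {
                if (seen.add(t)) work.push(t);
            }
        }
        return seen;
    }

    // s0 -a-> s1 -a-> s2 -b-> s3 -b-> s4, as in the first example.
    static SketchState buildFigure1() {
        SketchState s0 = new SketchState();
        s0.addTransit('a', new SketchState())
          .addTransit('a', new SketchState())
          .addTransit('b', new SketchState())
          .addTransit('b', new SketchState());
        return s0;
    }
}
```

Calling `ReachableStates.collect(ReachableStates.buildFigure1())` yields a set of five states, matching S = {s0, s1, s2, s3, s4}. An explicit worklist avoids deep recursion on long chains of states, which may matter for large automata.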
