Constructing a Nondeterministic Finite Automaton (Part 1): Definition and Implementation of an NFA

Reposted article; originally published 2013-07-17 21:21.

All rights reserved; reposts must credit the source (http://blog.csdn.net/panjunbiao).

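A nondeterministic finite automaton (NFA) consists of the following elements:

A finite set of states S
A set of input symbols Sigma, where the empty symbol epsilon is assumed not to belong to Sigma
A transition function that, for every state and every symbol in Sigma ∪ {epsilon}, returns a set of next states
A state s0 in S, the start (initial) state
A subset F of S, the set of accepting (final) states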
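In the usual textbook notation, the five elements above form a 5-tuple. Here $\mathcal{P}(S)$ denotes the power set of S, which is standard notation rather than something the original introduces:

```latex
N = (S, \Sigma, T, s_0, F), \qquad
T : S \times (\Sigma \cup \{\epsilon\}) \to \mathcal{P}(S), \qquad
s_0 \in S, \qquad F \subseteq S
```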

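For example, take:

S = {s0, s1, s2, s3, s4}
Sigma = {a, b}
A transition function T with T(s0, a) = {s1}, T(s1, a) = {s2}, T(s2, b) = {s3}, T(s3, b) = {s4} (every other state/symbol pair maps to the empty set)
s0 as the start state
F = {s4} as the set of accepting states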

This gives a very simple NFA, which can be drawn as a graph, as in Figure 1:

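An NFA is a recognizer. Take the NFA in Figure 1: we start in state s0 and feed it the input aabb in order. After the first a, the state moves from s0 to s1; after the second a, it moves to s2; after the third symbol b, it moves to s3; after the fourth symbol b, it moves to s4. Since the input is exhausted and s4 is an accepting state, the NFA answers yes to the string aabb, meaning it recognizes the input.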
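The walk-through above can be reproduced with a small simulation. This is a minimal sketch, not the author's code: the class name Figure1Demo and the string state names are illustrative assumptions, and the transition function T is stored as a nested map from state to symbol to a set of next states. It tracks the set of current states, which for Figure 1 always holds at most one state.

```java
import java.util.*;

// Figure 1 as a plain transition table (illustrative; not the article's NFAState class)
public class Figure1Demo {
    // T(state, symbol) -> set of next states; a missing entry means the empty set
    static final Map<String, Map<Character, Set<String>>> T = new HashMap<>();
    static final String START = "s0";
    static final Set<String> ACCEPTING = Set.of("s4");

    static void addTransition(String from, char symbol, String to) {
        T.computeIfAbsent(from, k -> new HashMap<>())
         .computeIfAbsent(symbol, k -> new HashSet<>())
         .add(to);
    }

    static {
        addTransition("s0", 'a', "s1");
        addTransition("s1", 'a', "s2");
        addTransition("s2", 'b', "s3");
        addTransition("s3", 'b', "s4");
    }

    // Feed the input one symbol at a time, recording the set of states after each step
    static List<Set<String>> trace(String input) {
        List<Set<String>> steps = new ArrayList<>();
        Set<String> current = Set.of(START);
        steps.add(current);
        for (char c : input.toCharArray()) {
            Set<String> next = new TreeSet<>();
            for (String s : current) {
                next.addAll(T.getOrDefault(s, Map.of()).getOrDefault(c, Set.of()));
            }
            current = next;
            steps.add(current);
        }
        return steps;
    }

    // The string is accepted if the final set contains an accepting state
    static boolean accepts(String input) {
        List<Set<String>> steps = trace(input);
        Set<String> last = steps.get(steps.size() - 1);
        return last.stream().anyMatch(ACCEPTING::contains);
    }

    public static void main(String[] args) {
        System.out.println(trace("aabb"));   // [[s0], [s1], [s2], [s3], [s4]]
        System.out.println(accepts("aabb")); // true
        System.out.println(accepts("aab"));  // false
    }
}
```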

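"Nondeterministic" means that, from a given state on a given input symbol, the automaton may move to more than one next state. In Figure 2, for example, reading a in s0 can lead either to s1 or to s3; more precisely, the state becomes the set {s1, s3}. Hence the strings accepted by the NFA in Figure 2 include aa and ab.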
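Figure 2 itself is not reproduced here, so the sketch below assumes one NFA consistent with the text: s0 -a-> s1, s0 -a-> s3, s1 -a-> s2, s3 -b-> s4, with accepting states {s2, s4}. Under that assumption, step shows the set {s1, s3} produced by reading a in s0, and the simulation accepts aa and ab. Figure2Demo, step and accepts are hypothetical names chosen for this illustration.

```java
import java.util.*;

// ASSUMED reconstruction of Figure 2 (the figure is not available):
// s0 -a-> s1, s0 -a-> s3, s1 -a-> s2, s3 -b-> s4, accepting states {s2, s4}
public class Figure2Demo {
    static final Map<String, Map<Character, Set<String>>> T = new HashMap<>();
    static final Set<String> ACCEPTING = Set.of("s2", "s4");

    static void addTransition(String from, char symbol, String to) {
        T.computeIfAbsent(from, k -> new HashMap<>())
         .computeIfAbsent(symbol, k -> new HashSet<>())
         .add(to);
    }

    static {
        addTransition("s0", 'a', "s1");
        addTransition("s0", 'a', "s3"); // same state, same symbol, a second target
        addTransition("s1", 'a', "s2");
        addTransition("s3", 'b', "s4");
    }

    // One NFA step: the union of T(s, symbol) over every state s in the current set
    static Set<String> step(Set<String> current, char symbol) {
        Set<String> next = new TreeSet<>();
        for (String s : current) {
            next.addAll(T.getOrDefault(s, Map.of()).getOrDefault(symbol, Set.of()));
        }
        return next;
    }

    static boolean accepts(String input) {
        Set<String> current = Set.of("s0");
        for (char c : input.toCharArray()) {
            current = step(current, c);
        }
        return current.stream().anyMatch(ACCEPTING::contains);
    }

    public static void main(String[] args) {
        System.out.println(step(Set.of("s0"), 'a')); // [s1, s3]
        System.out.println(accepts("aa") + " " + accepts("ab") + " " + accepts("ba")); // true true false
    }
}
```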

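Another characteristic of NFAs is that transitions can also be taken on the empty symbol epsilon. In Figure 3, for example, s0 can move to s1 without consuming any input character, so the language recognized by the NFA in Figure 3 is a*b*: zero or more a's followed by zero or more b's.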
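The figure is not available, so assume Figure 3 is s0 with an a self-loop, an epsilon edge from s0 to s1, and s1 with a b self-loop, with s1 accepting (one plausible NFA for a*b*). Under that assumption, the sketch below handles epsilon transitions with an epsilon closure: before reading any input, and after every ordinary step, it adds every state reachable through epsilon edges alone. Figure3Demo, closure and accepts are illustrative names, not part of the original code.

```java
import java.util.*;

// ASSUMED reconstruction of Figure 3: s0 -a-> s0, s0 -epsilon-> s1, s1 -b-> s1, accepting {s1}
public class Figure3Demo {
    static final Map<String, Map<Character, Set<String>>> T = new HashMap<>();
    static final Map<String, Set<String>> EPS = new HashMap<>(); // epsilon transitions
    static final Set<String> ACCEPTING = Set.of("s1");

    static {
        T.put("s0", Map.of('a', Set.of("s0")));
        T.put("s1", Map.of('b', Set.of("s1")));
        EPS.put("s0", Set.of("s1"));
    }

    // Epsilon closure: all states reachable from `states` using only epsilon transitions
    static Set<String> closure(Set<String> states) {
        Set<String> result = new TreeSet<>(states);
        Deque<String> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            for (String t : EPS.getOrDefault(work.pop(), Set.of())) {
                if (result.add(t)) work.push(t); // only explore newly found states
            }
        }
        return result;
    }

    static boolean accepts(String input) {
        Set<String> current = closure(Set.of("s0"));
        for (char c : input.toCharArray()) {
            Set<String> next = new HashSet<>();
            for (String s : current) {
                next.addAll(T.getOrDefault(s, Map.of()).getOrDefault(c, Set.of()));
            }
            current = closure(next); // follow epsilon edges after every ordinary transition
        }
        return current.stream().anyMatch(ACCEPTING::contains);
    }

    public static void main(String[] args) {
        System.out.println(closure(Set.of("s0"))); // [s0, s1]
        for (String w : new String[] {"", "aab", "bbb", "ba"}) {
            System.out.println("\"" + w + "\" -> " + accepts(w));
        }
    }
}
```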

The languages recognized by NFAs are exactly those described by regular expressions (the regular languages); see http://en.wikipedia.org/wiki/Nondeterministic_finite_automaton

So how is an NFA implemented? First, here is one possible implementation of an NFA state node:

/*
    This file is one of the component a Context-free Grammar Parser Generator,
    which accept a piece of text as the input, and generates a parser
    for the inputted context-free grammar.
    Copyright (C) 2013, Junbiao Pan (Email: panjunbiao@gmail.com)

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

package automata;

import java.util.*;

public class NFAState implements Comparable<NFAState> {
    private static int COUNT = 0;

    //State identifier: every NFA state node has a unique numeric ID
    private int id;

    public int getId() { return this.id; }

    //When a state object is created, a unique ID is assigned from a static counter
    public NFAState() {
        this.id = COUNT ++;
    }

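    //Transition function. It normally takes two inputs, the current state and an input symbol;
    //inside a state object the current state is always this object, so only the input symbol
    //is needed. It is implemented as a Map from input symbol to the set of next states.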
    protected Map<Integer, Set<NFAState>> transition = new HashMap<Integer, Set<NFAState>>();
    public Map<Integer, Set<NFAState>> getTransition() { return this.transition; }

    //Epsilon transitions: the next states reachable from this node on the empty (epsilon) input
    protected Set<NFAState> epsilonTransition = new HashSet<NFAState>();
    public Set<NFAState> getEpsilonTransition() { return this.epsilonTransition; }

    //Add a transition on the given input to a newly created state; returns the new state
    public NFAState addTransit(int input) {
        return addTransit(input, new NFAState());
    }

    //Add a transition on the given input to the given next state; returns next
    public NFAState addTransit(int input, NFAState next) {
        Set<NFAState> states = this.transition.get(input);
        if (states == null) {
            states = new HashSet<NFAState>();
            this.transition.put(input, states);
        }
        states.add(next);
        return next;
    }

    //Add a transition on the given char to a newly created state; returns the new state
    public NFAState addTransit(char input) {
        return addTransit(input, new NFAState());
    }

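    //Add a transition on the given char to the given next state.
    //We assume the context-free grammar is case-insensitive, so when the input is a letter,
    //mappings are added for both its upper-case and lower-case forms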
    public NFAState addTransit(char input, NFAState next) {
        if (Character.isLetter(input)) {
            this.addTransit((int) (Character.toUpperCase(input)), next);
            this.addTransit((int)(Character.toLowerCase(input)), next);
            return next;
        }
        this.addTransit((int)input, next);
        return next;
    }

    //Add an epsilon (empty-symbol) transition to the given next state
    public NFAState addTransit(NFAState next) {
        this.epsilonTransition.add(next);
        return next;
    }

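    //Return the set of next states for the given input, or null if there is none
    public Set<NFAState> getTransition(int input) {
        return this.transition.get(input);
    }

    //Return every state reachable in one non-epsilon transition (the union of all sets
    //in the transition map); NFA.getStateSet() relies on this method
    public Set<NFAState> getNextStates() {
        Set<NFAState> next = new HashSet<NFAState>();
        for (Set<NFAState> states : this.transition.values()) {
            next.addAll(states);
        }
        return next;
    }

    //Order states by their unique ID, as required by Comparable<NFAState>
    @Override
    public int compareTo(NFAState other) {
        return Integer.compare(this.id, other.id);
    }

}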
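The case-insensitive rule in addTransit(char, NFAState) boils down to choosing which integer keys go into the transition map. The standalone sketch below isolates just that choice; transitionKeys is a hypothetical helper written for this illustration, not a method of NFAState.

```java
import java.util.*;

// Mirrors the key selection in addTransit(char, NFAState): a letter registers both its
// upper-case and lower-case code as keys, any other character registers a single key
public class CaseInsensitiveKeys {
    static Set<Integer> transitionKeys(char input) {
        Set<Integer> keys = new TreeSet<>();
        if (Character.isLetter(input)) {
            keys.add((int) Character.toUpperCase(input));
            keys.add((int) Character.toLowerCase(input));
        } else {
            keys.add((int) input);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(transitionKeys('a')); // [65, 97]
        System.out.println(transitionKeys('B')); // [66, 98]
        System.out.println(transitionKeys('1')); // [49]
    }
}
```

Because both keys point to the same next state, a transition map built through the char overloads cannot tell a from A; a caller that needs case-sensitive matching would have to use the int overloads, which add exactly one key.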

Next, the implementation of the NFA itself:

/*
    This file is one of the component a Context-free Grammar Parser Generator,
    which accept a piece of text as the input, and generates a parser
    for the inputted context-free grammar.
    Copyright (C) 2013, Junbiao Pan (Email: panjunbiao@gmail.com)

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

package automata;

import java.util.*;

public class NFA {
    //Start state
    private NFAState startState = null;
    public NFAState getStartState() { return startState; }

    //Set of accepting states
    private Set<NFAState> acceptingStates = new HashSet<NFAState>();
    public Set<NFAState> getAcceptingStates() { return acceptingStates; }
    public boolean accept(NFAState state) {
        return this.acceptingStates.contains(state);
    }
    public void addAcceptingState(NFAState state) {
        this.acceptingStates.add(state);
    }

    public NFA() {
        this(new NFAState(), new NFAState());
    }

    public NFA(NFAState startState) {
        this(startState, new NFAState());
    }

    public NFA(NFAState startState, NFAState acceptingState) {
        this.startState = startState;
        this.addAcceptingState(acceptingState);
    }

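    //In the NFAState class above, new state nodes are created while transitions are being added,
    //without involving the NFA, so the NFA class has no direct access to the members of the state set S;
    //instead it collects them by a depth-first traversal from startState over ordinary and epsilon transitions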
    protected void getStateSet(NFAState current, Set<NFAState> states) {
        if (states.contains(current)) return;
        states.add(current);

        Iterator<NFAState> it;

        it = current.getNextStates().iterator();
        while (it.hasNext()) {
            this.getStateSet(it.next(), states);
        }

        it = current.getEpsilonTransition().iterator();
        while (it.hasNext()) {
            this.getStateSet(it.next(), states);
        }

    }

    public Set<NFAState> getStateSet() {
        Set<NFAState> states = new HashSet<NFAState>();
        this.getStateSet(this.getStartState(), states);
        return states;
    }

}

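The NFA class thus provides the start state startState and the set of accepting states acceptingStates, each NFAState node provides its transition function (ordinary and epsilon), and the state set S can be recovered from startState through getStateSet(). Every element in the NFA definition is therefore implemented.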
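To make that concrete, the sketch below recovers S, and the symbols actually used in transitions, from nothing but a start state. It rests on several assumptions: State is a minimal hypothetical stand-in for NFAState (no IDs, no case-insensitive char overloads), stateSet is an iterative, explicit-stack version of NFA.getStateSet() that avoids deep recursion on large automata, and the recovered symbols are only those that label some transition, which is a subset of Sigma.

```java
import java.util.*;

public class RecoverElements {
    // Hypothetical stand-in for NFAState: integer-keyed transitions plus epsilon targets
    static class State {
        final Map<Integer, Set<State>> transition = new HashMap<>();
        final Set<State> epsilonTransition = new HashSet<>();

        State addTransit(int input, State next) {
            transition.computeIfAbsent(input, k -> new HashSet<>()).add(next);
            return next;
        }

        State addTransit(State next) { // epsilon transition
            epsilonTransition.add(next);
            return next;
        }
    }

    // S: every state reachable from start via ordinary or epsilon transitions
    static Set<State> stateSet(State start) {
        Set<State> seen = new HashSet<>();
        Deque<State> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            State current = work.pop();
            if (!seen.add(current)) continue; // already visited, so cycles terminate
            for (Set<State> targets : current.transition.values()) {
                work.addAll(targets);
            }
            work.addAll(current.epsilonTransition);
        }
        return seen;
    }

    // Symbols that label at least one transition of a reachable state
    static Set<Integer> alphabet(State start) {
        Set<Integer> sigma = new TreeSet<>();
        for (State s : stateSet(start)) {
            sigma.addAll(s.transition.keySet());
        }
        return sigma;
    }

    public static void main(String[] args) {
        // s0 -a-> s0, s0 -epsilon-> s1, s1 -b-> s1 ('a' and 'b' widen to int keys)
        State s0 = new State();
        State s1 = new State();
        s0.addTransit('a', s0);
        s0.addTransit(s1);
        s1.addTransit('b', s1);
        System.out.println(stateSet(s0).size()); // 2
        System.out.println(alphabet(s0));         // [97, 98]
    }
}
```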
