Writing a Compiler from Scratch (4): Syntax Analysis, Constructing the Finite State Automaton

2019-08-17 09:46 | Tags: Writing a Compiler from Scratch / JavaScript / Computer Science / Development Diary / Compiler Principles

The complete code for this project is in C2j-Compiler.

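The previous article described the basic data structures needed to build the automaton, so we can now construct the finite state automaton itself.

We will use a small grammar to walk through the process: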

s -> e
e -> e + t
e -> t
t -> t * f
t -> f
f -> ( e )
f -> NUM

Initialization

State 0 is the initial state of the state machine. It contains the start production of the grammar, that is, production number 0:

0: s -> . e

The dot here is the dosPos field of the Production class from the previous article.
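To make the dot concrete, here is a minimal sketch of an item as a production plus a dot index. DottedItem is a hypothetical class written for illustration; it is not the project's Production class, although getDotSymbol and dotForward are named after the methods that appear later in this article.

```java
import java.util.List;

// Hypothetical illustration only, not the Production class from C2j-Compiler.
final class DottedItem {
    final String left;          // symbol on the left of "->"
    final List<String> right;   // symbols on the right of "->"
    final int dotPos;           // index of the symbol that follows the dot

    DottedItem(String left, List<String> right, int dotPos) {
        this.left = left;
        this.right = right;
        this.dotPos = dotPos;
    }

    // Symbol immediately to the right of the dot, or null when the dot is at the end.
    String getDotSymbol() {
        return dotPos < right.size() ? right.get(dotPos) : null;
    }

    // Returns a new item with the dot moved one symbol to the right.
    DottedItem dotForward() {
        return new DottedItem(left, right, dotPos + 1);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(left).append(" ->");
        for (int i = 0; i <= right.size(); i++) {
            if (i == dotPos) sb.append(" .");
            if (i < right.size()) sb.append(' ').append(right.get(i));
        }
        return sb.toString();
    }
}
```

With this representation, the item e -> . e + t has dotPos 0, and calling dotForward on it gives e -> e . + t.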

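The method responsible for this step lives in the StateNodeManager class. It first checks whether a parse table has already been built in the current directory; if so, there is no need to build it again.

productionManager.buildFirstSets(); can be skipped for now; it is covered later.

ProductionsStateNode is the class that describes a state node: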

public static int stateNumCount = 0;
/** Automaton state node number */
public int stateNum;
/** production of state node */
public ArrayList<Production> productions;

Next, the production of the start symbol is put in as the first state node, which is the initialization for this step:

public void buildTransitionStateMachine() {
    File table = new File("lrStateTable.sb");
    if (table.exists()) {
        return;
    }
    ProductionManager productionManager = ProductionManager.getInstance();
    productionManager.buildFirstSets();
    ProductionsStateNode state = getStateNode(productionManager.getProduction(Token.PROGRAM.ordinal()));

    state.buildTransition();

    debugPrintStateMap();
}

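Taking the closure of the start production

The dot mentioned earlier, dosPos in Production, becomes useful in this step: the closure operation is driven by it.

The closure is taken on the symbol to the right of the dot. If that symbol is a non-terminal, there must be productions with that non-terminal on the left of ->, and those productions are added, with the dot at the start of their right-hand side: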

s -> . e
e -> . e + t
e -> . t

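Repeat this for every newly added production until nothing new is pulled in. For this grammar, e -> . t brings in t -> . t * f and t -> . f, and t -> . f in turn brings in f -> . ( e ) and f -> . NUM. This is what the makeClosure method in ProductionsStateNode does.

The main logic pushes every production of the node onto a stack and then applies the closure step repeatedly. closureSet holds the node's productions after closure.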

private void makeClosure() {
    Stack<Production> productionStack = new Stack<Production>();
    for (Production production : productions) {
        productionStack.push(production);
    }

    while (!productionStack.empty()) {
        Production production = productionStack.pop();
        int symbol = production.getDotSymbol();
        // Only a non-terminal after the dot can be expanded, so skip terminals
        if (Token.isTerminal(symbol)) {
            ConsoleDebugColor.outlnPurple("Symbol after dot is not non-terminal, ignore and process next item");
            continue;
        }

        ArrayList<Production> closures = productionManager.getProduction(symbol);
        for (int i = 0; closures != null && i < closures.size(); i++) {
            if (!closureSet.contains(closures.get(i))) {
                closureSet.add(closures.get(i));
                productionStack.push(closures.get(i));
            }
        }
    }
}
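The closure step can be tried out on the example grammar with a small standalone sketch. It does not use the project's classes: items are plain strings with " . " marking the dot, and ClosureSketch, GRAMMAR, dotSymbol and closure are hypothetical names chosen for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, self-contained sketch; items are strings such as "e -> . e + t".
final class ClosureSketch {
    static final List<String> GRAMMAR = Arrays.asList(
        "s -> e", "e -> e + t", "e -> t", "t -> t * f",
        "t -> f", "f -> ( e )", "f -> NUM");

    // Token right after the dot, or null when the dot is the last token.
    static String dotSymbol(String item) {
        List<String> tokens = Arrays.asList(item.split(" "));
        int dot = tokens.indexOf(".");
        return dot + 1 < tokens.size() ? tokens.get(dot + 1) : null;
    }

    static Set<String> closure(Collection<String> kernel) {
        Set<String> closureSet = new LinkedHashSet<>(kernel);
        Deque<String> stack = new ArrayDeque<>(kernel);
        while (!stack.isEmpty()) {
            String symbol = dotSymbol(stack.pop());
            if (symbol == null) continue;   // dot at the end: nothing to expand
            for (String production : GRAMMAR) {
                // A terminal never matches: no production has one on its left side.
                if (!production.startsWith(symbol + " -> ")) continue;
                String item = production.replaceFirst(" -> ", " -> . ");
                if (closureSet.add(item)) stack.push(item);   // expand each new item once
            }
        }
        return closureSet;
    }
}
```

Calling ClosureSketch.closure(Arrays.asList("s -> . e")) returns seven items: the start item plus one dotted item for each of the other six productions.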

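Partitioning the introduced productions

Productions that have the same symbol to the right of the dot are put into one partition. For example,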

s -> . e
e -> . e + t

form one partition. Finally, the dot in every production of a partition is moved one position to the right, which forms a new state node:

s -> e .
e -> e . + t

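Partitioning is done in the partition method of the ProductionsStateNode class.

The logic is simple: iterate over the current closureSet; if no partition exists yet for the symbol to the right of the dot, create one with that symbol as the key and a production list as the value; then add the production to that list unless the list already contains it.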

private void partition() {
    ConsoleDebugColor.outlnPurple("==== state begin make partition ====");

    for (Production production : closureSet) {
        int symbol = production.getDotSymbol();
        if (symbol == Token.UNKNOWN_TOKEN.ordinal()) {
            continue;
        }

        ArrayList<Production> productionList = partition.get(symbol);
        if (productionList == null) {
            productionList = new ArrayList<>();
            partition.put(production.getDotSymbol(), productionList);
        }

        if (!productionList.contains(production)) {
            productionList.add(production);
        }
    }

    debugPrintPartition();
    ConsoleDebugColor.outlnPurple("==== make partition end ====");
}
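The grouping can be sketched on its own, again with items written as strings. PartitionSketch and its methods are hypothetical names, and the input is assumed to be well-formed items that each contain exactly one " . " token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: groups dotted items by the symbol that follows the dot.
final class PartitionSketch {
    // Token right after the dot, or null when the dot is the last token.
    static String dotSymbol(String item) {
        List<String> tokens = Arrays.asList(item.split(" "));
        int dot = tokens.indexOf(".");
        return dot + 1 < tokens.size() ? tokens.get(dot + 1) : null;
    }

    static Map<String, List<String>> partition(List<String> closureSet) {
        Map<String, List<String>> partition = new LinkedHashMap<>();
        for (String item : closureSet) {
            String symbol = dotSymbol(item);
            if (symbol == null) continue;   // dot at the end: nothing to shift over
            List<String> group = partition.computeIfAbsent(symbol, k -> new ArrayList<>());
            if (!group.contains(item)) group.add(item);
        }
        return partition;
    }
}
```

Applied to the seven items in the closure of state 0, this gives five groups, keyed by e, t, f, ( and NUM; the group for e holds s -> . e and e -> . e + t.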

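Building transitions for all partition nodes

The symbol immediately to the left of the dot in each new node tells us which input symbol leads into that node.

For example, if the symbol to the left of the dot is t, then when the state machine is in state 0 and the input is t, it jumps to state 1.

If the symbol to the left of the dot is e, then when the state machine is in state 0 and the input is the symbol e, it jumps to state 2:
0 - e -> 2

This is implemented in the makeTransition method of ProductionsStateNode.

The main logic iterates over all partitions. Each partition becomes a new node, and the transition symbol for it is the key of partition, that is, the symbol that was to the right of the dot before the shift. The method then constructs the new node and records the relation between the two nodes.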

private void makeTransition() {
    for (Map.Entry<Integer, ArrayList<Production>> entry : partition.entrySet()) {
        ProductionsStateNode nextState = makeNextStateNode(entry.getKey());

        transition.put(entry.getKey(), nextState);

        stateNodeManager.addTransition(this, nextState, entry.getKey());
    }

    debugPrintTransition();

    extendFollowingTransition();
}

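The logic of makeNextStateNode is also simple: take the production list of the partition, move the dot forward in each production, and return a node for the result: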

private ProductionsStateNode makeNextStateNode(int left) {
    ArrayList<Production> productions = partition.get(left);
    ArrayList<Production> newProductions = new ArrayList<>();

    for (int i = 0; i < productions.size(); i++) {
        Production production = productions.get(i);
        newProductions.add(production.dotForward());
    }

    return stateNodeManager.getStateNode(newProductions);
}

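stateNodeManager has appeared several times already. It is an instance of the StateNodeManager class, which manages, allocates and unifies nodes. The later compression of nodes and the final construction of the parse table also happen there, but that is a topic for later.

The two methods used above:

transitionMap acts as a jump table: the key is the source node and the value is a map whose key is the transition symbol (a terminal or non-terminal read as input) and whose value is the target node.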

public void addTransition(ProductionsStateNode from, ProductionsStateNode to, int on) {
    HashMap<Integer, ProductionsStateNode> map = transitionMap.get(from);
    if (map == null) {
        map = new HashMap<>();
    }

    map.put(on, to);
    transitionMap.put(from, map);
}
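The get-or-create pattern in addTransition can also be written with Map.computeIfAbsent, which is available since Java 8. The sketch below is a hypothetical variant, not the project's code: it uses state numbers and symbol strings in place of ProductionsStateNode objects and token ordinals, and next is an illustrative lookup helper.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a nested jump table: from-state -> (symbol -> to-state).
final class TransitionTableSketch {
    private final Map<Integer, Map<String, Integer>> transitionMap = new HashMap<>();

    void addTransition(int from, int to, String on) {
        // Creates the inner map on first use, then records the edge.
        transitionMap.computeIfAbsent(from, k -> new HashMap<>()).put(on, to);
    }

    // Target state for (from, on), or null when no such transition exists.
    Integer next(int from, String on) {
        Map<String, Integer> row = transitionMap.get(from);
        return row == null ? null : row.get(on);
    }
}
```

For the example above, addTransition(0, 2, "e") records 0 - e -> 2, and next(0, "e") then returns 2.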

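getStateNode first checks whether the node has been created before (every created node is added to stateList). If it has not, the new node is registered and returned; if it already exists, the original node is returned: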

public ProductionsStateNode getStateNode(ArrayList<Production> productions) {
    ProductionsStateNode node = new ProductionsStateNode(productions);

    if (!stateList.contains(node)) {
        stateList.add(node);
        ProductionsStateNode.increaseStateNum();
        return node;
    }

    for (ProductionsStateNode sn : stateList) {
        if (sn.equals(node)) {
            node = sn;
        }
    }

    return node;
}
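Note that stateList.contains(node) uses equals, so a freshly constructed node is only recognized as existing if ProductionsStateNode defines equals in terms of its productions. That method is not shown in this article, so this is an assumption about the project. The hypothetical sketch below gets the same "one node per item set" behavior by using the item set itself as a map key; StateInternSketch and Node are illustrative names.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: hands out exactly one node per distinct set of items.
final class StateInternSketch {
    static final class Node {
        final int stateNum;
        final Set<String> items;

        Node(int stateNum, Set<String> items) {
            this.stateNum = stateNum;
            this.items = items;
        }
    }

    // Set equality ignores order, so the same items in any order map to one node.
    private final Map<Set<String>, Node> nodesByItems = new HashMap<>();

    Node getStateNode(Collection<String> items) {
        Set<String> key = new HashSet<>(items);
        Node node = nodesByItems.get(key);
        if (node == null) {
            // First time this item set is seen: number it and remember it.
            node = new Node(nodesByItems.size(), key);
            nodesByItems.put(key, node);
        }
        return node;
    }
}
```

A hash lookup also avoids the linear scan over stateList, which may matter for larger grammars.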

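Repeating the construction for all newly generated nodes

At this point only the first round of new nodes is done. The automaton is complete only when every node has had its transitions built, which is the job of extendFollowingTransition, called from makeTransition: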

private void extendFollowingTransition() {
    for (Map.Entry<Integer, ProductionsStateNode> entry : transition.entrySet()) {
        ProductionsStateNode state = entry.getValue();
        if (!state.isTransitionDone()) {
            state.buildTransition();
        }
    }
}

Summary

The four steps for building the finite state automaton:

makeClosure
partition
makeTransition
Finally, repeat these steps until every node has been built

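At this point all four stages of

public void buildTransition() {
    if (transitionDone) {
        return;
    }
    transitionDone = true;

    makeClosure();
    partition();
    makeTransition();
}

have been covered and the automaton is built. The next step would be creating the parse table, but this automaton still has some problems, which the next article will address.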
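Putting the four steps together, the whole construction for the example grammar fits in a short standalone sketch. It is not the project's implementation: Lr0Sketch and its methods are hypothetical, an item is encoded as the pair [production index, dot position], states are identified by their kernel items, and states are numbered in order of discovery, so the numbers need not match the ones used earlier in this article.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical end-to-end sketch of closure, partition, transition and repeat.
final class Lr0Sketch {
    // Each row is {left, right symbols...}, in the article's order.
    static final String[][] GRAMMAR = {
        {"s", "e"}, {"e", "e", "+", "t"}, {"e", "t"}, {"t", "t", "*", "f"},
        {"t", "f"}, {"f", "(", "e", ")"}, {"f", "NUM"},
    };

    // Symbol after the dot of item [production index, dot position], or null at the end.
    static String dotSymbol(List<Integer> item) {
        String[] p = GRAMMAR[item.get(0)];
        int pos = item.get(1) + 1;   // +1 skips the left-hand side stored at index 0
        return pos < p.length ? p[pos] : null;
    }

    // Step 1: add every production of each non-terminal that follows a dot.
    static Set<List<Integer>> closure(Set<List<Integer>> kernel) {
        Set<List<Integer>> result = new LinkedHashSet<>(kernel);
        Deque<List<Integer>> stack = new ArrayDeque<>(kernel);
        while (!stack.isEmpty()) {
            String symbol = dotSymbol(stack.pop());
            for (int i = 0; symbol != null && i < GRAMMAR.length; i++) {
                List<Integer> item = Arrays.asList(i, 0);
                if (GRAMMAR[i][0].equals(symbol) && result.add(item)) stack.push(item);
            }
        }
        return result;
    }

    // Step 2: group by dot symbol, storing each item with its dot already moved right.
    static Map<String, Set<List<Integer>>> partition(Set<List<Integer>> closureSet) {
        Map<String, Set<List<Integer>>> parts = new LinkedHashMap<>();
        for (List<Integer> item : closureSet) {
            String symbol = dotSymbol(item);
            if (symbol == null) continue;
            parts.computeIfAbsent(symbol, k -> new LinkedHashSet<>())
                 .add(Arrays.asList(item.get(0), item.get(1) + 1));
        }
        return parts;
    }

    // Steps 3 and 4: link each state to its successors and repeat for new states.
    // The result holds one row per state, mapping input symbol to target state number.
    static List<Map<String, Integer>> build() {
        List<Set<List<Integer>>> kernels = new ArrayList<>();
        List<Map<String, Integer>> transitions = new ArrayList<>();
        Set<List<Integer>> start = new LinkedHashSet<>();
        start.add(Arrays.asList(0, 0));   // s -> . e
        kernels.add(start);
        for (int state = 0; state < kernels.size(); state++) {
            Map<String, Integer> row = new LinkedHashMap<>();
            for (Map.Entry<String, Set<List<Integer>>> e
                    : partition(closure(kernels.get(state))).entrySet()) {
                int target = kernels.indexOf(e.getValue());   // reuse an existing state
                if (target < 0) {
                    kernels.add(e.getValue());
                    target = kernels.size() - 1;
                }
                row.put(e.getKey(), target);
            }
            transitions.add(row);
        }
        return transitions;
    }
}
```

For this grammar, Lr0Sketch.build() produces 12 states, and the row for state 0 has five outgoing transitions, one per partition.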

My GitHub blog: https://dejavudwh.cn/
