LR(1) Parser Generator (Generating the Action and Goto Tables), Java Implementation (Part 2)

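Reposted article, 2019-08-10 13:12. Tags: Algorithms / Java / Compilers / Small projects / BNF / Compiler principles / LR(1)
Update: here is the link to my earlier explanatory video on bilibili: https://www.bilibili.com/video/av63666423?share_medium=android&share_source=qq&bbid=PQ0-BzIEPAU2VGNXK1crinfoc&ts=1565782566880
Current progress: the table-driven part is finished, and a function prints the Action and Goto tables. A user can then run an LR(1) parse from these two tables. Compared against the book's example (the parentheses grammar), the output matches exactly, apart from the numbering of some states.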
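The two tables are all a parser driver needs. As a minimal sketch (not part of the original project), the class below hard-codes the Action and Goto tables that the listing's header comment shows for the parentheses grammar and runs the usual shift/reduce loop over them. The class name `Lr1DriverSketch`, the string form of the table entries, and the `RHS_LEN` and `LHS` arrays are assumptions made for illustration; the generator itself stores entries as ints. Production numbers follow the `r` numbers in the printed table: 1 is Goal -> List, 2 is List -> List Pair, 3 is List -> Pair, 4 is Pair -> ( Pair ), 5 is Pair -> ( ).

```java
import java.util.ArrayDeque;
import java.util.Deque;

class Lr1DriverSketch {
    // Columns: 0 = eof, 1 = "(", 2 = ")". null means error.
    static final String[][] ACTION = {
        {null,  "s2", null},   // state 0
        {"acc", "s2", null},   // state 1
        {null,  "s6", "s5"},   // state 2
        {"r3",  "r3", null},   // state 3
        {"r2",  "r2", null},   // state 4
        {"r5",  "r5", null},   // state 5
        {null,  "s6", "s8"},   // state 6
        {null,  null, "s10"},  // state 7
        {null,  null, "r5"},   // state 8
        {null,  null, "s11"},  // state 9
        {"r4",  "r4", null},   // state 10
        {null,  null, "r4"},   // state 11
    };
    // Columns: 0 = Goal, 1 = List, 2 = Pair. -1 means error.
    static final int[][] GOTO = {
        {-1, 1, 3}, {-1, -1, 4}, {-1, -1, 7}, {-1, -1, -1},
        {-1, -1, -1}, {-1, -1, -1}, {-1, -1, 9}, {-1, -1, -1},
        {-1, -1, -1}, {-1, -1, -1}, {-1, -1, -1}, {-1, -1, -1},
    };
    // Indexed by production number (1-based; slot 0 is unused).
    static final int[] RHS_LEN = {0, 1, 2, 1, 3, 2}; // symbols on the right-hand side
    static final int[] LHS     = {0, 0, 1, 1, 2, 2}; // Goto column of the left-hand side

    // Returns true if the input (a string of '(' and ')') is accepted.
    static boolean accepts(String input) {
        Deque<Integer> states = new ArrayDeque<>();
        states.push(0);
        int pos = 0;
        while (true) {
            int col;
            if (pos >= input.length()) col = 0;            // eof
            else if (input.charAt(pos) == '(') col = 1;
            else if (input.charAt(pos) == ')') col = 2;
            else return false;                             // unknown character
            String act = ACTION[states.peek()][col];
            if (act == null) return false;
            if (act.equals("acc")) return true;
            int n = Integer.parseInt(act.substring(1));
            if (act.charAt(0) == 's') {                    // shift: push state n, consume the token
                states.push(n);
                pos++;
            } else {                                       // reduce by production n
                for (int k = 0; k < RHS_LEN[n]; k++) states.pop();
                int next = GOTO[states.peek()][LHS[n]];
                if (next < 0) return false;
                states.push(next);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("(())()"));
        System.out.println(accepts("(()"));
    }
}
```

The stack holds only states here; a real parser would also keep semantic values next to them.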
package cn.vizdl.LR1.version3;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

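/*
Project name: LR(1) parser generator
Project analysis:
    Input: the location of a grammar file (this version reads the text from standard input). The LR(1) grammar is written in the form  <Expr> ::=  <Term> + <Factor>;
    Only these BNF elements are supported: terminals, nonterminals, alternation (|), ; (end of a production), and ::= (is defined as).
    Kleene closure, i.e. {}, is not supported, because it can be rewritten as recursion on a nonterminal.
        Input text format: the grammar must be backtrack-free, i.e. with one symbol of lookahead the correct production rule can always be predicted.
            start : <aim_name>  // aim_name is the name of the start symbol
        Examples:
        // This is a wrong example; it is not a valid LR(1) grammar
            start : <Goal>;
            <Goal> ::= <Expr>;
            <Expr> ::= <Expr> "+" <Term> | <Expr> "-" <Term>;
            <Term> ::= <Term> "*" <Factor> | <Term> "/" <Factor>;
            <Factor> ::= "number";
            #
            
            
            start : <Goal>;
            <Goal> ::= <Expr>;
            <Expr> ::= <Term><Expr'>;
            <Expr'> ::= "+" <Term><Expr'> 
                    |    "-" <Term><Expr'> 
                    |    "ε";
            <Term> ::= <Factor><Term'>;
            <Term'> ::= "*" <Factor><Term'>
                    |    "/" <Factor><Term'>
                    |    "ε";
            <Factor> ::= "("<Expr>")"
                    |    "num"
                    |    "name";
            #
            
            
            start : <Goal>;
            <Goal> ::= <List>;
            <List> ::= <List><Pair>
                    |    <Pair>;
            <Pair> ::= "(" <Pair> ")"
                    |    "("")";
            #
        The input ends with #.
    Input analysis: a context-free grammar is a 4-tuple, and LR(1) grammars are a subset of the context-free grammars, so representing an LR(1) grammar as a 4-tuple loses no information.
        4-tuple (T,NT,S,P)
        T : set of terminals
        NT : set of nonterminals
        S : start symbol of the grammar (a nonterminal)
        P : set of productions
        T and NT can each be represented by a hash set.
        P has two parts: the left side is always a single nonterminal, and the right side is a production body that supports alternation.
        The left side can be represented by a Node, and the right side by several lists (how many depends on the number of | operators in the production).
        The input grammar itself is split into three levels: Expr first, Term second, Factor third.
        <Expr> ::= <Term> { "|" <Term>}; // a production body is several small sentences joined by "or"
        <Term> ::= <Factor> { "+" <Factor>}; // + stands for concatenation
        <Factor> ::= <T> | <NT>
    Output: the Action and Goto tables.
    
    Progress in 1.0: the input string is converted into an intermediate data structure (BnfContainer).
    Progress in 2.0: the closure() function.
    Here [A -> β·Cθ,a] means that once A has been recognized, the next terminal is a (this is the 1 in LR(1): one symbol of lookahead).
    About FIRST:
        FIRST(A) : for a grammar symbol A, FIRST(A) is the set of terminals that can appear as the first word of a string derived from A.
        Domain of FIRST : T ∪ NT ∪ {ε, eof}
        Range of FIRST : subsets of T ∪ {ε, eof}
        If A ∈ T ∪ {ε, eof}
        then FIRST(A) = {A}
    In the closure function the tricky part is FIRST(θa). It is not simply FIRST(θ), because the terminal a is appended.
    This covers the case where θ can derive ε: then a is also a possible first symbol, and when θ is empty FIRST(θa) degenerates to FIRST(a) = {a}.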
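    The FIRST(θa) rule can be tried out on the expression grammar from the examples above. The following is only a small sketch under assumptions: symbols are plain strings instead of the int codes used by the real code, the class and method names are made up for illustration, and the FIRST sets of <Expr'> and <Term'> are written down by hand from that grammar.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FirstOfStringSketch {
    static final String EPS = "ε";

    // FIRST(θa): scan θ from left to right and stop at the first symbol that cannot derive ε.
    // firstOfNt maps each nonterminal to its FIRST set; any other symbol is treated as a terminal.
    static Set<String> first(List<String> theta, String lookahead,
                             Map<String, Set<String>> firstOfNt) {
        Set<String> result = new TreeSet<>();
        for (String sym : theta) {
            Set<String> f = firstOfNt.get(sym);
            if (f == null) {                 // terminal: nothing after it can come first
                result.add(sym);
                return result;
            }
            for (String t : f) {
                if (!t.equals(EPS)) result.add(t);
            }
            if (!f.contains(EPS)) return result; // sym cannot vanish, so stop here
        }
        result.add(lookahead);               // all of θ can vanish (or θ is empty)
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> firstOfNt = new HashMap<>();
        firstOfNt.put("Expr'", new TreeSet<>(Arrays.asList("+", "-", EPS)));
        firstOfNt.put("Term'", new TreeSet<>(Arrays.asList("*", "/", EPS)));
        // θ = <Term'><Expr'>, a = eof: both can vanish, so eof is included
        System.out.println(first(Arrays.asList("Term'", "Expr'"), "eof", firstOfNt));
        // θ = <Expr'> ")", a = eof: the terminal ")" blocks the lookahead
        System.out.println(first(Arrays.asList("Expr'", ")"), "eof", firstOfNt));
        // θ is empty: the result is just {a}
        System.out.println(first(Arrays.<String>asList(), "eof", firstOfNt));
    }
}
```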
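    closure(s) : 
    while (s is still changing)
        for each item [A -> β·Cθ,a] ∈ s        // look for new items starting from the current set s
            for each production C -> γ ∈ P  // every production of C
                for each b ∈ FIRST(θa)        // every possible lookahead symbol (a terminal)
                    s <- s ∪ {[C -> ·γ,b]}
                    
    // here x is a grammar symbol; it can be a terminal or a nonterminal
    goto(s, x)
    moved <- ∅
    for each item i ∈ s        // for every item in the set s
        if the form of i is [A -> β·xθ,a] then
            moved <- moved ∪ {[A -> βx·θ,a]}
    return closure (moved)
    
    Algorithm for building CC:
    CC0 <- closure({[S' -> ·S,eof]}) // build the initial item set
    CC <- {CC0}        // add the initial item set to the canonical collection CC
    while (new sets are still being added to CC)        
        for each unmarked set CCi ∈ CC    // sets that have not been processed yet
            mark CCi as processed
            for each x following a · in an item in CCi  // every symbol x right after the dot of some item in CCi
                temp <- goto(CCi, x)
                if temp ∉ CC
                    then CC <- CC ∪ {temp}
                record transition from CCi to temp on x
    Example:
    <Goal> ::= <List>;
    <List> ::= <List> <Pair> | <Pair>;
    <Pair> ::= "(" <Pair> ")" | "(" ")";
    closure({[Goal -> ·List,eof]})
    
    My understanding: expanding the nonterminals of the BNF statements can produce many items.
    Together they form a DFA, but many items actually describe the same state, so
    closure is needed to merge them into one set. goto starts from an item set, crosses one symbol,
    and moves to a new item set; the symbol can be a terminal or a nonterminal. The moved items alone
    do not necessarily include every item of the new state, so the closure operation is still needed to complete the set.
    
    How should an item be represented?
    An item has three components: the production, the position of ·, and the lookahead symbol.
    These can be stored as numbers (the code below uses four: nonterminal index, expression index, dot position, lookahead).
    A Node object could hold them, but then a set could apparently no longer be used to filter out duplicates.
    String representation?
    Use a string, with the fields separated by a delimiter character.
    When needed, convert the string back into numbers.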
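    The string representation can be illustrated in isolation. This is a sketch under assumptions: the class and method names are invented for the example, and the lookahead value is simply index 0 with the highest bit set, which is how eof ends up being coded in the real code below. Because equal items produce equal strings, a HashSet of strings removes duplicates without a custom equals or hashCode.

```java
import java.util.HashSet;
import java.util.Set;

class ItemKeySketch {
    static final String SEP = " ";

    // Joins the four fields of an item into one string key.
    static String encode(int productionIdx, int exprIdx, int dot, int lookahead) {
        return productionIdx + SEP + exprIdx + SEP + dot + SEP + lookahead;
    }

    // Splits a key back into its numeric fields.
    static int[] decode(String item) {
        String[] parts = item.split(SEP);
        int[] fields = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            fields[i] = Integer.parseInt(parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        int eof = 0 | 0x80000000; // terminal index 0 with the highest bit set
        Set<String> items = new HashSet<>();
        items.add(encode(2, 0, 1, eof));
        items.add(encode(2, 0, 1, eof)); // the same item again: the set keeps one copy
        items.add(encode(2, 1, 0, eof));
        System.out.println(items.size());
        System.out.println(decode(encode(2, 0, 1, eof))[2]);
    }
}
```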
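    
    Progress in version 3.0:
    Table-filling algorithm:
    Action table: rows are states, columns are lookahead symbols (terminals); an entry is a reduce, a shift (state transition), accept, or error.
    Goto table: rows are states, columns are nonterminals; after a reduction, the state now on top of the stack
    together with the nonterminal just produced determines the next state.
    
    for each CCi ∈ CC
        for each  item I ∈ CCi
            if I is [A -> β·cθ,a] and goto (CCi , c) = CCj then
                Action[i,c] <- "shift j"
            else if I is [A -> β·,a] then    // reduce
                Action[i,a] <- "reduce A->β"
            else if I is [S'->S·,eof] then    // the goal production is complete and the lookahead is eof: accept
                Action[i,eof] <- "accept"
        for each n ∈ NT                // item set CCi reaches j by crossing the nonterminal n
            if goto(CCi, n) = CCj then
                Goto[i,n] <- j
                
    How are the different kinds of entries represented?
    A bit keeps the two kinds apart: shift j is stored directly as j, while reduce A -> β is ORed with the highest bit of the int.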
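    A compact way to see this encoding at work is the sketch below. It is illustrative only: the class and method names are assumptions, while the special values -1 for error and Integer.MIN_VALUE for accept mirror what the code further down stores. Note that the checks must run in this order, since -1 also has the highest bit set, and that reduce numbers have to start at 1 because "reduce 0" would be the same int as accept.

```java
class ActionEncodingSketch {
    static final int MASK = 0x80000000;   // highest bit: marks a reduce entry
    static final int DECODE = 0x7fffffff; // clears the highest bit
    static final int ERR = -1;
    static final int ACC = Integer.MIN_VALUE;

    static int shift(int state) {
        return state;
    }

    static int reduce(int productionNumber) {
        return productionNumber | MASK;
    }

    // Turns an encoded entry back into the text printed in the tables.
    static String describe(int entry) {
        if (entry == ERR) return "err";
        if (entry == ACC) return "acc";
        if ((entry & MASK) == MASK) return "r" + (entry & DECODE);
        return "s" + entry;
    }

    public static void main(String[] args) {
        System.out.println(describe(shift(6)));
        System.out.println(describe(reduce(3)));
        System.out.println(describe(ACC));
        System.out.println(describe(ERR));
    }
}
```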
State in this output - state in the book
0 - 0
1 - 1
6 - 6
4 - 4
11 - 11
9 - 9
2 - 3
3 - 2
5 - 7
7 - 5
10 - 8
8 - 10
The number after r counts the alternatives across all nonterminals (1 = Goal -> List, ..., 5 = Pair -> ( )); it is not the expression index inside a single nonterminal.
This is because our structure is production -> expressions, i.e. one production (nonterminal) links to several expressions (alternatives).

Action table:
        eof     (       )
0       err     s2      err
1       acc     s2      err
2       err     s6      s5
3       r3      r3      err
4       r2      r2      err
5       r5      r5      err
6       err     s6      s8
7       err     err     s10
8       err     err     r5
9       err     err     s11
10      r4      r4      err
11      err     err     r4
Goto table:
        Goal    List    Pair
0       err     s1      s3
1       err     err     s4
2       err     err     s7
3       err     err     err
4       err     err     err
5       err     err     err
6       err     err     s9
7       err     err     err
8       err     err     err
9       err     err     err
10      err     err     err
11      err     err     err


*/ 
public class Demo2 {
    public static void main (String[] args) {
        // Collect the input productions into s: read standard input
        // up to (not including) the first '#'. Line breaks are not kept.
        Scanner scanner = new Scanner(System.in);
        StringBuilder s = new StringBuilder();
        boolean finished = false;
        while (!finished && scanner.hasNextLine()) {
            String c = scanner.nextLine();
            int end = c.indexOf('#');
            if (end >= 0) {
                s.append(c, 0, end);
                finished = true;
            } else {
                s.append(c);
            }
        }
        scanner.close();
        BnfContainer bc = new BnfContainer();
        CodeAnalyzer ca = new CodeAnalyzer(s.toString(), bc);
        ca.analyze();
        bc.toLRTable();
        bc.printActionAndGotoTable();
    }
}

/**
 * Holds the information of the BNF grammar.
 */
class BnfContainer {
    /**
     * Inner class: the node of a nonterminal (NT).
     * @author HP
     */
    class NTNode {
        private String name; // symbol name
        private List<List<Integer>> expr; // one list of symbol codes per alternative
        public NTNode(String name) {
            expr = new ArrayList<List<Integer>>();
            this.name = name;
        }
        /**
         * Adds an empty expr.
         * @return the index of the new expr
         */
        public int addExpr() {
            expr.add(new ArrayList<Integer>());
            return expr.size() - 1;
        }
        /**
         * Appends value to the expr at index idx.
         * @param idx index of the expr
         * @param value symbol code to append
         */
        public void addExprElement (int idx, int value) {
            this.expr.get(idx).add(value);
        }
        /**
         * Appends value to the last expr.
         * @param value symbol code to append
         */
        public void addExprElement (int value) {
            this.addExprElement(this.expr.size() - 1, value);
        }
        
        public void printNTNode () {
            System.out.println("NTNumber : " + this.name);
            for (List<Integer> list : this.expr) {
                for (Integer val : list) {
                    System.out.print(val + " ");
                }
                System.out.println();
            }
        }
    }
    
    
    // Constants
    /**
     * These two constants apply only to terminals.
     * Terminals and nonterminals are stored in the same list,
     * so the mask is used to tell them apart.
     */
    private static final int MASK = 0x80000000; // mask: the bit set on every terminal code
    private static final int DECODE = 0x7fffffff; // clears the mask to recover the original index
    private static final String separationCharacter = " ";
    /**
     * Nonterminal map.
     * key : name of the nonterminal
     * value : index of the nonterminal in the production list
     */
    private HashMap<String,Integer> NTMap;
    /**
     * Terminal map.
     * key : name of the terminal
     * value : index of the terminal in the list T
     */
    private HashMap<String,Integer> TMap;
    // list of terminals
    private ArrayList<String> T;
    // list of productions; nonterminals and productions correspond one to one
    private ArrayList<NTNode> production;
    // defaults to 0 if not set
    public int startIndex = 0;
    private int eof, epsilon;
    /**
     * FIRST sets of all nonterminals, indexed by nonterminal.
     */
    private Set<Integer>[] First;
    /**
     * The Action table to output.
     */
    private int[][] Action;
    /**
     * The Goto table to output.
     */
    private int[][] Goto;
    
    public BnfContainer() {
        // initialize the internal data structures
        NTMap = new HashMap<String,Integer>();
        TMap = new HashMap<String,Integer>();
        T = new ArrayList<String>();
        production = new ArrayList<NTNode>();
        
        
        // add the two special terminals eof and ε
        this.addT("eof");
        this.addT("ε");
        eof = this.getTSerialNumber("eof");
        epsilon = this.getTSerialNumber("ε");
    }
    
    /**
     * Sets the start nonterminal.
     * @param name name of the start nonterminal
     */
    public void setStart (String name) {
        this.addNT(name);
        this.startIndex = this.NTMap.get(name);
    }
    
    /**
     * Adds a nonterminal node for the given nonterminal name.
     * @param name name of the nonterminal
     */
    public void addNT (String name) {
        if (name.isEmpty()) {
            System.out.println("A nonterminal name must not be empty");
            System.exit(-1);
        }
        if (!NTMap.containsKey(name)) {
            NTNode node = new NTNode(name);
            NTMap.put(name, production.size());
            production.add(node);
        }
    }
    
    /**
     * Adds a terminal with the given name.
     * @param name name of the terminal
     */
    public void addT(String name) {
        if (!this.TMap.containsKey(name)) {
            this.TMap.put(name, T.size());
            this.T.add(name);
        }
    }
    
    /**
     * Looks up the code of a terminal by name.
     * If the terminal exists, its code (index | MASK) is returned;
     * otherwise an error message is printed and the program exits.
     * @param name name of the terminal
     * @return the code of the terminal
     */
    private int getTSerialNumber (String name) {
        this.notFindTWarning(name);
        return this.TMap.get(name) | BnfContainer.MASK;
    }
    
    /**
     * Looks up the code of a nonterminal by name.
     * If the nonterminal exists, its index is returned;
     * otherwise an error message is printed and the program exits.
     * @param name name of the nonterminal
     * @return the index of the nonterminal
     */
    private int getNTSerialNumber (String name) {
        this.notFindNTWarning(name);
        return this.NTMap.get(name);
    }
    
    /**
     * Creates a new expression and attaches it to the nonterminal node called name.
     * @return the index of the new expression
     */
    public int creatNewExper(String name) {
        this.notFindNTWarning(name);
        NTNode ntn = this.production.get(this.NTMap.get(name));
        return ntn.addExpr();
    }
    /**
     * Appends an element to expression idx of the production
     * whose left-hand nonterminal is name.
     * @param name left-hand nonterminal
     * @param idx expression index
     * @param isNt whether addElement is a nonterminal
     * @param addElement name of the symbol to append
     */
    public void addExpeElement(String name, int idx,boolean isNt, String addElement) {
        // check before the lookup, so that an unknown name is reported instead of failing on null
        this.notFindNTWarning(name);
        NTNode ntn = this.production.get(this.NTMap.get(name));
        if (isNt) {
            this.notFindNTWarning(addElement);
            ntn.addExprElement(idx, this.getNTSerialNumber(addElement));
        }else {
            this.addT(addElement);
            ntn.addExprElement(idx, this.getTSerialNumber(addElement));
        }
    }
    
    /**
     * Appends an element to the last expression of the production
     * whose left-hand nonterminal is name.
     * @param name left-hand nonterminal
     * @param isNt whether addElement is a nonterminal
     * @param addElement name of the symbol to append
     */
    public void addExpeElement(String name,boolean isNt, String addElement) {
        // check before the lookup, so that an unknown name is reported instead of failing on null
        this.notFindNTWarning(name);
        NTNode ntn = this.production.get(this.NTMap.get(name));
        if (isNt) {
            this.notFindNTWarning(addElement);
            ntn.addExprElement(this.getNTSerialNumber(addElement));
        }else {
            this.addT(addElement);
            ntn.addExprElement(this.getTSerialNumber(addElement));
        }
    }
    
    /**
     * Does nothing if the nonterminal exists;
     * otherwise prints a message and exits the program.
     * @param name name of the nonterminal
     */
    private void notFindNTWarning(String name) {
        if (!this.NTMap.containsKey(name)) {
            System.out.println("Unknown nonterminal " + name + "!");
            System.exit(-1);
        }
    }
    /**
     * Does nothing if the terminal exists;
     * otherwise prints a message and exits the program.
     * @param name name of the terminal
     */
    private void notFindTWarning(String name) {
        if (!this.TMap.containsKey(name)) {
            System.out.println("Unknown terminal " + name + "!");
            System.exit(-1);
        }
    }

    public void printBNF() {
        System.out.println("Start nonterminal : " + this.production.get(startIndex).name);
//        System.out.println("Terminal codes : ");
//        for (int i = 0; i < this.T.size(); i++) {
//            System.out.println(this.T.get(i) + " : " + (i | MASK));
//        }
//        System.out.println("Nonterminal codes : ");
//        for (int i = 0; i < this.production.size(); i++) {
//            System.out.println(this.production.get(i).name + " : " + i);
//        }
        for (NTNode ntn : this.production) {
            ntn.printNTNode();
        }
        
        System.out.println("FIRST sets : ");
        int count = 0;
        for (Set<Integer> s : First) {
            System.out.println("Nonterminal #" + count + " : " + this.production.get(count).name);
            for (Integer i : s) {
                this.printSymbol(i);
            }
            System.out.println();
            count++;
        }
        System.out.println("There are " + this.CC.size() + " states in total");
        for (Set<String> s : this.CC) {
            this.printCCSet(s);
        }
    }
    /**
     * Prints the item set s.
     * @param s the item set
     */
    private void printCCSet(Set<String> s) {
        for (String item : s) {
            this.printItem(item);
        }
        System.out.println();
    }
    
    
    private void printItem (String item) {
        String[] strs = item.split(BnfContainer.separationCharacter); // fields are separated by separationCharacter (a space)
        int productionIdx = Integer.parseInt(strs[0]); // production index
        int exprIdx = Integer.parseInt(strs[1]); // expression index
        int placeholder = Integer.parseInt(strs[2]); // dot position; 0 means no grammar symbol lies to the left of the dot
        int prospectiveSymbol = Integer.parseInt(strs[3]); // lookahead symbol
        NTNode ntn = this.production.get(productionIdx);
        System.out.print("[" + ntn.name + "::=");
        List<Integer> list = ntn.expr.get(exprIdx);
        for (int i = 0; i < list.size(); i++) {
            if (i == placeholder) {
                System.out.print("·");
            }
            this.printSymbol(list.get(i));
            System.out.print(" ");
        }
        if (list.size() == placeholder) {
            System.out.print("·");
        }
        System.out.print(",");
        this.printSymbol(prospectiveSymbol);
        System.out.print("]\t");
    }
    
    private void printSymbol (int sym) {
        if (this.isT(sym)) {
            System.out.print(this.T.get(sym & DECODE));
        }else {
            System.out.print(this.production.get(sym).name);
        }
    }
    
    /**
     * Computes FIRST for every nonterminal (the FIRST set of a terminal is the terminal itself).
     * FIRST(A) : for a grammar symbol A, the set of terminals that can appear
     * as the first word of a string derived from A.
     */
    @SuppressWarnings("unchecked")
    private void FIRSTAllSymbol() {
        First = new Set[this.production.size()];
        for (int i = First.length - 1; i >= 0; i--) {
            FIRST(i);
        }
    }
    /**
     * Computes FIRST for the nonterminal with index idx.
     * Only the first symbol of each alternative is inspected, so symbols that follow
     * a leading nonterminal which can derive ε are not taken into account here,
     * and a recursive reference sees the set as far as it has been built so far.
     */
    private void FIRST(int idx) {
        if (First[idx] != null) {
            return;
        }
        First[idx] = new HashSet<Integer>();
        List<List<Integer>> next = this.production.get(idx).expr;
        for (List<Integer> list : next) {
            int val = list.get(0);
            // terminal
            if (this.isT(val)) {
                First[idx].add(val);
            }else {
                this.FIRST(val);
                First[idx].addAll(First[val]);
            }
        }
    }
    
    private boolean isT (int val) {
        return (val & MASK) == MASK;
    }
    /*
     * An item has four fields:
     * productionIdx : production index
     * exprIdx : expression index
     * placeholder : position of the dot
     * prospectiveSymbol : lookahead symbol
     */
    /*
    Closure operation
    closure(s) : 
    while (s is still changing)
        for each item [A -> β·Cθ,a] ∈ s        // look for new items starting from the current set s
            for each production C -> γ ∈ P  // every production of C
                for each b ∈ FIRST(θa)        // every possible lookahead symbol (a terminal)
                    s <- s ∪ {[C -> ·γ,b]}
     */
    private List<Set<String>> CC;
    private void closure (Set<String> s) {
        int lastSize = -1;
        while (lastSize != s.size()) {
            lastSize = s.size();
            Set<String> hashset = new HashSet<String>();
            for (String item : s) {
                String[] strs = item.split(BnfContainer.separationCharacter); // fields are separated by separationCharacter (a space)
                int productionIdx = Integer.parseInt(strs[0]); // production index
                int exprIdx = Integer.parseInt(strs[1]); // expression index
                int placeholder = Integer.parseInt(strs[2]); // dot position; 0 means no grammar symbol lies to the left of the dot
                int prospectiveSymbol = Integer.parseInt(strs[3]); // lookahead symbol
                List<Integer> temp = this.production.get(productionIdx).expr.get(exprIdx);
                // for each item [A -> β·Cθ,a] ∈ s
                //     for each production C -> γ ∈ P
                // temp.get(placeholder) is the C above.
                // Expansion requires that the dot is not at the right end and that C is a nonterminal:
                // a terminal cannot be expanded, and neither can an item whose dot has reached the end.
                if (placeholder < temp.size() && !this.isT(temp.get(placeholder))) {
                    int cIdx = temp.get(placeholder);
                    // first compute FIRST of the string after C followed by the lookahead
                    Set<Integer> set = this.FIRSTNextStr(temp, placeholder + 1, prospectiveSymbol);
                    List<List<Integer>> expr = this.production.get(cIdx).expr;
                    for (int i = 0; i < expr.size(); i++){
                        for (Integer val : set) {
                            String res = cIdx + BnfContainer.separationCharacter + i + BnfContainer.separationCharacter + 0 + BnfContainer.separationCharacter + val;
                            hashset.add(res);
                        }
                    }
                }
            }
            s.addAll(hashset);
        }
        /*
         * Item sets can overlap.
         * start : <Goal>;
         * <Goal> ::= <List>;
         * <List> ::= <List><Pair>
         *         |    <Pair>;
         * <Pair> ::= "(" <Pair> ")"
         *         |    "("")";
         * #
         * In this example from the book, CC0 and CC1 both contain [Pair ::= ·(Pair),(],
         * and other items are shared as well.
         */
        return;
    }
    /*
    goto(s, x)
    moved <- ∅
    for each item i ∈ s        // for every item in the set s
        if the form of i is [A -> β·xθ,a] then
            moved <- moved ∪ {[A -> βx·θ,a]}
    return closure (moved)
    */
    private Set<String> go (Set<String> s, int x){
        Set<String> res = new HashSet<String>();
        for (String item : s) {
            String[] strs = item.split(BnfContainer.separationCharacter); // fields are separated by separationCharacter (a space)
            int productionIdx = Integer.parseInt(strs[0]); // production index
            int exprIdx = Integer.parseInt(strs[1]); // expression index
            int placeholder = Integer.parseInt(strs[2]); // dot position; 0 means no grammar symbol lies to the left of the dot
            int prospectiveSymbol = Integer.parseInt(strs[3]); // lookahead symbol
            List<Integer> temp = this.production.get(productionIdx).expr.get(exprIdx);
            // the dot can move only if a symbol follows it and that symbol is x
            if (placeholder < temp.size() && temp.get(placeholder) == x) {
                String str = productionIdx + BnfContainer.separationCharacter + exprIdx + BnfContainer.separationCharacter + (placeholder + 1) + BnfContainer.separationCharacter + prospectiveSymbol;
                res.add(str);
            }
        }
        this.closure(res);
        return res;
    }
    
    /**
     * Computes FIRST of the string that starts at the symbol with index idx in expr,
     * followed by the lookahead symbol.
     * @param expr the expression (right-hand side)
     * @param idx index of the first symbol of the string
     * @param prospectiveSymbol lookahead symbol appended to the string
     * @return the FIRST set of that string
     */
    private Set<Integer> FIRSTNextStr (List<Integer> expr, int idx, int prospectiveSymbol){
        Set<Integer> res = new HashSet<Integer>();
        if (idx >= expr.size()) {
            res.add(prospectiveSymbol);
            return res;
        }
        // the current symbol is a terminal
        if (this.isT(expr.get(idx))) {
            res.add(expr.get(idx));
            return res;
        }
        res.addAll(First[expr.get(idx)]);
        // if epsilon is present, the symbols after this one can also come first
        if (res.contains(this.epsilon)) {
            res.remove(this.epsilon);
            res.addAll(this.FIRSTNextStr(expr, idx + 1, prospectiveSymbol));
        }
        return res;
    }
    
    /*
    CC0 <- closure({[S' -> ·S,eof]}) // build the initial item set
    CC <- {CC0}        // add the initial item set to the canonical collection CC
    while (new sets are still being added to CC)        
        for each unmarked set CCi ∈ CC    // sets that have not been processed yet
            mark CCi as processed
            for each x following a · in an item in CCi  // every symbol x right after the dot of some item in CCi
                temp <- goto(CCi, x)
                if temp ∉ CC
                    then CC <- CC ∪ {temp}
                record transition from CCi to temp on x
    */
    /*
     The Action table needs entries such as reduce A -> BC,
     so the global number of each expression is required. For convenience a prefix-sum array
     records how many expressions the productions 0 through i contain.
     */
    int[] preArr;
    
    private void initPreArr() {
        this.preArr = new int[this.production.size()];
        if (this.preArr.length > 0) {
            this.preArr[0] = this.production.get(0).expr.size();
            for (int i = 1; i < this.preArr.length; i++) {
                this.preArr[i] = this.preArr[i - 1] + this.production.get(i).expr.size();
            }
        }
    }
    public void toLRTable() {
        // initialization
        this.initPreArr();
        this.FIRSTAllSymbol();
        Set<String> CC0 = new HashSet<String>();
        List<List<Integer>> expr = this.production.get(startIndex).expr;
        for (int i = 0; i < expr.size(); i++) {
            CC0.add(this.startIndex + BnfContainer.separationCharacter + i + BnfContainer.separationCharacter + 0 + BnfContainer.separationCharacter + this.eof);
        }
        this.closure(CC0);
        CC = new ArrayList<Set<String>>();
        CC.add(CC0);
        int begin = 0;
        int lastSize = -1;
        List<Node> res = new ArrayList<Node>();
        int endState = -1;
        while (lastSize != CC.size()) {
            lastSize = CC.size();
            for (int i = begin; i < lastSize; i++) {
                Set<String> s = this.CC.get(i);
                for (String item : s) {
                    String[] strs = item.split(BnfContainer.separationCharacter); // fields are separated by separationCharacter (a space)
                    int productionIdx = Integer.parseInt(strs[0]); // production index
                    int exprIdx = Integer.parseInt(strs[1]); // expression index
                    int placeholder = Integer.parseInt(strs[2]); // dot position; 0 means no grammar symbol lies to the left of the dot
                    int prospectiveSymbol = Integer.parseInt(strs[3]); // lookahead symbol
                    List<Integer> list = this.production.get(productionIdx).expr.get(exprIdx);
                    if (placeholder < list.size()) {
                        // Every item of every set is advanced over the symbol after its dot,
                        // so all transitions are covered here; recording them is enough to build the tables.
                        int x = list.get(placeholder);
                        Set<String> temp = this.go(s, x);
                        int CCj = this.CCcontainsTheSet(temp);
                        if (CCj == -1) {
                            CC.add(temp);
                            CCj = this.CC.size() - 1;
                        }
                        res.add(new Node(i, x, CCj));
                    }
                    // complete item: record a reduce
                    else {
                        res.add(new Node(i, prospectiveSymbol, ((productionIdx - 1 >= 0 ? this.preArr[productionIdx - 1] : 0) + exprIdx + 1) | MASK));
                        if (productionIdx == this.startIndex) {
                            endState = i;
                        }
                    }
                }
            }
            // sets before lastSize are processed; the next pass starts at the newly added ones
            begin = lastSize;
        }
        this.createActionAndGotoTable(res, endState);
    }
    
    /**
     * Temporary record used while the tables are being built.
     */
    class Node{
        int state;
        /**
         * sym is the code of a terminal or a nonterminal.
         * It also decides whether val goes into the Action
         * table or into the Goto table.
         */
        int sym;
        /**
         * val is either
         * the production number | MASK, for a reduce, or
         * the index of the target state, for an ordinary transition.
         */
        int val;
        
        public Node(int state, int sym, int val){
            this.state = state;
            this.sym = sym;
            this.val = val;
        }
    }
    /**
     * Checks whether the canonical collection CC already contains set.
     * @param set the item set to look for
     * @return the index of set in CC, or -1 if it is not present
     */
    private int CCcontainsTheSet (Set<String> set) {
        for (int i = 0; i < CC.size(); i++) {
            Set<String> s = CC.get(i);
            if (s.size() == set.size() && set.containsAll(s)) {
                return i;
            }
        }
        return -1;
    }
    /*
    for each CCi ∈ CC
        for each  item I ∈ CCi
            if I is [A -> β·cθ,a] and goto (CCi , c) = CCj then
                Action[i,c] <- "shift j"
            else if I is [A -> β·,a] then    // reduce
                Action[i,a] <- "reduce A->β"
            else if I is [S'->S·,eof] then    // the goal production is complete and the lookahead is eof: accept
                Action[i,eof] <- "accept"
        for each n ∈ NT                // item set CCi reaches j by crossing the nonterminal n
            if goto(CCi, n) = CCj then
                Goto[i,n] <- j
    */
    private void createActionAndGotoTable(List<Node> node, int endState) {
        // rows are states, columns are terminals
        this.Action = new int[this.CC.size()][this.T.size()];
        // initial value: -1 (error)
        for (int i = this.CC.size() - 1; i >= 0; i--) {
            for (int j = this.T.size() - 1; j >=0; j--) {
                this.Action[i][j] = -1;
            }
        }
        // rows are states, columns are nonterminals
        this.Goto = new int[this.CC.size()][this.production.size()];
        // initial value: -1 (error)
        for (int i = this.CC.size() - 1; i >= 0; i--) {
            for (int j = this.production.size() - 1; j >=0; j--) {
                this.Goto[i][j] = -1;
            }
        }
        for (Node n : node) {
            // the symbol that was crossed is a terminal
            if (this.isT(n.sym)) {
                Action[n.state][n.sym & DECODE] = n.val;
            }else {
                Goto[n.state][n.sym] = n.val;
            }
        }
        // mark the accepting entry with the lowest int value (skipped if no accepting state was found)
        if (endState >= 0) {
            this.Action[endState][this.eof & DECODE] = Integer.MIN_VALUE;
        }
        return;
    }
    
    
    public void printActionAndGotoTable() {
        if (this.Action == null || this.Goto == null) {
            System.out.println("Tables have not been generated; call toLRTable first.");
            return;
        }
        // Header row: one column per terminal (epsilon is skipped).
        System.out.println("Action table:");
        System.out.print("\t");
        for (int i = 0; i < this.T.size(); i++) {
            if (i != (this.epsilon & DECODE)) {
                System.out.print(this.T.get(i) + "\t");
            }
        }
        System.out.print("\n");
        for (int i = 0; i < this.Action.length; i++) {
            // Each row starts with its state number.
            System.out.print(i + "\t");
            for (int j = 0; j < this.Action[i].length; j++) {
                if (j != (this.epsilon & DECODE)) {
                    if (this.Action[i][j] == -1) {
                        System.out.print("err\t");
                    } else if (this.Action[i][j] == Integer.MIN_VALUE) {
                        // Accept.
                        System.out.print("acc\t");
                    } else if ((this.Action[i][j] & MASK) == MASK) {
                        // Reduce: print the value with the MASK flag stripped.
                        System.out.print("r" + (this.Action[i][j] & DECODE) + "\t");
                    } else {
                        // Shift to the given state.
                        System.out.print("s" + this.Action[i][j] + "\t");
                    }
                }
            }
            System.out.print("\n");
        }
        System.out.println("Goto table:");
        // Header row: one column per nonterminal.
        System.out.print("\t");
        for (int i = 0; i < this.production.size(); i++) {
            System.out.print(this.production.get(i).name + "\t");
        }
        System.out.print("\n");
        for (int i = 0; i < this.Goto.length; i++) {
            // Each row starts with its state number.
            System.out.print(i + "\t");
            for (int j = 0; j < this.Goto[i].length; j++) {
                if (this.Goto[i][j] == -1) {
                    System.out.print("err\t");
                    continue;
                }
                System.out.print("s" + this.Goto[i][j] + "\t");
            }
            System.out.print("\n");
        }
    }
}
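
Once toLRTable has filled Action and Goto, a parser needs little more than a state stack and a lookup loop. The following is a minimal sketch, not part of the generator above. The class name LrDriverSketch, the hand-built tables for the toy grammar Goal -> S; S -> "a" S | "b", the arrays RHS_LENGTH and LHS, and the concrete values chosen for MASK and DECODE are all assumptions made for illustration. It only reuses the entry convention visible in printActionAndGotoTable: -1 for error, Integer.MIN_VALUE for accept, a MASK-flagged value for reduce, and a plain state number for shift.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class LrDriverSketch {
    // Assumed values; the real MASK and DECODE are defined earlier in the listing.
    static final int MASK = 0x80000000;
    static final int DECODE = 0x7fffffff;
    static final int ERR = -1;
    static final int ACC = Integer.MIN_VALUE;

    // Toy grammar (hypothetical): 0: Goal -> S   1: S -> "a" S   2: S -> "b"
    static final int[] RHS_LENGTH = {1, 2, 1}; // symbols popped per reduce
    static final int[] LHS = {0, 1, 1};        // Goto column of each left-hand side

    // Hand-built tables. Terminal columns: a = 0, b = 1, eof = 2.
    static final int[][] ACTION = {
        {2, 3, ERR},            // state 0: shift on a or b
        {ERR, ERR, ACC},        // state 1: accept on eof
        {2, 3, ERR},            // state 2: shift on a or b
        {ERR, ERR, MASK | 2},   // state 3: reduce S -> "b"
        {ERR, ERR, MASK | 1},   // state 4: reduce S -> "a" S
    };
    // Nonterminal columns: Goal = 0, S = 1.
    static final int[][] GOTO = {
        {ERR, 1},
        {ERR, ERR},
        {ERR, 4},
        {ERR, ERR},
        {ERR, ERR},
    };

    /** Returns true if the token sequence (which must end with eof = 2) is accepted. */
    static boolean parse(int[] tokens) {
        Deque<Integer> states = new ArrayDeque<>();
        states.push(0);
        int pos = 0;
        while (true) {
            int act = ACTION[states.peek()][tokens[pos]];
            if (act == ERR) {
                return false;
            }
            if (act == ACC) {
                return true;
            }
            if ((act & MASK) == MASK) {
                // Reduce: pop one state per right-hand-side symbol, then follow Goto.
                int prod = act & DECODE;
                for (int k = 0; k < RHS_LENGTH[prod]; k++) {
                    states.pop();
                }
                int next = GOTO[states.peek()][LHS[prod]];
                if (next == ERR) {
                    return false;
                }
                states.push(next);
            } else {
                // Shift: push the target state and consume the token.
                states.push(act);
                pos++;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(new int[] {0, 0, 1, 2})); // "a a b" -> true
        System.out.println(parse(new int[] {0, 0, 2}));    // "a a"   -> false
    }
}
```

The error and accept checks come before the MASK test on purpose: with the sign bit assumed as the flag, both -1 and Integer.MIN_VALUE would otherwise look like reduce entries, which is the same order of checks printActionAndGotoTable uses.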

/**
 * Code analyzer: converts the grammar text into an equivalent data structure.
 */
class CodeAnalyzer {
    class Token {
        boolean isNt;
        String name;
        public Token(boolean isNt, String name) {
            this.isNt = isNt;
            this.name = name;
        }
    }
    private char[] text;
    private int textSize = 0; // number of valid characters in text
    private int point = 0; // current parse position in text
    private BnfContainer bc;
    private Token token;
    String left; // left-hand-side nonterminal of the current production
    private int count = 0; // index of the production currently being parsed
    public CodeAnalyzer(String text, BnfContainer bc) {
        this.bc = bc;
        // Initialize the analyzer.
        this.initText(text);
        this.initStartSymbol();
        this.initCodeAnalyzer();
    }
    /**
     * Copies the input into text and compacts it in place by dropping
     * carriage returns, line feeds, tabs and spaces. Afterwards textSize
     * holds the number of valid characters. Whitespace inside quoted
     * terminals is dropped as well.
     *
     * @param s the raw grammar text
     */
    private void initText(String s) {
        this.text = s.toCharArray();
        int idx = 0;
        // Shift the kept characters to the front of the array.
        while (idx < text.length) {
            if (text[idx] == '\r' || text[idx] == '\n' || text[idx] == '\t' || text[idx] == ' ') {
                idx++;
            } else {
                text[textSize++] = text[idx++];
            }
        }
    }

    private void initStartSymbol() {
        // The compacted text must begin with "start:<".
        char[] needle = { 's', 't', 'a', 'r', 't', ':', '<' };
        if (textSize <= needle.length) {
            this.notFindStartNT();
        }
        point = 0;
        while (point < needle.length) {
            if (needle[point] == text[point]) {
                point++;
            } else {
                this.notFindStartNT();
            }
        }
        // Read the start symbol name up to the closing '>'.
        point = needle.length;
        while (point < textSize && text[point] != '>') {
            point++;
        }
        this.bc.setStart(new String(text, needle.length, point - needle.length));
        this.skip(Type.RT);
        this.skip(Type.SEMICOLON);
    }
    /**
     * Symbols that skip can consume.
     */
    enum Type {
        LT, // left angle bracket <
        RT, // right angle bracket >
        SEMICOLON, // semicolon ;
        QUOTE, // double quote "
        OR, // alternation |
        COLON, // colon :
        EQ, // equals sign =
    }
    private void skip(Type t) {
        switch (t) {
        case LT:
            this.skip('<');
            break;
        case RT:
            this.skip('>');
            break;
        case OR:
            this.skip('|');
            break;
        case SEMICOLON:
            this.skip(';');
            break;
        case QUOTE:
            this.skip('"');
            break;
        case COLON:
            this.skip(':');
            break;
        case EQ:
            this.skip('=');
            break;
        }
    }
    private void skip(char c) {
        // The expected character must be present; otherwise report it and exit.
        if (point >= this.textSize || this.text[point] != c) {
            System.out.println("Production " + this.count + ": missing symbol " + c);
            System.exit(-1);
        }
        point++;
    }
    /**
     * Reports that the start nonterminal was not found and exits.
     */
    private void notFindStartNT() {
        System.out.println("Start nonterminal not found!");
        System.exit(-1);
    }

    /**
     * Registers every left-hand-side nonterminal up front instead of while
     * parsing the BNF. A nonterminal that is used (appears on a right-hand
     * side) but never defined (never appears on a left-hand side) is an
     * error, so all definitions have to be known before parsing starts.
     */
    private void initCodeAnalyzer() {
        int idx = this.point;
        this.point = 0;
        this.count = 0;
        while (true) {
            // Advance past the next semicolon.
            while (this.point < textSize && text[this.point] != ';') {
                this.point++;
            }
            this.point++;
            this.count++;
            // Nothing follows the last semicolon: done.
            if (this.point >= textSize) {
                break;
            }
            String name = this.getNT();
            bc.addNT(name);
        }
        this.count = 0;
        this.point = idx;
    }

    /**
     * Parses the text starting at point, following this BNF:
     * <Goal> ::= {<Production>}
     * <Production> ::= <LeftNonterminal> "::=" <Expr>;
     * <Expr> ::= <Term> { "|" <Term>}";";
     * <Term> ::= {<Factor>};     // a Term is a sequence of terminals and nonterminals
     * <Factor> ::= <T> | <NT>
     */
    public void analyze() {
        while (point < this.textSize) {
            this.count++;
            production();
        }
    }

    public void production() {
        // Read the left-hand-side nonterminal first.
        this.left = this.getNT();
        this.skipDefineSymol();
        this.expr();
    }
    /**
     * Skips "::=".
     */
    public void skipDefineSymol() {
        skip(Type.COLON);
        skip(Type.COLON);
        skip(Type.EQ);
    }
    /**
     * Reads a nonterminal of the form <xxx> and returns its name.
     */
    public String getNT() {
        skip(Type.LT);
        StringBuilder res = new StringBuilder();
        while (this.point < this.textSize && text[this.point] != '>') {
            res.append(text[this.point++]);
        }
        skip(Type.RT);
        return res.toString();
    }

    /**
     * Reads a terminal of the form "T". On entry, point is at the opening quote.
     *
     * @return the terminal's name without the quotes
     */
    public String getT() {
        this.skip(Type.QUOTE);
        StringBuilder res = new StringBuilder();
        while (this.point < this.textSize && this.text[this.point] != '"') {
            res.append(text[this.point++]);
        }
        this.skip(Type.QUOTE);
        return res.toString();
    }

    /**
     * On entry, point is at the first symbol after "::=".
     */
    public void expr() {
        this.term();
        while (this.point < this.textSize && text[this.point] == '|') {
            this.skip(Type.OR);
            term();
        }
        this.skip(Type.SEMICOLON);
    }

    /**
     * If any symbols remain, point is at the opening < or " of a
     * nonterminal or terminal.
     */
    public void term() {
        // Start a new list for this alternative.
        bc.creatNewExper(this.left);
        while (this.point < this.textSize && (text[this.point] == '"' || text[this.point] == '<')) {
            factor();
            bc.addExpeElement(this.left, token.isNt, token.name);
        }
    }

    /**
     * Reads one symbol and stores it in token.
     */
    public void factor() {
        if (text[this.point] == '"') {
            // Terminal: a quoted name.
            String name = this.getT();
            this.token = new Token(false, name);
        } else {
            // Nonterminal: a name in angle brackets.
            String name = this.getNT();
            token = new Token(true, name);
        }
    }
}
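
The first two steps of CodeAnalyzer (compacting the text, then reading the start:<...>; header) can be tried in isolation. Below is a minimal sketch under stated assumptions: the class StartSymbolSketch and its methods compact and startSymbol are hypothetical names introduced here, and unlike the listing the sketch throws an exception instead of calling System.exit, a choice made only to keep it easy to test.

```java
public class StartSymbolSketch {
    /** Drops carriage returns, line feeds, tabs and spaces, mirroring initText. */
    static String compact(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c != '\r' && c != '\n' && c != '\t' && c != ' ') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns the name between "start:<" and ">;" at the front of the compacted text. */
    static String startSymbol(String grammar) {
        String text = compact(grammar);
        String prefix = "start:<";
        if (!text.startsWith(prefix)) {
            throw new IllegalArgumentException("start nonterminal not found");
        }
        int close = text.indexOf('>', prefix.length());
        // The header must be closed by '>' and terminated by ';'.
        if (close < 0 || close + 1 >= text.length() || text.charAt(close + 1) != ';') {
            throw new IllegalArgumentException("malformed start header");
        }
        return text.substring(prefix.length(), close);
    }

    public static void main(String[] args) {
        // prints "Goal"
        System.out.println(startSymbol("start : <Goal>;\n<Goal> ::= <Expr>;"));
    }
}
```

Because whitespace is removed before any parsing, spaces and line breaks in the grammar file are purely cosmetic, and a quoted terminal that contains a space loses it.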