Implementing a lexical analyzer in Python

Reposted article, 2020-11-20 11:56, category: compiler construction lab

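This is probably the most bare-bones lexical analyzer on the whole internet... It was my first small experiment after learning a little Python.

The lab requirements are pasted below, but I changed a few things according to my own ideas while implementing it.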

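I. Objective:

Design and implement a lexical analysis program that includes a preprocessing stage, to deepen understanding of the lexical analysis phase of compilation.

II. Requirements:

1. Implement preprocessing

The source program may contain symbols that are meaningless for program execution; these must be removed.
First write an input routine that reads several lines of statements from the keyboard, a file, or a text box and stores them in order in an input buffer (character data). Then write a preprocessing subroutine that removes editing characters such as carriage returns, newlines, and tabs from the input string, merges consecutive whitespace characters into one, and removes comments.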
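The preprocessing requirement can be met with a few regular-expression passes. This is a minimal sketch, not the program from this post. The name `preprocess` is hypothetical, and the sketch assumes C-style `/* */` and `//` comments. It does not understand string literals, so a `//` inside a string would be removed as if it were a comment.

```python
import re

def preprocess(source: str) -> str:
    """Strip comments and editing characters, then merge whitespace (a sketch)."""
    # Remove /* ... */ block comments; DOTALL lets them span several lines
    text = re.sub(r'/\*.*?\*/', ' ', source, flags=re.DOTALL)
    # Remove // line comments up to (not including) the end of the line
    text = re.sub(r'//[^\n]*', ' ', text)
    # Replace carriage returns, newlines and tabs with plain spaces
    text = re.sub(r'[\r\n\t]', ' ', text)
    # Merge runs of spaces into one space and trim both ends
    return re.sub(r' +', ' ', text).strip()

print(preprocess("int a = 1; // set a\n/* block\n comment */\tb  =  2;"))
# int a = 1; b = 2;
```

Block comments are removed before line comments so that a `//` inside a `/* */` comment does not cut the block comment short.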

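2. Implement lexical analysis

Input: the source-program string of the given grammar.
Output:
a sequence of pairs (syn, token or sum), where
syn is the category code of the token;
token is the string of the token itself;
sum is the value of an integer constant.
In the implementation, each pair can be represented as a struct.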
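In Python, a `typing.NamedTuple` is a lightweight stand-in for the suggested struct. This is a sketch: the class name `Pair` and the field name `value` are assumptions and are not part of the lab handout. `value` holds either the token string or the integer sum.

```python
from typing import NamedTuple, Union

class Pair(NamedTuple):
    """One (syn, token-or-sum) pair produced by the lexer."""
    syn: int                # category code, e.g. 2 for if, 25 for ID, 26 for NUM
    value: Union[str, int]  # the token string, or the integer value (sum) of a NUM

    def __str__(self):
        return f'({self.syn},{self.value})'

pairs = [Pair(2, 'if'), Pair(25, 'x'), Pair(33, '<'), Pair(26, 10)]
print(' '.join(str(p) for p in pairs))
# (2,if) (25,x) (33,<) (26,10)
```

Because it is still a tuple, a `Pair` also unpacks as `syn, value = pair`.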

3. Lexicon of the C-language subset to be analyzed

Keywords

main  if  then  while  do  static  int  double  struct  break  else  long  switch  case  typedef  char  return  const  float  short  continue  for  void  default  sizeof

All keywords are lowercase.

Operators and delimiters
+ - * / : := < <> <= > >= = ; ( ) #
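Some of these symbols are prefixes of others: `:` is a prefix of `:=`, `<` of `<>` and `<=`, and `>` of `>=`. The scanner should therefore always take the longest symbol that matches at the current position, which is the usual maximal-munch rule. Below is a minimal sketch. The helper names `longest_operator` and `split_operators` are hypothetical.

```python
# Operators and delimiters from the list above
OPERATORS = {'+', '-', '*', '/', ':', ':=', '<', '<>', '<=', '>', '>=', '=', ';', '(', ')', '#'}
MAX_LEN = max(len(op) for op in OPERATORS)

def longest_operator(text, pos):
    """Return the longest operator that starts at text[pos], or None."""
    for length in range(MAX_LEN, 0, -1):   # try 2-character symbols before 1-character ones
        candidate = text[pos:pos + length]
        if candidate in OPERATORS:
            return candidate
    return None

def split_operators(text):
    """Collect operators from left to right, skipping every other character."""
    found, pos = [], 0
    while pos < len(text):
        op = longest_operator(text, pos)
        if op is None:
            pos += 1
        else:
            found.append(op)
            pos += len(op)
    return found

print(split_operators('a<>b; c:=d<=e'))
# ['<>', ';', ':=', '<=']
```

Splitting on single characters instead would turn `<=` into `<` followed by `=`.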
Other tokens: ID and NUM
These are defined by the following regular definitions:

ID→letter(letter|digit)*
NUM→digit digit*
letter→a|…|z|A|…|Z
digit→0|…|9
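These definitions translate directly into Python regular expressions. This is a sketch, and the names `ID_RE` and `NUM_RE` are illustrative. The classes are written out explicitly as ASCII ranges for two reasons. First, `\w` would also accept `_` and non-ASCII letters. Second, `\d` matches any Unicode decimal digit in `str` patterns.

```python
import re

# ID  -> letter (letter | digit)*
ID_RE = re.compile(r'[A-Za-z][A-Za-z0-9]*')
# NUM -> digit digit*
NUM_RE = re.compile(r'[0-9]+')

for word in ['x1', 'main', '1x', '007']:
    # fullmatch requires the whole string to fit the definition
    print(word, bool(ID_RE.fullmatch(word)), bool(NUM_RE.fullmatch(word)))
# x1 True False
# main True False
# 1x False False
# 007 False True
```

Note that `main` matches ID as well. Telling keywords apart from identifiers is done afterwards, by looking the matched word up in the keyword table.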

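Whitespace consists of blanks, tabs, and newlines.
Whitespace generally separates IDs, NUMs, special symbols, and keywords, and is usually ignored in the lexical analysis phase.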

4. Category codes of the tokens

Token	Code	Token	Code
main	1	;	41
if	2	(	42
then	3	)	43
while	4	int	7
do	5	double	8
static	6	struct	9
ID	25	break	10
NUM	26	else	11
+	27	long	12
-	28	switch	13
*	29	case	14
/	30	typedef	15
**	31	char	16
==	32	return	17
<	33	const	18
<>	34	float	19
<=	35	short	20
>	36	continue	21
>=	37	for	22
=	38	void	23
[	39	sizeof	24
]	40	#	0
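The whole specification fits into one small scanner. The sketch below emits (syn, token-or-sum) pairs in source order and uses the codes from the table. The names `KEYWORDS`, `SYMBOLS`, `TOKEN_RE` and `lex` are illustrative. Some entries in the lists above have no code in the table, and the sketch handles them as follows:

- `default` is treated as an ordinary ID.
- `:` and `:=` raise an error.

```python
import re

# Category codes copied from the table above
KEYWORDS = {
    'main': 1, 'if': 2, 'then': 3, 'while': 4, 'do': 5, 'static': 6,
    'int': 7, 'double': 8, 'struct': 9, 'break': 10, 'else': 11,
    'long': 12, 'switch': 13, 'case': 14, 'typedef': 15, 'char': 16,
    'return': 17, 'const': 18, 'float': 19, 'short': 20, 'continue': 21,
    'for': 22, 'void': 23, 'sizeof': 24,
}
SYMBOLS = {
    '+': 27, '-': 28, '*': 29, '/': 30, '**': 31, '==': 32, '<': 33,
    '<>': 34, '<=': 35, '>': 36, '>=': 37, '=': 38, '[': 39, ']': 40,
    ';': 41, '(': 42, ')': 43, '#': 0,
}
ID_CODE, NUM_CODE = 25, 26

# Longer symbols come first in the alternation, so '<=' wins over '<'
SYM_PATTERN = '|'.join(re.escape(s) for s in sorted(SYMBOLS, key=len, reverse=True))
TOKEN_RE = re.compile(
    rf'(?P<ws>\s+)|(?P<word>[A-Za-z][A-Za-z0-9]*)|(?P<num>[0-9]+)|(?P<sym>{SYM_PATTERN})'
)

def lex(text):
    """Return the (syn, token-or-sum) pairs of text in source order."""
    pairs, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f'unexpected character {text[pos]!r} at offset {pos}')
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == 'word':    # a keyword if it is in the table, otherwise an ID
            pairs.append((KEYWORDS.get(lexeme, ID_CODE), lexeme))
        elif kind == 'num':   # NUM carries its integer value (sum)
            pairs.append((NUM_CODE, int(lexeme)))
        elif kind == 'sym':
            pairs.append((SYMBOLS[lexeme], lexeme))
        # whitespace ('ws') produces no pair
    return pairs

print(lex('if x1 >= 10 then y = x1 ** 2;'))
```

On the sample line this prints the following list:

[(2, 'if'), (25, 'x1'), (37, '>='), (26, 10), (3, 'then'), (25, 'y'), (38, '='), (25, 'x1'), (31, '**'), (26, 2), (41, ';')]

A single master regex with named groups keeps the scanner short. `m.lastgroup` reports which alternative matched.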

Source code:

import re
import sys

# Keywords: the 63 keywords copied from Baidu Baike...
key_word = ['asm','do','if','return','typedef','auto','double','inline','short','typeid','bool',
            'dynamic_cast','int','signed','typename','break','else','long','sizeof','union','case',
            'enum','mutable','static','unsigned','catch','explicit','namespace','static_cast',
            'using','char','export','new','struct','virtual','class','extern','operator','switch',
            'void','const','false','private','template','volatile','const_cast','float','protected',
            'this','wchar_t','continue','for','public','throw','while','default','friend','register',
            'true','delete','goto','reinterpret_cast','try']

# Some common functions; otherwise they keep getting recognized as identifiers. 16 for now
function_word = ['cin','cout','scanf','printf','abs','sqrt','isalpha','isdigit','tolower','toupper',
                 'strcpy','strlen','time','rand','srand','exit']

operator = ['+','-','*','/',':',':=','<','<>','<=','>','>=','=',';','(',')','#','==','{','}',',','&','[',']',"'"]

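# Read the source program from standard input until end of file
# (Ctrl+D on Linux/macOS, Ctrl+Z then Enter on Windows) and save a copy to cpp.txt
with open('cpp.txt', 'w') as file:
    print("Enter the source program to analyze:")
    txt = sys.stdin.readlines()
    file.writelines(txt)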

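with open('cpp.txt', 'r') as file:
    # Preprocessing; I also added removal of string literals, since a string can never be an identifier...
    txt = ' '.join(file.readlines())
    # Remove /* ... */ block comments and // line comments
    deal_txt = re.sub(r'/\*(.|[\r\n])*?\*/|//.*', ' ', txt)
    # Remove string literals; working on deal_txt keeps the comments removed above
    deal_txt = re.sub(r'\"(.|[\r\n])*?\"', ' ', deal_txt)
    deal_txt = deal_txt.strip()
    deal_txt = deal_txt.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ')
    # Merge runs of blanks into a single space
    deal_txt = re.sub(r' +', ' ', deal_txt)
    # Lexical analysis; I added _ to the identifier rule
    keyword = []
    funword = []
    opeword = []
    idword = []
    numword = []
    errword = []
    pha = re.findall(r'[a-zA-Z_][a-zA-Z0-9_]*', deal_txt)
    # \b keeps the digits inside identifiers such as x1 from being read as numbers
    num = re.findall(r'\b\d+\b', deal_txt)
    # Try longer operators first so that :=, <>, <=, >= and == are not split in two;
    # any other character that is neither a word character nor whitespace goes to the error list
    op_pattern = '|'.join(re.escape(op) for op in sorted(operator, key=len, reverse=True))
    symbols = re.findall(op_pattern + r'|[^\w\s]', deal_txt)
    # Codes: keywords 1-63, functions 64-79, ID 80, NUM 81, operators 82 and up
    for p in pha:
        if p in key_word:
            keyword.append({p : key_word.index(p) + 1})
        elif p in function_word:
            funword.append({p : len(key_word) + function_word.index(p) + 1})
        else:
            idword.append({p : 80})
    for n in num:
        numword.append({n : 81})
    for s in symbols:
        if s in operator:
            opeword.append({s : len(key_word) + len(function_word) + operator.index(s) + 3})
        else:
            errword.append({s : 'ERROR'})
    print("Keywords:\n", keyword)
    print("Functions:\n", funword)
    print("ID:\n", idword)
    print("Numbers:\n", numword)
    print("Operators and delimiters:\n", opeword)
    if errword:
        print("Other:\n", errword)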
