Let's Build a Simple Interpreter (Part 2)

Reposted translation, 2020-03-01 12:00

Translated from: https://ruslanspivak.com/lsbasi-part2/
(with the author's permission)

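In their book The 5 Elements of Effective Thinking, Burger and Starbird share a story about watching Tony Plog, an internationally acclaimed trumpet virtuoso, conduct a master class for accomplished trumpet players. The students first played complex musical phrases, and played them perfectly well. But when they were then asked to play very basic, simple notes, those notes sounded childish compared with the complex phrases they had just played. After they finished, the master teacher played the same notes, and when he played them they did not sound childish at all. The difference was stunning. Tony explained that mastering the performance of simple notes allows one to play complex pieces with greater control. The lesson was clear: to build true virtuosity, one must focus on mastering simple, basic ideas.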

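The lesson of the story clearly applies not only to music but also to software development. It is a good reminder to all of us not to lose sight of the importance of deep work on simple, basic ideas, even if it sometimes feels like a step back. While it is important to be proficient with the tools or frameworks you use, it is also extremely important to know the principles behind them. As Ralph Waldo Emerson said:

"If you learn only methods, you'll be tied to your methods. But if you learn principles, you can devise your own methods."

On that note, let's dive into interpreters and compilers again.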

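Today I will show you a new version of the calculator from Part 1 that will be able to:

1. Handle whitespace characters anywhere in the input string
2. Consume multi-digit integers from the input
3. Subtract two integers (currently it can only add integers)

Here is the source code for the new version of the calculator that can do all of the above: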

# Token types
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'


class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, PLUS, MINUS, or EOF
        self.type = type
        # token value: non-negative integer value, '+', '-', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS, '+')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Interpreter(object):
    def __init__(self, text):
        # client string input, e.g. "3 + 5", "12 - 5", etc
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        # current token instance
        self.current_token = None
        self.current_char = self.text[self.pos]

    def error(self):
        raise Exception('Error parsing input')

    def advance(self):
        """Advance the 'pos' pointer and set the 'current_char' variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == '+':
                self.advance()
                return Token(PLUS, '+')

            if self.current_char == '-':
                self.advance()
                return Token(MINUS, '-')

            self.error()

        return Token(EOF, None)

    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.get_next_token()
        else:
            self.error()

    def expr(self):
        """Parser / Interpreter

        expr -> INTEGER PLUS INTEGER
        expr -> INTEGER MINUS INTEGER
        """
        # set current token to the first token taken from the input
        self.current_token = self.get_next_token()

        # we expect the current token to be an integer
        left = self.current_token
        self.eat(INTEGER)

        # we expect the current token to be either a '+' or '-'
        op = self.current_token
        if op.type == PLUS:
            self.eat(PLUS)
        else:
            self.eat(MINUS)

        # we expect the current token to be an integer
        right = self.current_token
        self.eat(INTEGER)
        # after the above call the self.current_token is set to
        # EOF token

        # at this point either the INTEGER PLUS INTEGER or
        # the INTEGER MINUS INTEGER sequence of tokens
        # has been successfully found and the method can just
        # return the result of adding or subtracting two integers,
        # thus effectively interpreting client input
        if op.type == PLUS:
            result = left.value + right.value
        else:
            result = left.value - right.value
        return result


def main():
    while True:
        try:
            # Python 3: input() replaces Python 2's raw_input()
            text = input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        interpreter = Interpreter(text)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()

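Save the above code into a file named calc2.py, or download it directly from GitHub. Try it out and see for yourself that it works as expected: it handles whitespace anywhere in the input, it accepts multi-digit integers, and it can subtract two integers as well as add them.

Here is a sample session that I ran on my laptop: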

$ python calc2.py
calc> 27 + 3
30
calc> 27 - 7
20
calc>

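The major code changes compared with the version from Part 1 are:

1. The get_next_token method was refactored a bit. The logic to increment the pos pointer was factored out into a separate method, advance.
2. Two more methods were added: skip_whitespace to ignore whitespace characters, and integer to handle multi-digit integers in the input.
3. The expr method was modified to recognize the INTEGER -> MINUS -> INTEGER phrase in addition to the INTEGER -> PLUS -> INTEGER phrase. The method now also interprets both addition and subtraction after having successfully recognized the corresponding phrase.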
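To look at the first two changes in isolation, here is a minimal standalone sketch. It is not the article's code: `scan_integer` is a hypothetical helper that takes the text and a position as arguments, whereas the calculator keeps that state in self.pos and self.current_char. It skips whitespace and then collects a multi-digit integer, roughly what skip_whitespace and integer do together.

```python
def scan_integer(text, pos):
    """Skip whitespace starting at pos, then read one multi-digit integer.

    Returns (value, new_pos). Hypothetical helper for illustration only;
    the calculator tracks the position in self.pos instead.
    """
    # Same job as skip_whitespace: step over spaces, tabs, and so on.
    while pos < len(text) and text[pos].isspace():
        pos += 1
    # Same job as integer: accumulate consecutive digit characters.
    digits = ''
    while pos < len(text) and text[pos].isdigit():
        digits += text[pos]
        pos += 1
    if not digits:
        raise ValueError('expected a digit at position {}'.format(pos))
    return int(digits), pos


value, pos = scan_integer('   127 + 3', 0)
print(value, pos)  # 127 6
```

The returned position points at the first character after the integer, which is where the next call to the scanner would resume.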

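In Part 1 you learned two important concepts, namely that of a token and a lexical analyzer. Today I would like to talk a little bit about lexemes, parsing, and parsers.

You already know about tokens. But to round out the discussion of tokens I need to mention lexemes. What is a lexeme? A lexeme is a sequence of characters that forms a token. For example, in the input "27 + 3" the characters "27" are the lexeme of an INTEGER token whose value is 27. The original article accompanies this with a picture of some example tokens and lexemes to make the relationship between them clearer.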
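The lexeme/token relationship can also be sketched in a few lines of code. The example below is an illustration under assumptions, not the article's scanner: `TOKEN_PATTERNS` and `lexemes_and_tokens` are hypothetical names, and it uses the standard `re` module instead of the hand-written get_next_token. Unlike the calculator, it silently skips characters it does not recognize.

```python
import re

# Hypothetical table for illustration: token type -> pattern of its lexemes.
TOKEN_PATTERNS = [
    ('INTEGER', r'\d+'),
    ('PLUS', r'\+'),
    ('MINUS', r'-'),
]
SCANNER = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_PATTERNS))


def lexemes_and_tokens(text):
    """Return (lexeme, token_type, token_value) triples found in text."""
    triples = []
    for match in SCANNER.finditer(text):
        lexeme = match.group()        # the raw characters, e.g. '27'
        token_type = match.lastgroup  # the category, e.g. 'INTEGER'
        # An INTEGER token carries the numeric value, not the characters.
        value = int(lexeme) if token_type == 'INTEGER' else lexeme
        triples.append((lexeme, token_type, value))
    return triples


for lexeme, token_type, value in lexemes_and_tokens('27 + 3'):
    print(repr(lexeme), '->', 'Token({}, {!r})'.format(token_type, value))
```

For the input '27 + 3' this prints one line per lexeme, pairing the string '27' with an INTEGER token holding the number 27, '+' with a PLUS token, and '3' with an INTEGER token holding 3.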

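Now, remember the expr method? I said before that this is where the interpretation of an arithmetic expression actually happens. But before you can interpret an expression you first need to recognize what kind of phrase it is, whether it is addition or subtraction, for example. That is what the expr method essentially does: it finds the structure in the stream of tokens it gets from the get_next_token method and then interprets the phrase it has recognized, generating the result of the arithmetic expression.

The process of finding the structure in the stream of tokens, or put differently, the process of recognizing a phrase in the stream of tokens, is called parsing. The part of an interpreter or compiler that performs that job is called a parser.

So now you know that the expr method is the part of the interpreter where both parsing and interpreting happen. The method first tries to recognize (parse) the INTEGER -> PLUS -> INTEGER or the INTEGER -> MINUS -> INTEGER phrase in the stream of tokens, and once it has successfully recognized (parsed) one of those phrases, it interprets it and returns the result of adding or subtracting the two integers to the caller.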
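One way to see that expr does two distinct jobs is to pull them apart. The sketch below is an illustration under assumptions, not the article's design: `parse` and `interpret` are hypothetical names, and the tokens are plain (type, value) tuples written by hand rather than produced by get_next_token.

```python
def parse(tokens):
    """Recognize INTEGER (PLUS | MINUS) INTEGER; return (left, op, right)."""
    if len(tokens) != 3:
        raise ValueError('expected exactly three tokens')
    left, op, right = tokens
    if left[0] != 'INTEGER' or right[0] != 'INTEGER':
        raise ValueError('expected integers around the operator')
    if op[0] not in ('PLUS', 'MINUS'):
        raise ValueError('expected PLUS or MINUS between the integers')
    # The recognized phrase: two integer values and the operator type.
    return left[1], op[0], right[1]


def interpret(phrase):
    """Compute the value of a phrase that parse() has already recognized."""
    left, op, right = phrase
    return left + right if op == 'PLUS' else left - right


tokens = [('INTEGER', 27), ('MINUS', '-'), ('INTEGER', 7)]
print(interpret(parse(tokens)))  # 20
```

In the article's calculator both steps live in a single method, which is simpler for a grammar this small; the split here only makes the recognize-then-evaluate order explicit.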

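Now it's time for exercises:

1. Extend the calculator to handle multiplication of two integers
2. Extend the calculator to handle division of two integers
3. Modify the code to interpret expressions containing an arbitrary number of additions and subtractions, for example "9 - 5 + 3 + 11"

Finally, check your understanding:

1. What is a lexeme?
2. What is the name of the process that finds the structure in the stream of tokens, or put differently, what is the name of the process that recognizes a certain phrase in that stream of tokens?
3. What is the name of the part of the interpreter (compiler) that does parsing?

I hope you liked today's material. The next article in the series will extend the calculator to handle more complex arithmetic expressions. Stay tuned.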
