在解释器中,词法分析器之后(通常)是什么?

发布于 2024-11-07 07:18:20 字数 228 浏览 4 评论 0原文

对于编程语言解释器,我想知道解释器经历的事件顺序。例如,我认为事情是这样的:

  • 解释器获取一些输入 词法
  • 分析器/分词器获取输入并划分标记
  • x 获取标记列表
  • ???
  • 代码被执行

??? 中属于哪些步骤?点,以及替换 x 的位置(即,什么接收词法分析器生成的标记并对其进行操作)?

For a programming language interpreter, I am wondering about the sequence of events that the interpreter goes through. For instance, I think this is how it goes:

  • Interpreter gets some input
  • The lexer/tokenizer gets the input and demarcates the tokens
  • The x gets the list of tokens
  • ???
  • The code gets executed

What step(s) belongs in the ??? spot, and what goes in the place of x (that is, what recieves and operates on the tokens that the lexer has produced)?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

邮友 2024-11-14 07:18:20

我首先推荐经典且免费的书籍:计算机的结构与解释计划视频讲座

Lisp 是基线解释器,其他一切在某种程度上都是该主题的变体。

一般来说,步骤如下:

  • 词法分析获取字符流并生成标记
  • 解析获取标记(平面列表)并构建称为抽象语法树 (AST) 的数据结构。这一步可以非常简单(Lisp),也可以非常复杂(C++、Ruby)。
  • 评估 AST。细节有点不同,但这几乎是深度优先沿着树走。叶子是数据(数字、字符串、常量、变量),节点可以是原始函数(数学、数据操作、控制结构),也可以是更高级别的复合函数。每个节点都应该简化为可以直接馈送到其上方节点的东西。

最后一步是“代码被执行”。对于编译语言或即时 (JIT) 语言,最后一步是将 AST 翻译回机器指令。注意可能存在的其他两个步骤也很重要。一种是翻译成更简单的语言,如 c--、LLVM、.NET 或 Java bitcode。另一个是解析器和评估器之间可能发生的脱糖和/或优化。例如,Haskell 就因大量脱糖而臭名昭著。

最后,我将鼓励您尝试编写一个Scheme(Lisp 的一种方言)解释器的众多演练中的一个。网上某处可能有适合您最喜欢的语言的语言。

I'll start by recommending the classic and free book: Structure and Interpretation of Computer Programs (video lectures)

Lisp is the baseline interpreter and everything else is a in some ways a variation on the theme.

In general the steps are:

  • Lexical analysis take a char stream and produces tokens
  • Parsing takes the tokens (a flat list) and builds a data structure called an Abstract Syntax Tree (AST). This step can be very easy (Lisp) or amazingly complex (C++, Ruby).
  • Evaluate the AST. The details are a bit different but this is pretty much a depth first walk down the tree. Leaves are data (numbers, strings, constants, variables) nodes are either primitive functions (math, data manipulation, control structures) or higher level compound functions. Each node should reduce to something that can be fed directly into the node above it.

This last step is "code gets executed." For a compiled or Just-In-Time (JIT) language the last step is instead translating the AST back into machine instructions. It is also important to note two other steps that may be present. One is a translation into a simpler language like c--, LLVM, .NET, or Java bitecode. The other is desugaring and/or optimization that can happen between the parser and the evaluator. Haskell for example is somewhat infamous for the huge amount of desugaring that goes on.

I'll end by encouraging you to try one of the many walk-throughs for writing a Scheme (a dialect of Lisp) interpreter. There is likely one for your favorite language on the net somewhere.

々眼睛长脚气 2024-11-14 07:18:20

进行解析,将标记流转换为结构化的、经过验证的语法信息。如果你想计算一个算术表达式:

( x + 4 ) * 3

您不是通过从左到右扫描令牌来完成的。您需要弄清楚操作顺序。您需要将 if 关键字和 { } 大括号之间的标记转换为描述 if 语句的高级结构,以便您无需处理一堆代币即可对其进行评估。并且您需要检查语法,如果不正确解析它,这基本上是不可能的;请阅读上下文无关语法

上面的表达式将成为一个抽象语法树,如下所示:

    *
  +   3
 x y

评估它非常简单 - 只需遍历树,然后查找 xy > 在环境中。

类似地,给定一系列如下语句:

if ( p && q< /code> ) { foo ; bar ; } 其他 { baz ; }

抽象语法树可能具有以下一般结构:

IfStatement:
  Condition:
    LogicalConjunction:
      LeftOperand: p
      RightOperand: q
  TruePart:
    BasicBlock:
      Statement: foo
      Statement: bar
  FalsePart:
    BasicBlock:
      Statement: baz

希望您能够想象如何遍历这棵树来解释代码。

我强烈推荐的一本关于解释器的教科书是编程语言要点

Parsing happens, to turn the stream of tokens into structured, validated, syntactic information. If you want to evaluate, say, an arithmetic expression:

( x + 4 ) * 3

you don’t do it by scanning the tokens from left to right. You need to figure out order of operations. You need to turn the tokens between an if keyword and the { } curly braces into a high-level structure describing the if statement, so you can evaluate it without juggling a pile of tokens. And you need to check the syntax, which is essentially impossible without properly parsing it; please read about context-free grammars.

The expression above would become an abstract syntax tree like the following:

    *
  +   3
 x y

Evaluating this is pretty simple — just traverse the tree, and look up x and y in the environment.

Similarly, given a series of statements like this:

if ( p && q ) { foo ; bar ; } else { baz ; }

the abstract syntax tree might have the following general structure:

IfStatement:
  Condition:
    LogicalConjunction:
      LeftOperand: p
      RightOperand: q
  TruePart:
    BasicBlock:
      Statement: foo
      Statement: bar
  FalsePart:
    BasicBlock:
      Statement: baz

Hopefully you can imagine how you would traverse this tree to interpret the code.

A textbook on interpreters which I highly recommend is Essentials of Programming Languages.

压抑⊿情绪 2024-11-14 07:18:20

对于解释器来说,解析器通常会做两件事

  1. 生成 p 代码
  2. 将项目添加到符号表中

此后,执行器将在符号表中执行 p 代码和查找标识符等。

解析器解析它接收到的标记流,并生成更简单、更高效的 p​​-code,同时执行解析过程中发现的任何符号,如变量、函数、复杂数据类型结构等。相位被输入到符号表中并在 p 代码中引用。

然后,执行器处理 p 代码流并执行指令,并使用符号表查找它在符号表中遇到的任何标识符。

For an interpreter the parser will generally do two things

  1. Generate p-code
  2. Add items to the symbol table

After this the executor will execute the p-code and lookup identifiers etc. in the symbol table.

The parser parses the stream of tokens it receives and generates the simpler more efficient to execute p-code at the same time any symbols like variables, functions, complex data type structures etc. that are found during the parsing phase are entered into the symbol table and referenced in the p-code.

The executor then processes the stream of p-code and executes the instructions and uses the symbol table to lookup any identifiers it encounters in the symbol table.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文