在 OOP 中解析 S 表达式的正确方法

发布于 2025-01-08 12:10:24 字数 637 浏览 3 评论 0原文

我正在寻找一种实现 S 表达式读取器的方法（稍后与 Scheme 解释器和编译器一起使用），但我一直在问自己应该如何（如果有的话）为其编写 AST。

我一直在阅读SICP，这在Scheme 中非常简单，但我希望以面向对象的方式用C++ 实现解释器和编译器。

请记住，我这样做只是为了学习目的，所以我并不是真正在寻找最简单或最快的方法，而是寻找正确且可重用的方法。

我在一些Scheme 实现中看到人们解析 s-表达式并轻松输出 cons 单元，如下所示：

  struct Sexpr
  {
  };

  struct Cons : public Sexpr
  {
    Sexpr* left;
    Sexpr* right;
  };

  struct IntAtom : Sexpr
  {
    int value;
  };

对于每种Scheme Atom 都有一个 Sexpr 子类，或者类似的东西。

我不确定，但这对我来说似乎是一种黑客行为……这项工作不应该由解释器而不是读者来完成吗？

我想知道的是，这是否被认为是读取 S 表达式的最佳（或正确）方式，或者这更多是解释器的工作而不是解析器的工作？解析器应该有自己的 AST 而不是依赖 cons 单元吗？

原文

I am looking for a way to implement an S-expression reader (to be used later on with both an Scheme interpreter and a compiler), but I've been asking myself how (if at all) I should write an AST for it.

I've been reading SICP, and this is quite straightforward from within Scheme, but I'm looking to implement the interpreter and compiler in C++, in OO fashion.

Please, keep in mind that I'm doing this only for learning purposes, so I'm not really looking for the easiest or quickest way of doing this, but rather the correct and reusable way of doing it.

I've seen in some Scheme implementations that people parse s-expressions and readily output cons cells, something like this:

  struct Sexpr
  {
  };

  struct Cons : public Sexpr
  {
    Sexpr* left;
    Sexpr* right;
  };

  struct IntAtom : Sexpr
  {
    int value;
  };

And one subclass of Sexpr for each kind of Scheme Atom, or something along those lines.

I'm not sure, but this seems like a hack to me... Shouldn't this work be done by an interpreter rather than the reader?

What I want to know is if this is considered the best (or correct) way of reading S-expressions, or is this more a job of the interpreter than the parser? Should the parser have its own AST instead of relying on cons cells?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

極樂鬼 2025-01-15 12:10:24

从Scheme/Racket方面跟进：

Racket（以及其他一些Scheme实现）对语法对象使用更丰富的表示，以便它们可以附加属性来指示（至少在Racket中）它们的上下文。重新绑定，它们来自什么源位置，编译器插入它们的过程，以及您可能想要存储的任何其他信息（参见 Racket 中的“语法属性”）。

这些附加信息可以实现诸如带有源指针的错误消息和卫生宏之类的功能。

请注意，我在这里所说的“更丰富”只是“包含更多价值”的意思，而不是任何非价值中立的方式。

我还应该补充一点——在陷入图灵焦油坑之前——你也可以使用侧面的表格来表示这个完全相同的信息；假设您进行了指针比较，则将值放入结构中与使用表将结构与值关联起来之间没有表现力差异。

回复收藏 0 原文

只为守护你 2025-01-15 12:10:24

如果您希望语法更加完整，则需要支持

sexpr ::= atom | sexpr sexpr
atom ::= nil | intatom | etc.

But that 比您遇到的大多数 sexpr 更通用。 S-expr 最简单和最常见的形式在 LISP/Scheme 中类似于 (abcd)，其中每个 a、b、c、d 都是原子或列表。在成对形式中，这是 [a [b [c [d nil] ] ] ]，这意味着 conses 的所有右侧都是列表。

所以如果你想要干净，你可能会这样做

class sexpr {};
class atom : sexpr {};
class s_list : forward_list<smart_ptr<sexpr>> {};

If you want to be somewhat complete in your syntax, you will need to support

sexpr ::= atom | sexpr sexpr
atom ::= nil | intatom | etc.

But that is more general than most sexpr you will encounter. The easiest and most common form of S-expr which in LISP/Scheme is like (a b c d) where each of a,b,c,d are atoms or lists. In pair form this is [a [b [c [d nil] ] ] ], which means all right sides of your conses are lists.

So if you are going for clean, you might just do

class sexpr {};
class atom : sexpr {};
class s_list : forward_list<smart_ptr<sexpr>> {};

回复收藏 0 原文

一生独一 2025-01-15 12:10:24

虽然人们可能会反复争论什么是“正确”的方法，但在我看来，您建议的方法（使用相同的数据结构进行读取、编译、评估、和处理）是这本书将教会你最多关于 Lisp 和“代码就是数据”口号的内容，特别是 quote 运算符的实际含义（这是一件非常深刻的事情）。

顺便说一句，这也是大多数 Lisp（有趣的是，不包括Scheme）传统上的工作方式。

所以，是的，让读者生成 Lisp 数据：conses、符号、Lisp 数字、字符串等等，与用户级 Lisp 代码将处理的内容完全相同。这将使其余的实现变得更简单、更有启发性。

回复收藏 0 原文

情未る 2025-01-15 12:10:24

您可能想查看这个 c/c++ s-expr 解析器库作为示例它已经完成了。

看起来基本表示是：

struct elt {
  int type;
  char *val;
  struct elt *list; 
  struct elt *next;
};

我引用他们的文档：

由于元素可以是列表或原子，因此元素结构具有可以是 LIST 或 VALUE 的类型指示符。如果类型指示符是LIST，则结构成员“list”将是指向该元素所表示的列表头部的指针。如果类型指示符为 VALUE，则结构成员“val”将包含由元素表示的原子作为字符串。在这两种情况下，“下一个”指针都将指向当前 s 表达式的下一个元素。

另外这里是许多语言中 s-expr 阅读器的其他实现的完整列表，这些语言可能是的兴趣。

You might want to take a look at this c/c++ s-expr parser library for an example of how it has been done.

It looks like the base representation is:

struct elt {
  int type;
  char *val;
  struct elt *list; 
  struct elt *next;
};

And I quote from their docs:

Since an element can be either a list or atom, the element structure has a type indicator that can be either LIST or VALUE. If the type indicator is LIST, then the structure member "list" will be a pointer to the head of the list represented by this element. If the type indicator is VALUE, then the structure member "val" will contain the atom represented by the element as a string. In both cases, the "next" pointer will point at the next element of the current s-expression.

Additionally here is a whole list of other implementations of s-expr readers in lots of languages that may be of interest.

回复收藏 0 原文

~没有更多了~