在 OOP 中解析 S 表达式的正确方法
我正在寻找一种实现 S 表达式读取器的方法(稍后与 Scheme 解释器和编译器一起使用),但我一直在问自己应该如何(如果有的话)为其编写 AST。
我一直在阅读SICP,这在Scheme 中非常简单,但我希望以面向对象的方式用C++ 实现解释器和编译器。
请记住,我这样做只是为了学习目的,所以我并不是真正在寻找最简单或最快的方法,而是寻找正确且可重用的方法。
我在一些Scheme 实现中看到人们解析 s-表达式并轻松输出 cons 单元,如下所示:
struct Sexpr
{
};
struct Cons : public Sexpr
{
Sexpr* left;
Sexpr* right;
};
struct IntAtom : Sexpr
{
int value;
};
对于每种Scheme Atom
都有一个 Sexpr 子类,或者类似的东西。
我不确定,但这对我来说似乎是一种黑客行为……这项工作不应该由解释器而不是读者来完成吗?
我想知道的是,这是否被认为是读取 S 表达式的最佳(或正确)方式,或者这更多是解释器的工作而不是解析器的工作?解析器应该有自己的 AST 而不是依赖 cons 单元吗?
I am looking for a way to implement an S-expression reader (to be used later on with both an Scheme interpreter and a compiler), but I've been asking myself how (if at all) I should write an AST for it.
I've been reading SICP, and this is quite straightforward from within Scheme, but I'm looking to implement the interpreter and compiler in C++, in OO fashion.
Please, keep in mind that I'm doing this only for learning purposes, so I'm not really looking for the easiest or quickest way of doing this, but rather the correct and reusable way of doing it.
I've seen in some Scheme implementations that people parse s-expressions and readily output cons cells, something like this:
struct Sexpr
{
};
struct Cons : public Sexpr
{
Sexpr* left;
Sexpr* right;
};
struct IntAtom : Sexpr
{
int value;
};
And one subclass of Sexpr for each kind of Scheme Atom
, or something along those lines.
I'm not sure, but this seems like a hack to me... Shouldn't this work be done by an interpreter rather than the reader?
What I want to know is if this is considered the best (or correct) way of reading S-expressions, or is this more a job of the interpreter than the parser? Should the parser have its own AST instead of relying on cons cells?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
从Scheme/Racket方面跟进:
Racket(以及其他一些Scheme实现)对语法对象使用更丰富的表示,以便它们可以附加属性来指示(至少在Racket中)它们的上下文。重新绑定,它们来自什么源位置,编译器插入它们的过程,以及您可能想要存储的任何其他信息(参见 Racket 中的“语法属性”)。
这些附加信息可以实现诸如带有源指针的错误消息和卫生宏之类的功能。
请注意,我在这里所说的“更丰富”只是“包含更多价值”的意思,而不是任何非价值中立的方式。
我还应该补充一点——在陷入图灵焦油坑之前——你也可以使用侧面的表格来表示这个完全相同的信息;假设您进行了指针比较,则将值放入结构中与使用表将结构与值关联起来之间没有表现力差异。
To follow up from the Scheme/Racket side of the fence:
Racket (and some other Scheme implementations) use a richer representation for syntax objects, so that they can have properties attached to them indicating (in Racket, at least) what context they're bound in, what source location they come from, what pass of the compiler inserted them, and any other information you might want to store (cf. "syntax properties" in Racket).
This additional information enables things like error messages with pointers to source, and hygienic macros.
Please note that I mean "richer" here simply in the "contains more values" sense, not in any non-value-neutral way.
I should also add---before falling into the Turing Tar Pit---that you can also represent this exact same information using a table on the side; assuming you have pointer comparisons, there's no expressiveness difference between putting a value inside a structure and using a table to associate the structure with the value.
如果您希望语法更加完整,则需要支持
But that 比您遇到的大多数 sexpr 更通用。 S-expr 最简单和最常见的形式在 LISP/Scheme 中类似于 (abcd),其中每个 a、b、c、d 都是原子或列表。在成对形式中,这是 [a [b [c [d nil] ] ] ],这意味着 conses 的所有右侧都是列表。
所以如果你想要干净,你可能会这样做
If you want to be somewhat complete in your syntax, you will need to support
But that is more general than most sexpr you will encounter. The easiest and most common form of S-expr which in LISP/Scheme is like (a b c d) where each of a,b,c,d are atoms or lists. In pair form this is [a [b [c [d nil] ] ] ], which means all right sides of your conses are lists.
So if you are going for clean, you might just do
虽然人们可能会反复争论什么是“正确”的方法,但在我看来,您建议的方法(使用相同的数据结构进行读取、编译、评估、和处理)是这本书将教会你最多关于 Lisp 和“代码就是数据”口号的内容,特别是
quote
运算符的实际含义(这是一件非常深刻的事情)。顺便说一句,这也是大多数 Lisp(有趣的是,不包括Scheme)传统上的工作方式。
所以,是的,让读者生成 Lisp 数据:conses、符号、Lisp 数字、字符串等等,与用户级 Lisp 代码将处理的内容完全相同。这将使其余的实现变得更简单、更有启发性。
While one can probably argue back and forth over what the “correct” approach is, in my opinion, the approach you suggest—using the same data structures for reading, compilation, evaluation, and processing—is the one that will teach you the most about what Lisp and the “code is data” mantra are about, and in particular, what the
quote
operator actually means (which is quite a profound thing).It is also, incidentally, the way most Lisps (interestingly, not including Scheme) traditionally work.
So yes, have the reader generate Lisp data: conses, symbols, Lisp numbers, strings, et cetera, the exact same stuff the user-level Lisp code will deal with. It will make the rest of the implementation both simpler and more instructive.
您可能想查看 这个 c/c++ s-expr 解析器库 作为示例它已经完成了。
看起来基本表示是:
我引用他们的文档:
另外这里是许多语言中 s-expr 阅读器的其他实现的完整列表,这些语言可能是的兴趣。
You might want to take a look at this c/c++ s-expr parser library for an example of how it has been done.
It looks like the base representation is:
And I quote from their docs:
Additionally here is a whole list of other implementations of s-expr readers in lots of languages that may be of interest.