Is there any non-Lisp dialect that allows syntactic abstraction?

Posted 2024-11-17 06:48:36

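As Rich Hickey has said, the secret weapon of Lisp languages is the ability to manipulate the abstract syntax tree directly through macros. Can this be achieved in any language that is not a Lisp dialect?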

风和你 2024-11-24 06:48:36

Being able to "directly manipulate the abstract syntax tree" by itself is nothing new, though it's something that very few languages have. For example, many languages these days have some kind of an eval function -- but it should be obvious that that's not manipulating the abstract syntax tree; instead, it is a manipulation of the concrete syntax -- the direct source code. Incidentally, the mentioned functionality in D falls under the same category, as does CPP (the C preprocessor): both deal with raw source text.
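To make the eval-versus-AST distinction concrete, here is a minimal Python sketch (Python only because its standard `ast` module makes both sides easy to show; the expression and the `SwapAddForSub` class are made up for illustration). The first half edits source text before handing it to eval, the second half rewrites tree nodes:

```python
import ast

source = "1 + 2 * 3"

# Concrete syntax: eval takes raw source text, so "changing the program"
# means doing surgery on a string.
text_result = eval(source.replace("+", "-"))        # evaluates "1 - 2 * 3"

# Abstract syntax: parse once, then rewrite nodes instead of characters.
class SwapAddForSub(ast.NodeTransformer):           # illustrative name
    def visit_BinOp(self, node):
        self.generic_visit(node)                    # rewrite children first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

tree = ast.parse(source, mode="eval")
tree = ast.fix_missing_locations(SwapAddForSub().visit(tree))
tree_result = eval(compile(tree, "<ast>", "eval"))
```

Both halves compute the same value here, but the string version would also rewrite a "+" that happened to sit inside a string literal, while the tree version only touches real addition nodes.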

To give an example of a language that does have that feature (but not something that would be considered macros proper), see OCaml. It has a syntactic extension system, Camlp4, which is essentially a compiler extension toolkit built around the OCaml abstract syntax. But this is still not what makes the corresponding feature in Lisps so great.

The important feature of Lisps is that the extensions that you get using macros are part of the language in the same way that any other syntactic form is. To put this differently, when you use something like if in a Lisp, there is no difference in functionality whether it's implemented as a macro or as a primitive form. (Actually there is a minor difference: in some cases it's important to know the set of primitive forms that don't expand further.) More specifically, a Lisp library can provide both plain bindings and macros, which means that libraries can extend the language in a much more interesting way than the usual boring extensions you get in most languages, which can add only plain bindings (functions and values).
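As a rough illustration of "no difference at the use site", here is a toy evaluator sketched in Python. It is not how any real Lisp is implemented; the names `MACROS`, `macro`, `expand_unless` and `evaluate` are invented for this sketch, forms are Python lists, and symbols are plain strings (so string literals are not supported). `if` is a primitive form and `unless` is a macro registered the way a library might register it:

```python
# Code is plain data: a form is a list, a symbol is a str, literals are themselves.
MACROS = {}

def macro(name):
    """Register a function that rewrites a form into another form."""
    def register(fn):
        MACROS[name] = fn
        return fn
    return register

@macro("unless")
def expand_unless(cond, body):
    # (unless c body)  =>  (if c None body)
    return ["if", cond, None, body]

def evaluate(form, env):
    if isinstance(form, str):                 # symbol: look it up
        return env[form]
    if not isinstance(form, list):            # literal: evaluates to itself
        return form
    head, *args = form
    if isinstance(head, str) and head in MACROS:
        # macro use: rewrite the form, then evaluate the rewritten form
        return evaluate(MACROS[head](*args), env)
    if head == "if":                          # primitive: evaluates only one branch
        cond, then, other = args
        return evaluate(then if evaluate(cond, env) else other, env)
    fn = evaluate(head, env)                  # ordinary call: evaluate all arguments
    return fn(*[evaluate(a, env) for a in args])

env = {"<": lambda a, b: a < b, "x": 5}
primitive_result = evaluate(["if", ["<", "x", 3], 100, 200], env)
macro_result = evaluate(["unless", ["<", "x", 3], 200], env)
```

The caller writes `["unless", ...]` exactly the way it writes `["if", ...]`; nothing at the call site reveals which one is built in.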

Now, viewed in this light, something like the D facility is very similar in nature. But the fact that it deals with raw text rather than ASTs limits its utility. If you look at the example on that page,

mixin(GenStruct!("Foo", "bar"));

you can see how this doesn't look like part of the language -- to make it more like Lisp, you'd use it in a natural way:

GenStruct(Foo, bar);

with no need for a mixin keyword that marks where a macro is used, no need for that !, and the identifiers being specified as identifiers rather than strings. Even better, the definition should be expressed more naturally, something like (inventing some bad syntax here):

template expression GenStruct(identifier Name, identifier M1) {
    return [[struct $Name$ { int $M1$; }; ]]
}

One important thing to note here is that since D is a statically typed language, ASTs have crept into this mental exercise in an explicit way -- as the identifier and expression types (I'm assuming here that template marks this as a macro definition, but it still needs a return type).

In Lisp, you're essentially getting something very close to this functionality, rather than the poor string solution. But you get even more -- Lisp intentionally puns over the basic list type, and unifies the ASTs with the runtime language in a very simple way: the AST is made of symbols and lists and other basic literals (numbers, strings, booleans), and those are all part of the runtime language. In fact, for those literals, Lisp takes another step forward, and uses the literals as their own syntax -- for example, the number 123 (a value that exists at runtime) is represented by a syntax which is also the number 123 (but now it's a value that exists at compile-time). The bottom line of this is that macro-related code in Lisp tends to be far easier to deal with than what other languages call "macros". Imagine, for example, making the D example code create N int fields in a struct (where N is a new input to the macro) -- that would require using some function to translate a string into a number.
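The N-fields exercise can be sketched in Python by treating code as nested lists, roughly the way a Lisp macro would see it. Everything here is hypothetical: the function names, the `m0`, `m1`, ... field names, and the C-like output format are assumptions made for the illustration, not D or Lisp APIs:

```python
def gen_struct(name, fields):
    """Build a (struct name (int f) ...) form as nested lists."""
    return ["struct", name] + [["int", f] for f in fields]

def gen_struct_n(name, n):
    # n arrives as a number, not as text, so it can be used directly;
    # no string-to-number conversion is needed.
    return gen_struct(name, [f"m{i}" for i in range(n)])

def render(form):
    """Turn the form into C-like source text, only at the very end."""
    _, name, *fields = form
    body = " ".join(f"{ty} {field};" for ty, field in fields)
    return f"struct {name} {{ {body} }};"

form = gen_struct_n("Foo", 3)
text = render(form)
```

The generator works on structured values the whole way through and only produces text as a last step, which is the opposite of the string-mixin approach.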

如果没有 2024-11-24 06:48:36

Lisp

The reasons LISP is "special" are...

The built-in functionality is very economical:

The only built-in data structures are atoms and lists
The syntax is implemented in terms of the list data structure
There are very few "system functions"

It supports functions in such a way that new function definitions are indistinguishable from built-in functions:

The calling syntax is identical
Evaluation of arguments can be fully controlled

It supports macros in such a way that arbitrary Lisp code can always be defined in terms of a domain-specific language:

The calling syntax is just like custom function-call syntax, which is just like built-in function-call syntax
Evaluation of arguments is completely controllable
Arbitrary Lisp code-generation is possible
Macros are evaluated at runtime, so the macro's implementation can call existing code while generating new code

With the above features, you can:

Re-implement Lisp-within-Lisp, in very little code
Add any existing programming idioms in a way that is indistinguishable from built-in features

E.g. you can easily implement systems for namespaces, any data structure, classes, polymorphism, and multiple-dispatch on top of Lisp, and such features will work like they were built into Lisp.

Other languages

But it all depends on your definition. Some levels of "syntactic abstraction" are supported in other languages in quite varied ways. Some of these ways are more powerful than others, and nearly match Lisp's flexibility.

Some examples:

Boo (syntactic macros, and other features)
http://boo.codehaus.org/
Boo Wikipedia Entry (for the feature list)

In Boo, you can use syntactic macros to define new DSLs that will automatically be handled by the compiler. With this, you can implement any language feature on top of existing features. The limitation compared to Lisp is that these are evaluated at compile time, so run-time code generation isn't directly supported.

Javascript/Lua (dynamic evaluation, prototypes, and . notation for functions inside arrays)
http://en.wikipedia.org/wiki/JavaScript#Prototype-based

In Javascript, the data structures are generic and flexible (everything is either a built-in type, or an associative array). It also supports invoking functions directly from associative arrays. With this, you can implement several language features on top of existing features, such as classes and namespaces.

Because Javascript is a dynamic language (names of function calls are evaluated at runtime), and because it exposes built-in features within the context of data structures, it is fully "reflective" and fully mutable.

Because of this, you can replace or shim the existing system functionality with your own functionality. This is often quite useful in shimming in your own runtime debugging features, or for sand-boxing (by un-defining system calls you don't want isolated code to access).

Lua is quite similar to Javascript in most of these ways.
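The shimming and sandboxing idea is not tied to Javascript. Here is a small sketch of it in Python, using a dict as the associative array of functions; the `system` table, its entries, and the `shim` helper are all invented for illustration:

```python
# A namespace modelled as an associative array of functions (hypothetical entries).
system = {
    "add": lambda a, b: a + b,
    "delete_everything": lambda: "boom",
}

call_log = []

def shim(namespace, name):
    """Replace namespace[name] with a wrapper that records each call."""
    original = namespace[name]
    def wrapper(*args):
        call_log.append((name, args))
        return original(*args)
    namespace[name] = wrapper

shim(system, "add")
result = system["add"](2, 3)      # looked up by name at call time, so the shim runs

# Sandboxing: hand isolated code a copy with the dangerous entry left out.
sandbox = {k: v for k, v in system.items() if k != "delete_everything"}
```

Because callers look functions up by name at call time, swapping the table entry is enough to intercept every later call.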

C/C++ Pre-processor macros
http://en.wikipedia.org/wiki/C_preprocessor

The C/C++ pre-processor allows you to define your own DSL with a somewhat similar syntax to existing function calls. It does not let you control evaluation (which is the source of a lot of bugs, and why most people say C/C++ macros are "Evil"), but it does support a somewhat limited form of code generation.

The code generation support in C/C++ macros is limited because macros are evaluated before your code is compiled, and can't be controlled via C code. It is nearly completely limited to textual substitution. This greatly limits the type of code that can be generated.
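To see why plain textual substitution causes trouble, here is a small Python imitation of what a macro like `#define SQUARE(x) x * x` does to its argument. The `expand_textual` helper is a made-up stand-in for the pre-processor, not a real API:

```python
def expand_textual(body, param, arg_text):
    """Mimic a pre-processor: paste the argument's text into the macro body."""
    return body.replace(param, arg_text)

# Analogous to SQUARE(1 + 2) with: #define SQUARE(x) x * x
expanded = expand_textual("x * x", "x", "1 + 2")
textual_value = eval(expanded)          # operator precedence leaks into the result
intended_value = (1 + 2) * (1 + 2)
```

The expansion is the text `1 + 2 * 1 + 2`, so the result is 5 rather than the intended 9. A macro system that substitutes tree nodes instead of characters does not have this problem, because the argument stays a single sub-expression.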

C++ Templates
http://en.wikipedia.org/wiki/Template_metaprogramming

The C++ template feature is quite powerful (compared to C/C++ macros) for syntactical additions to the language. It can turn a lot of runtime code evaluation into compile-time code evaluation, and can do static assertions on your existing code. It can reference existing C++ code, in a limited way.

But template meta-programming (TMP) is very unwieldy because it has a terrible syntax, is a very strictly limited subset of C++, has quite limited code generation ability, and can't be evaluated at runtime. C++ templates also arguably output the most difficult error messages you will ever encounter in programming :)

Note that this hasn't kept template meta-programming from being an active area of research in many communities. See the Boost project, of which a good portion is devoted to TMP-support libraries and TMP-implemented libraries.

Duck-Typing in Python (and many other languages)
http://en.wikipedia.org/wiki/Duck_typing

Duck typing can allow you to define a syntax on objects that lets you substitute implementations at runtime. This is similar to how Javascript defines functions on associative arrays.

I can't say for Python (since I don't know it very well), but duck typing is often more limited than Javascript's dynamic features because of a lack of reflectivity, mutability, and exposure of system functionality through reflectable/mutable interfaces. For example, C#'s duck typing is limited in all these ways.
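For reference, a minimal sketch of what duck typing looks like in Python. The two sink classes and the `report` function are invented for the example; the only thing they share is a `write` method, with no common base class or declared interface:

```python
class ListSink:
    """Collects written text in memory."""
    def __init__(self):
        self.items = []

    def write(self, text):
        self.items.append(text)

class CountingSink:
    """Only counts how many characters were written."""
    def __init__(self):
        self.count = 0

    def write(self, text):
        self.count += len(text)

def report(sink, message):
    # Works with any object that has a write(text) method.
    sink.write(message)

list_sink, counting_sink = ListSink(), CountingSink()
for sink in (list_sink, counting_sink):     # implementation chosen at runtime
    report(sink, "done")
```

This swaps implementations behind a fixed calling convention, but it adds no new syntax, which is why it sits at the weak end of the "syntactic abstraction" spectrum discussed above.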
