Are there any non-Lisp languages that allow for syntactic abstraction?
As Rich Hickey says, the secret sauce of Lisp languages is the ability to directly manipulate the Abstract Syntax Tree through macros. Can this be achieved in any language that is not a Lisp dialect?
Being able to "directly manipulate the abstract syntax tree" is by itself nothing new, though it's something that very few languages have. For example, many languages these days have some kind of an `eval` function, but it should be obvious that this is not manipulating the abstract syntax tree; it is a manipulation of the concrete syntax, the direct source code. Incidentally, the mentioned functionality in D falls under the same category, as does CPP: both deal with raw source text.
To give an example of a language that does have that feature (but not something that would be considered macros proper), see OCaml. It has a syntactic extension system, Camlp4, which is essentially a compiler extension toolkit whose central concern is the OCaml abstract syntax. But this is still not what makes the corresponding feature in Lisps so great.
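The difference between editing source text and editing a syntax tree can be sketched in Python, which happens to expose both `eval` on strings and an `ast` module. This is only an illustration of the distinction drawn above, not of Lisp or Camlp4; the expression and the `RenameA` class are made up for the example.

```python
import ast

source = "a + data"

# Text-level edit: the program is only a string, so renaming `a` to `b`
# also clobbers every other "a" character, including those inside `data`.
textual = source.replace("a", "b")

class RenameA(ast.NodeTransformer):
    """Rename the variable `a` to `b` by rewriting Name nodes only."""

    def visit_Name(self, node):
        if node.id == "a":
            return ast.copy_location(ast.Name(id="b", ctx=node.ctx), node)
        return node

# Tree-level edit: parse once, then operate on nodes instead of characters.
tree = ast.parse(source, mode="eval")
tree = ast.fix_missing_locations(RenameA().visit(tree))
structural = ast.unparse(tree)  # ast.unparse needs Python 3.9 or later

# The edited tree is still a program: compile it and run it.
value = eval(compile(tree, "<ast>", "eval"), {"b": 1, "data": 2})
```

Here `textual` ends up as the broken string `b + dbtb`, while `structural` is `b + data` and evaluates normally.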
The important feature of Lisps is that the extensions you get using macros are part of the language in the same way that any other syntactic form is. To put this differently, when you use something like `if` in a Lisp, there is no difference in functionality whether it's implemented as a macro or as a primitive form. (Actually there is a minor difference: in some cases it's important to know the set of primitive forms that don't expand further.) More specifically, a Lisp library can provide both plain bindings and macros, which means that libraries can extend the language in a much more interesting way than the usual boring extensions you get in most languages, which are capable of adding only plain bindings (functions and values).
Now, viewed in this light, something like the D facility is very similar in nature. But the fact that it deals with raw text rather than ASTs limits its utility. If you look at the example on that page, you can see how this doesn't look like part of the language. To make it more like Lisp, you'd use it in a natural way, with no need for a `mixin` keyword that marks where a macro is used, no need for that `!`, and with the identifiers being specified as identifiers rather than strings. Even better, the definition should be expressed more naturally, in some invented (and admittedly bad) syntax.
One important thing to note here is that since D is a statically typed language, ASTs have crept into this mental exercise in an explicit way, as the `identifier` and `expression` types (I'm assuming here that `template` marks this as a macro definition, but it still needs a return type).
In Lisp, you're essentially getting something very close to this functionality, rather than the poor string solution. But you get even more: Lisp intentionally puns over the basic list type, and unifies the ASTs with the runtime language in a very simple way. The AST is made of symbols, lists, and other basic literals (numbers, strings, booleans), and those are all part of the runtime language. In fact, for those literals Lisp takes another step forward and uses the literals as their own syntax: for example, the number `123` (a value that exists at runtime) is represented by a syntax which is also the number `123` (but now it's a value that exists at compile time). The bottom line is that macro-related code in Lisp tends to be far easier to deal with than what other languages call "macros". Imagine, for example, making the D example code create N `int` fields in a struct (where N is a new input to the macro): that would require using some function to translate a string into a number.
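To make the "code is ordinary runtime data" point concrete, here is a toy sketch in Python, with nested lists standing in for S-expressions. Everything in it (`MACROS`, `ENV`, `expand`, `evaluate`, and the `unless` and `times` macros) is hypothetical and invented for the illustration; it is not how any real Lisp is implemented. Note that `unless` is used exactly like the primitive `if`, and that the count given to `times` arrives as a real integer rather than as text.

```python
MACROS = {}  # macro name -> function from argument forms to a new form

def unless(test, then, otherwise):
    # Built with ordinary list construction; no string pasting involved.
    return ["if", test, otherwise, then]

def times(count, form):
    # `count` is already a number, so no string-to-number step is needed.
    return ["+"] + [form] * count

MACROS["unless"] = unless
MACROS["times"] = times

ENV = {"+": lambda *xs: sum(xs), "<": lambda a, b: a < b}

def expand(form):
    """Rewrite macro calls recursively until only primitive forms remain."""
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if isinstance(head, str) and head in MACROS:
        return expand(MACROS[head](*form[1:]))
    return [expand(part) for part in form]

def evaluate(form):
    """Evaluate a fully expanded form; `if` is the only special form."""
    if isinstance(form, str):
        return ENV[form]  # symbols are looked up in the environment
    if not isinstance(form, list):
        return form  # numbers and booleans evaluate to themselves
    if form[0] == "if":
        _, test, then, otherwise = form
        return evaluate(then if evaluate(test) else otherwise)
    fn, *args = [evaluate(part) for part in form]
    return fn(*args)

program = ["unless", ["<", 2, 1], ["times", 3, 5], 0]
expanded = expand(program)
result = evaluate(expanded)
```

After expansion the program is `["if", ["<", 2, 1], 0, ["+", 5, 5, 5]]`, which contains only forms the evaluator already knows, and it evaluates to 15.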
Lisp
The reasons LISP is "special" are...
The built-in functionality is very economical:
It supports functions in such a way that new function definitions are indistinguishable from built-in functions:
It supports macros in such a way that arbitrary Lisp code can always be defined in terms of a domain-specific language:
With the above features, you can:
E.g. you can easily implement systems for namespaces, any data structure, classes, polymorphism, and multiple-dispatch on top of Lisp, and such features will work like they were built into Lisp.
Other languages
But it all depends on your definition. Some levels of "syntactic abstraction" are supported in other languages in quite varied ways. Some of these ways are more powerful than others, and nearly match Lisp's flexibility.
Some examples:
http://boo.codehaus.org/
Boo Wikipedia Entry (for the feature list)
In Boo, you can use syntactic macros to define new DSLs that will automatically be handled by the compiler. With this, you can implement any language feature on top of existing features. The limitation compared to Lisp is that these are evaluated at compile time, so run-time code generation isn't directly supported.
`.` notation for functions inside arrays: http://en.wikipedia.org/wiki/JavaScript#Prototype-based
In Javascript, the data structures are generic and flexible (everything is either a built-in type, or an associative array). It also supports invoking functions directly from associative arrays. With this, you can implement several language features on top of existing features, such as classes and namespaces.
Because Javascript is a dynamic language (names of function calls are evaluated at runtime), and because it exposes built-in features within the context of data structures, it is fully "reflective" and fully mutable.
Because of this, you can replace or shim the existing system functionality with your own functionality. This is often quite useful in shimming in your own runtime debugging features, or for sand-boxing (by un-defining system calls you don't want isolated code to access).
Lua is quite similar to Javascript in most of these ways.
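The shimming technique described above is not specific to JavaScript. A minimal sketch in Python, assuming we want to log calls to `math.sqrt` (the names `calls` and `logged_sqrt` are invented for the example):

```python
import math

calls = []                      # log filled in by the shim
original_sqrt = math.sqrt       # keep a handle on the real implementation

def logged_sqrt(x):
    """Shim: record the argument, then delegate to the original function."""
    calls.append(x)
    return original_sqrt(x)

math.sqrt = logged_sqrt         # later lookups of math.sqrt now see the shim
try:
    result = math.sqrt(16.0)
finally:
    math.sqrt = original_sqrt   # restore the module so the change stays local
```

This works because the module attribute is looked up by name at call time, which is the same property the answer attributes to JavaScript's function calls.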
http://en.wikipedia.org/wiki/C_preprocessor
The C++ pre-processor allows you to define your own DSL with a somewhat similar syntax to existing function calls. It does not let you control evaluation (which is the source of a lot of bugs, and why most people say C/C++ macros are "Evil"), but it does support a somewhat limited form of code generation.
The code generation support in C/C++ macros is limited because macros are evaluated before your code is compiled, and can't be controlled via C code. It is nearly completely limited to textual substitution. This greatly limits the type of code that can be generated.
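One well-known consequence of pure textual substitution is that the pasted argument gets re-parsed in its new context. The sketch below imitates that in Python with plain string replacement; `expand_text_macro` is a made-up helper that only mimics what a preprocessor does, in the spirit of a macro like `#define SQUARE(x) x * x`.

```python
def expand_text_macro(body, param, argument):
    """Mimic preprocessor-style expansion: paste the argument text into the body."""
    return body.replace(param, argument)

# The pasted text is re-parsed with normal precedence: 1 + (2 * 1) + 2.
expanded = expand_text_macro("x * x", "x", "1 + 2")
naive_value = eval(expanded)

# The usual workaround is defensive parenthesising in the macro body.
guarded = expand_text_macro("(x) * (x)", "x", "1 + 2")
guarded_value = eval(guarded)
```

The unguarded expansion evaluates to 5 rather than the intended 9, because the macro never sees `1 + 2` as a single expression.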
http://en.wikipedia.org/wiki/Template_metaprogramming
The C++ template feature is quite powerful (compared to C/C++ macros) for syntactical additions to the language. It can turn a lot of runtime code evaluation into compile-time code evaluation, and can do static assertions on your existing code. It can reference existing C++ code, in a limited way.
But template meta-programming (TMP) is very unwieldy because it has a terrible syntax, is a very strictly limited subset of C++, has quite limited code generation ability, and can't be evaluated at runtime. C++ templates also arguably output the most difficult error messages you will ever encounter in programming :)
Note that this hasn't kept template meta-programming from being an active area of research in many communities. See the boost project, of which a good portion is devoted to TMP-support libraries, and TMP-implemented libraries.
http://en.wikipedia.org/wiki/Duck_typing
Duck typing can allow you to define a syntax on objects that lets you substitute implementations at runtime. This is similar to how Javascript defines functions on associative arrays.
I can't say for Python (since I don't know it very well), but duck typing is often more limited than Javascript's dynamic features because of a lack of reflectivity, mutability, and exposure of system functionality through reflectable/mutable interfaces. For example, C#'s duck typing is limited in all these ways.
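For what it's worth, a small Python sketch of the substitution that duck typing allows: `report` never checks the type of its argument, so any object with a compatible `write` method can be swapped in at runtime. `ListWriter` and `report` are hypothetical names used only for this example.

```python
import io

class ListWriter:
    """Hypothetical stand-in for a file: it only needs a write() method."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)
        return len(text)

def report(out):
    # No type check: any object with a compatible write() method is accepted.
    out.write("ok\n")

buffer = io.StringIO()
collector = ListWriter()
for target in (buffer, collector):
    report(target)
```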
For the sake of completeness, in addition to the already mentioned languages and preprocessors:
D
I'm not sure if you'd call it "syntactic abstraction" per se, but it certainly can do much of what Lisp can do:
The `mixin` keyword lets you convert a string into code (in a much better manner than C macros). When combined with templates (which are much better than those in C++), it lets you do pretty much anything you want.
Prolog would be such a language. There are many Prolog dialects. One idea is that their basic building block is a term (similar to an s-expression encoding a function). There are parsers that provide macro facilities for that.
I would say Tcl qualifies -- well, depending on whether you consider Tcl a Lisp or not.
The standard grouping characters `{` `}` are actually just a string literal (with no variable interpolation), and there's an `eval`, so you can easily define your own control flow or looping syntax (and people often do).
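The same trick can be imitated, loosely, in Python by passing blocks of code around as strings and running them with `exec` and `eval`. The sketch below defines a made-up `repeat_until` loop this way; it illustrates the idea only and says nothing about how Tcl itself implements its commands.

```python
def repeat_until(body, condition, variables):
    """Hypothetical control structure: run `body`, then stop once `condition` holds.

    Both arguments are plain strings of Python source, standing in for Tcl's
    brace-quoted blocks; `variables` is the dict that the blocks read and write.
    """
    while True:
        exec(body, {}, variables)
        if eval(condition, {}, variables):
            return variables

state = repeat_until("i += 1; total += i", "i >= 4", {"i": 0, "total": 0})
```

Because the blocks are just strings until they are evaluated, the new loop form needs no support from the language beyond `exec` and `eval`.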