How are macros compiled in a Lisp compiler?

Posted 2024-11-29 18:01:37

In a Lisp interpreter, there can easily be a branch in eval that expands a macro and, in the process of expanding it, calls functions to build up the expanded expression. I've done this before using low-level macros; it's easily conceived.

But in a compiler there aren't any functions to call to build up the expanded code. The issue can be seen quite simply in the following example:

(defmacro cube (n)
    (let ((x (gensym)))
      `(let ((,x ,n))
          (* ,x ,x ,x))))

When the macro is expanded by an interpreter, it calls gensym and does what you expect. When expanded by a compiler, you'd generate the code for a let which binds x to (gensym) but the gensymmed symbol is only necessary for the compiler to do the right thing. And since gensym isn't actually called before the macro is compiled, it's not very useful.

This gets even more strange to me when a macro builds up a list to be used as the expansion using map or filter.

So how does this work? Surely the compiled code isn't compiled to (eval *macro-code*) because that'd be horribly inefficient. Is there a well written Lisp compiler where this is clear?

风尘浪孓 2024-12-06 18:01:37

How this works differs considerably between Lisp dialects. For Common Lisp it is standardized in the ANSI Common Lisp standard, and the various Common Lisp implementations differ mostly in whether they use a compiler, an interpreter, or both.

The following assumes Common Lisp.

EVAL is not the interpreter. EVAL can be implemented with a compiler. Some Common Lisp implementations do not even have an interpreter. In those, EVAL calls the compiler to compile the code and then calls the compiled code. These implementations use an incremental compiler, which can also compile simple expressions like 2, (+ 2 3), (gensym), and so on.
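As a rough sketch of that idea (COMPILING-EVAL is a hypothetical name made up for this illustration, not something from the standard or from any implementation), an evaluator can be built on the compiler by wrapping the form in a zero-argument lambda, compiling it with COMPILE, and calling the result:

```lisp
;; Hypothetical helper: evaluate FORM by compiling it first.
(defun compiling-eval (form)
  ;; (compile nil '(lambda () ...)) returns a compiled function object.
  (funcall (compile nil `(lambda () ,form))))

(compiling-eval '(+ 2 3))              ; => 5
(symbolp (compiling-eval '(gensym)))   ; => T
```

A real EVAL also has to deal with things this sketch ignores, so treat it only as an illustration of "compile, then call".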

Macroexpansion is done with the functions MACROEXPAND and MACROEXPAND-1.
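For example, with the CUBE macro from the question, MACROEXPAND-1 can be called by hand. It performs a single expansion step and returns two values: the expansion, and a flag that says whether the form was a macro call at all. The gensym names will differ from run to run.

```lisp
(defmacro cube (n)
  (let ((x (gensym)))
    `(let ((,x ,n))
       (* ,x ,x ,x))))

(multiple-value-bind (expansion expanded-p)
    (macroexpand-1 '(cube 10))
  (print expansion)     ; a LET form binding a fresh uninterned symbol to 10
  (print expanded-p))   ; T

;; A form that is not a macro call comes back unchanged.
(macroexpand-1 '(+ 1 2))   ; => (+ 1 2), NIL
```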

A macro in Common Lisp is a function that expects some forms and returns another form. DEFMACRO registers this function as a macro.

Your macro

(defmacro cube (n)
  (let ((x (gensym)))
    `(let ((,x ,n))
        (* ,x ,x ,x))))

is nothing but a Lisp function, which is registered as a macro.

The effect is similar to this:

(defun cube-internal (form environment)
  (declare (ignore environment))
  (destructuring-bind (name n) form   ; NAME is the macro's name in the call, here MY-CUBE
    (declare (ignore name))
    (let ((x (gensym)))
      `(let ((,x ,n))
         (* ,x ,x ,x)))))

(setf (macro-function 'my-cube) #'cube-internal)

In a real CL implementation DEFMACRO expands differently and does not use a name like CUBE-INTERNAL. But conceptually it is defining a macro function and registering it.

When the Lisp compiler sees a macro definition, it usually compiles the macro function and stores it in the current so-called environment. If the environment is the runtime environment, it is remembered at runtime. If the environment is the compiler environment while compiling a file, the macro is forgotten after the file is compiled. The compiled file needs to be loaded so that Lisp then knows the macro.

So, there is a side effect in defining a macro and compiling it. The compiler remembers the compiled macro and stores its code.

When the compiler now sees some code which uses the macro, such as (cube 10), it just looks up the macro function stored in the current environment under the name CUBE, calls this macro function with 10 as the argument, and then compiles the generated form. As mentioned above, this is not done directly, but via the MACROEXPAND functions.
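A toy version of that step, for a single form, might look like the following. EXPAND-THEN-COMPILE is a hypothetical name used only for this sketch; a real compiler expands macros at every level while it walks the code, whereas MACROEXPAND here only handles the outermost form and leaves anything nested to COMPILE.

```lisp
(defmacro cube (n)
  (let ((x (gensym)))
    `(let ((,x ,n))
       (* ,x ,x ,x))))

;; Hypothetical stand-in for what the compiler does with one form.
(defun expand-then-compile (form)
  (let ((expansion (macroexpand form)))     ; calls the registered macro function
    (compile nil `(lambda () ,expansion)))) ; compiles the generated code

(funcall (expand-then-compile '(cube 10)))   ; => 1000
```

Note that GENSYM is called while the macro function runs, inside the compiler, and not when the compiled code runs.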

Here is the macro definition:

CL-USER 5 > (defmacro cube (n)
              (let ((x (gensym)))
                `(let ((,x ,n))
                   (* ,x ,x ,x))))
CUBE

We compile the macro:

CL-USER 6 > (compile 'cube)
CUBE
NIL
NIL

MACRO-FUNCTION returns the function for a macro. We can call it like any other function with FUNCALL. It expects two arguments: a whole form like (cube 10) and an environment (here NIL).

CL-USER 7 > (funcall (macro-function 'cube) '(cube 10) nil)
(LET ((#:G2251 10)) (* #:G2251 #:G2251 #:G2251))

It is also possible to take a function (which accepts two arguments: a form and an environment) and store it using SETF as a macro function.
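A small sketch of that, assuming a made-up macro name SQUARE. It is meant to be evaluated at a REPL or loaded as source, since the SETF has to have run before any use of SQUARE is expanded:

```lisp
;; Register an ordinary two-argument function as the macro function
;; of SQUARE (a hypothetical name for this example).
(setf (macro-function 'square)
      (lambda (form environment)
        (declare (ignore environment))
        (let ((arg (second form))   ; FORM is the whole call, e.g. (square 7)
              (x (gensym)))
          `(let ((,x ,arg))
             (* ,x ,x)))))

(macroexpand-1 '(square 7))   ; a LET form, and T as the second value
(eval '(square 7))            ; => 49
```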

Summary

When the Common Lisp compiler runs, it simply knows the macro functions and calls them when necessary to expand code via the built-in macro expander. The macro functions are simply Lisp code themselves. When the Lisp compiler sees a macro definition, it compiles the macro function, stores it in the current environment and uses it to expand subsequent uses of the macro.

Note: This makes it necessary in Common Lisp that a macro is defined before it can be used by the compiler.
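A sketch of what goes wrong otherwise, using the made-up names TWICE, EARLY and LATE. When EARLY is compiled, TWICE is not yet known as a macro, so the compiler can only treat (twice x) as a call to a function named TWICE (many compilers print a warning about an undefined function at that point). What exactly happens when EARLY is later called depends on the implementation; typically an error is signaled.

```lisp
;; Compiled before TWICE exists: (twice x) is taken to be a function call.
(defun early (x) (twice x))
(compile 'early)

(defmacro twice (form) `(* 2 ,form))

;; Compiled after the DEFMACRO: (twice x) is expanded to (* 2 x).
(defun late (x) (twice x))
(compile 'late)

(late 21)   ; => 42
;; (early 21) still contains a function call to TWICE, which is not a
;; function, so calling it typically signals an error.
```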

战皆罪 2024-12-06 18:01:37

There are many approaches to this. One extreme is something called "FEXPRs", which are macro-like things that essentially get re-expanded on every evaluation. They caused a lot of noise at some point in the past but have almost completely disappeared. (There are a few people who still do similar things though; newlisp is probably the most popular example.)

So FEXPRs were dumped in favor of macros, which are more "well behaved" in a way. You basically do macro expansion once, and compile the resulting code. As usual, there are a few strategies here, which can lead to different results. For example, "expand once" doesn't specify when the expansion happens. It can happen as soon as the code is read, or (usually) when it is compiled, or even just the first time it runs.
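In Common Lisp, for instance, the "expand once, then compile" behavior can be observed by giving a macro a side effect. In this sketch (all names are made up, and it is meant to be evaluated form by form at a REPL or loaded as source) a counter is bumped every time the macro function runs. How often the compiler expands a given call is not specified, but calling the already compiled function expands nothing:

```lisp
(defvar *expansions* 0)   ; counts how often the macro function runs

(defmacro counted-cube (n)
  (incf *expansions*)     ; side effect at expansion time
  (let ((x (gensym)))
    `(let ((,x ,n))
       (* ,x ,x ,x))))

(defun use-cube (y) (counted-cube y))
(compile 'use-cube)       ; any expansion happens here at the latest

(let ((before *expansions*))
  (dotimes (i 3) (use-cube i))
  (= before *expansions*))   ; => T, running compiled code expands nothing
```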

Another question here -- and that's essentially where you stand -- is in what environment you evaluate the macro code. In most Lisps, everything happens in the same happy global environment. A macro can access functions freely, which can lead to some subtle problems. One outcome of this is that many commercial Common Lisp implementations give you a development environment where you do most of your work and compile things -- this makes the same environment available on both levels. (Actually, since macros can use macros, there is an arbitrary number of levels here.) To deploy an application you get a restricted environment that doesn't have, for example, the compiler (i.e., the compile function), since if you deploy code that uses it, your code is essentially a CL compiler. So the idea is that you compile the code on your full implementation, and that expands all macros, which means that the compiled code has no additional uses of macros.

But of course that can lead to those subtle problems that I talked about. For example, some side effects can lead to a loading-order mess, where you need to load code in a specific order. Worse, you could fall into a trap where code runs one way when you evaluate it interactively, and another way when it's compiled -- since compiled code already had all macros (and the calls they made) expanded beforehand. There are some hackish solutions to these, like eval-when, which specifies the conditions under which some code is evaluated. There are also a few system definition tools for CL where you specify things like loading order (like ASDF). Still, there is no real robust solution there, and you can still fall into these traps (see for example this extended rant).
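A typical use of eval-when looks like the sketch below (MAKE-POWER-FORM, POWER and CUBE-OF are made-up names). The helper is called by the macro function while it runs, which for a compiled file means at compile time. Without the eval-when wrapper, COMPILE-FILE would normally not define the helper while compiling, and the expansion of POWER inside CUBE-OF would fail on an undefined function. At a REPL the wrapper makes no difference.

```lisp
;; Helper that builds part of the expansion; it must exist at compile time.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun make-power-form (var exponent)
    ;; (make-power-form 'x 3) => (* X X X)
    (cons '* (make-list exponent :initial-element var))))

;; EXPONENT has to be a literal non-negative integer in this sketch.
(defmacro power (n exponent)
  (let ((x (gensym)))
    `(let ((,x ,n))
       ,(make-power-form x exponent))))

(defun cube-of (n) (power n 3))

(cube-of 10)   ; => 1000
```

This also shows that an expansion can be built with ordinary list functions: the macro function simply calls them while it runs.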

There are alternatives, of course. Most notably, Racket uses its module system. A module can be "instantiated" multiple times, and state is unique to each instance. Now, when some module is used both in macros and at runtime, the two instances of this module are distinct, which means that compilation is always reliable, and there are none of the above headaches. In the Scheme world, this is known as "separate phases", where each phase (runtime, compile-time, and higher levels with macros-using-macros) has separate module instances. For a good introduction to this and a thorough explanation, read Matthew Flatt's Composable and Compilable Macros. You could also just look at the Racket docs, for example, the Compile and Run-Time Phases section.

暮凉 2024-12-06 18:01:37

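There's nothing particularly magical about macros.

At a high level, they're just functions: functions that return S-expressions for Lisp forms. The "runtime" for a macro is available in the macroexpand function which, as you may already know, expands macros.

So you can look at it this way: the compiler detects that a form is a macro call, evaluates it, and then compiles the form that the macro returns as its result.

Typically there is a lot of quoting, splicing and other list surgery in macros to make them easier to write, much like a template system. But those constructs are not required. You can return an S-expression built however you want. In that light, you can see that at their core macros are simple functions evaluated at compile time.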
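To illustrate that backquote is only a convenience, here is the CUBE macro from the question written with plain list construction. This is a sketch of the same macro, and it produces an expansion of the same shape:

```lisp
(defmacro cube (n)
  (let ((x (gensym)))
    ;; Same result as `(let ((,x ,n)) (* ,x ,x ,x)), built by hand.
    (list 'let (list (list x n))
          (list '* x x x))))

(cube 3)   ; => 27
```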

孤星 2024-12-06 18:01:37

Link to my toy Lisp → JavaScript compiler

You found one of the main differences between Lisp and other languages.

In Lisp the execution of dynamically created code is essential, and for example necessary for macro expansion.

While writing a Lisp-to-C compiler I discovered this now obvious thing myself and came to the conclusion that if you want to write a Lisp compiler there are only two solutions:

1. You write BOTH a compiler and an interpreter, so that you can call the interpreter for macro expansion during compilation.
2. You must be able to dynamically compile code and call it (or use worse "tricks" like compiling a dynamically loadable module and then loading it).

If you are working on a compiler that targets C, one possibility is to use Fabrice Bellard's TCC library, which allows direct compilation of C code to a memory buffer.

I'm writing a Lisp-to-JavaScript compiler, and in this case there is of course no problem because "the hardware" can handle that nicely: you can ask JavaScript to evaluate, for example, a string "function(...){...}" and then call the resulting object. Using JavaScript also solves what is IMO one of the most difficult issues for a Lisp kernel, namely the proper implementation of lexical closures.

Indeed, in my JavaScript compiler eval is more or less just

(defun eval (x)
    (funcall (js-eval (js-compile x))))

where js-compile is the main compiler interface: given a Lisp form, it gives back a string containing JavaScript code that, when evaluated (with JavaScript's eval, which I exported to the Lisp level as js-eval), executes the code. Interestingly enough, eval is also never used (the only, non-essential, exception is a convenience macro in which I have to execute user-defined code during macro expansion).

One important thing to consider is that while Common Lisp has a sort of separation between "read time", "compile time" and "run time", this separation is more logical than physical, as the running code is always Lisp. Compiling in Lisp is just calling a function. Even the "parsing" phase is just a regular Lisp function executing... it's Lisp all the way down :-)
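In Common Lisp terms, the phases can be walked through by hand with ordinary function calls. This is only a sketch; it reuses the CUBE macro from the question, and the local variable names are arbitrary:

```lisp
(defmacro cube (n)
  (let ((x (gensym)))
    `(let ((,x ,n))
       (* ,x ,x ,x))))

(let* ((form      (read-from-string "(cube (+ 1 2))"))    ; read time: text -> list
       (expansion (macroexpand form))                      ; macro expansion
       (compiled  (compile nil `(lambda () ,expansion))))  ; compile time
  (funcall compiled))                                      ; run time, => 27
```

Each step is a plain function call on plain data, which is what makes the separation logical rather than physical.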