用脚本语言实现 lambda

发布于 2024-12-22 15:37:44 字数 392 浏览 0 评论 0原文

我从有关匿名函数的维基百科文章中获取了这个修改后的代码示例。

comp(threshold)
{
    return lambda(x)
    {
        return x < threshold;
    };
}

main()
{
    a = comp(10);

    lib::print( a(5) );
}

将匿名函数添加到我的脚本语言中应该不会太难。它应该只是以正常方式添加函数代码的情况，但访问该函数的唯一方法是通过为其分配的变量。

上面的闭包示例的问题在于，匿名函数的函数体引用（一般意义上）在调用闭包时无效（或将无效）的内存位置。

我脑子里已经有两个可能的解决方案；在尝试将此功能添加到我的语言中之前，我只想先获得一些建议。

原文

I have this modified code example I took from the wikipedia article on anonymous functions.

comp(threshold)
{
    return lambda(x)
    {
        return x < threshold;
    };
}

main()
{
    a = comp(10);

    lib::print( a(5) );
}

Anonymous functions shouldn't be too hard to add to my scripting language. It should just be a case of adding the function code in the normal fashion, with the exception that the only way to access that function is through the variable to which it is assigned.

The problem with the closure example above is that the function body for the anonymous function references (in a general sense) a memory location that is invalid (or would be) at the time the closure is called.

I have two potential solutions to the problem in my mind already; I just want to get some recommendations first, before I attempt to add this feature to my language.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

淡墨 2024-12-29 15:37:44

我不知道最合理的方式，但你可以在 Lua 5.0的实现。另请参阅幻灯片。

Lua 实现闭包的要点是有效处理上值或外部局部变量。这使得 Lua 能够支持完整的词法作用域。有关 Lua 设计中这种支持如何演变的说明，请参阅 HOPL III 论文，The Evolution of卢阿。

回复收藏 0 原文

死开点丶别碍眼 2024-12-29 15:37:44

由于你的问题不是很具体，只能用通用算法来回答。我也没有声称这是“最明智”的方式，只是一种有效的方式。

首先，我们需要定义问题空间。

您的脚本语言具有局部变量（使用 Lua 术语）：非全局变量。这些变量理论上可以被 lambda 捕获。

现在，我们假设您的脚本语言无法动态选择局部变量。这意味着，只需检查语法树，就可以看到以下内容：

函数捕获了哪些局部变量。
哪些局部变量不被函数捕获。
哪些函数捕获其范围之外的局部变量。
哪些函数不捕获其作用域之外的局部变量。

鉴于此信息，局部变量现在分为两组：纯局部变量和捕获的局部变量。我将这些称为“纯粹的当地人”和“捕获的当地人”。

由于缺乏更好的术语，纯粹的本地人是寄存器。当您编译为字节代码时，纯局部变量是最容易处理的。它们是特定的堆栈索引，或者它们是特定的寄存器名称。无论您如何进行堆栈管理，纯本地变量都会在特定范围内分配固定位置。如果您使用 JIT 的力量，那么这些将成为寄存器，或者最接近寄存器的东西。

关于捕获的局部变量，您需要了解的第一件事是：它们必须由内存管理器管理。它们独立于当前调用堆栈和作用域而存在，因此，它们需要是自由的-由捕获它们的函数引用的站立对象。这允许多个函数捕获相同的本地数据，从而引用彼此的私有数据。

因此，当您输入包含捕获的 lambda 的作用域时，您将分配一块内存，其中包含属于该特定作用域的所有捕获的局部变量。例如：

comp(threshold)
{
    local data;
    return lambda(x)
    {
        return x < (threshold + data);
    };
}

comp函数的根作用域有两个局部变量。他们两人都被俘虏了。因此，捕获的本地人的数量为2，纯本地人的数量为0。

因此，您的编译器（对于字节代码）将为纯局部变量分配 0 个寄存器/堆栈变量，并且它将分配一个包含两个变量的独立对象。假设您正在使用垃圾收集，您将需要一些东西来引用它才能使其继续存在。这很简单：您在寄存器/堆栈位置引用它，脚本无法直接访问该位置。所以实际上，您确实分配了一个寄存器/堆栈变量，但脚本无法直接触及它。

现在，让我们看看 lambda 做了什么。它创建了一个函数。同样，我们知道这个函数捕获了一些超出其范围的变量。我们知道它捕获了哪些变量。我们看到它捕获了两个变量，但我们也看到这两个变量来自同一个独立的内存块。

因此 lambda 所做的就是创建一个函数对象，该对象具有对某些字节码的引用以及对其关联的变量的引用。字节码将使用该引用来获取其捕获的变量。您的编译器知道哪些变量是函数的纯本地变量（例如参数 x ），哪些是外部捕获的本地变量（例如阈值）。所以它可以弄清楚如何访问每个变量。

现在，当 lambda 完成时，它会返回一个函数对象。此时，捕获的变量由两个事物引用：lambda 函数和堆栈：函数的当前作用域。然而，当 return 完成时，当前作用域将被销毁，并且之前引用的所有内容都不再被引用。因此，当它返回函数对象时，只有 lambda 函数拥有对捕获变量的引用。

但这一切都相当复杂。一个更简单的实现是有效地捕获所有局部变量；所有局部变量都是捕获的局部变量。如果你这样做，那么你的编译器会变得更简单（而且可能更快）。当进入一个新的作用域时，该作用域的所有局部变量都被分配在一个内存块中。创建函数时，它会引用它使用的所有外部作用域（如果有）。当退出作用域时，它会删除对其分配的局部变量的引用。如果没有其他人引用它，则可以释放内存。

这非常简单明了。

Since your question wasn't very specific, it can only be answered with a general algorithm. I also make no claims that this is the "most sensible" way, simply one that works.

First, we need to define the problem space.

Your scripting language has local variables (using the Lua term): variables which are not global. These are variables which can theoretically be captured by a lambda.

Now, let's assume your scripting language does not have a way to dynamically select local variables. This means that, simply by inspecting the syntax tree, one can see the following:

Which local variables are captured by a function.
Which local variables are not captured by a function.
Which functions capture local variables outside of their scope.
Which functions do not capture local variables outside of their scope.

Given this information, local variables now are split into two groups: pure local variables, and captured local variables. I will call these "pure locals" and "captured locals".

Pure locals are, for want of a better term, registers. When you compile to your byte code, pure locals are the simplest to handle. They're specific stack indices, or they're specific register names. However you're doing your stack management, pure locals are assigned fixed locations within a particular scope. If you're wielding the power of JIT, then these are going to become registers, or the closest thing to that possible.

The very first thing you need to understand about captured locals is this: they must be managed by your memory manager. They exist independently of the current call stack and scope, and therefore, they need to be free-standing objects which are referenced by functions that capture them. This allows multiple functions to capture the same local and therefore reference each others private data.

Therefore, when you enter a scope that contains captured lambdas, you will allocate a piece of memory that contains all of the captured locals that are part of that particular scope. For example:

comp(threshold)
{
    local data;
    return lambda(x)
    {
        return x < (threshold + data);
    };
}

The root scope of the comp function has two local variables. Both of them are captured. Therefore, the number of captured locals is 2 and the number of pure locals is zero.

Your compiler (to byte code) will therefore allocate 0 registers/stack variables for pure locals, and it will allocate a free-standing object which contains two variables. Assuming you're using garbage collection, you will need something to reference it in order for it to continue to live. That's easy: you reference it in a register/stack location, one that is not directly accessible by the script. So really, you do allocate a register/stack variable, but the script can't directly touch it.

Now, let's look at what lambda does. It creates a function. Again, we know that this function captures some variables outside of its scope. And we know which variables it captures. We see that it captures two variables, but we also see that these two variables come from the same free-standing block of memory.

So what lambda does is create a function object that has a reference to some bytecode and a reference to the variables it is associated with. The bytecode will use that reference to get its captured variables. Your compiler knows which variables are pure local to the function (like the argument x) and which are externally captured locals (like threshold). So it can figure out how to access each variable.

Now, when lambda completes, it returns a function object. At this point, the captured variables are referenced by two things: the lambda function and the stack: the current scope of the function. When return finishes however, the current scope is destroyed and all the things that were previously referenced are no longer referenced. So when it returns the function object, only the lambda function has the reference to the captured variables.

This is all rather complicated though. A much simpler implementation would be to just make all local variables effectively captured; all local variables are captured locals. If you do this, then your compiler can be a lot simpler (and probably quicker). When a new scope is entered, all of the locals for that scope are allocated in a block of memory. When a function is created, it references all of the external scopes it uses (if any). And when a scope is exited, it removes a reference to the locals it allocated. If nobody else is referencing it, then the memory can be freed.

It's very simple and straightforward.

回复收藏 0 原文

风筝在阴天搁浅。 2024-12-29 15:37:44

如果我将其翻译成 C++（没有 lambda），它会看起来像这样：

struct comp_lamda_1 {
    int threshold;
    comp_lamda_1(int t) :threshold(t) {}
    bool operator()(int x) {
        return x < threshold;
    };
};

comp_lambda_1 comp(int threshold)
{
    return comp_lamda_1(threshold);
}

int main()
{
    auto a = comp(10);
    std::cout << a(5);
}

这表明解释器不应该将匿名函数视为独立函数，而应将其视为函数对象，它有成员来捕获所需的变量。

（要明确的是，关键是 comp_lamda_1 是一个函数对象，我知道您并没有要求对上述代码进行 C++ 翻译）

If I were to translate this into C++ (without lambdas) it would look like this:

struct comp_lamda_1 {
    int threshold;
    comp_lamda_1(int t) :threshold(t) {}
    bool operator()(int x) {
        return x < threshold;
    };
};

comp_lambda_1 comp(int threshold)
{
    return comp_lamda_1(threshold);
}

int main()
{
    auto a = comp(10);
    std::cout << a(5);
}

This shows that the interpreter shouldn't treat an anonymous function as a freestanding function, but instead as a function-object, that has members to capture the needed variables.

(To be clear, the point is that comp_lamda_1 is a function-object, and I understand you weren't asking for a C++ translation of the above code)

回复收藏 0 原文

浮生面具三千个 2024-12-29 15:37:44

我一直在阅读有关 Lua 中使用的 upvalues 的内容。我将尝试实现一个类似的系统来处理闭包和完整的词法范围。棘手的部分是让编译器根据需要将关闭命令放置在正确的位置。

function()
{
    a = 6, b;

    {
        local c = 5;

        b = lambda() { return a*c; };

        // close c in closure;
    }

    b();

    // close a in closure
}

I've been reading about the upvalues used in Lua. I'm going to try to implement a similar system for dealing with closures and full lexical scoping. The tricky part will be getting the compiler to place the close commands in the correct places, as needed.

function()
{
    a = 6, b;

    {
        local c = 5;

        b = lambda() { return a*c; };

        // close c in closure;
    }

    b();

    // close a in closure
}

回复收藏 0 原文

~没有更多了~