以解释性语言存储变量的数据结构

发布于 2025-01-07 10:37:11 字数 1857 浏览 3 评论 0原文

我正在设计自己的实验性脚本语言,以便将其嵌入到我更大的应用程序中。

几乎我想做的所有事情都可以顺利编程,但是将变量存储在内存中的“简单”行为似乎是这里最难的部分。我不知道如何存储它们以允许对它们进行所有类型检查、全局变量和特殊标志。首先看一个示例代码:

a = 1
b = 2

someFunction()
  print(a)   --> This should read the global variable and print `1`
  a = 3      --> Now `a` should become a local variable of this function
                 and the global `a` remain unchanged
  x = 4      --> `x` should always be local of this function
end

我将变量的“局部性”称为其级别,因此嵌套块中的变量具有更高的级别。在上面的代码中,ab是1级变量。 someFunction 的局部变量将具有级别 2。函数的第一行应读取全局变量 a(级别 1),但第二行应再次创建一个名为 a 的变量但级别 2 会从该点开始遮蔽全局 a。第三行应该创建级别为2的变量x。如何在内存中存储和跟踪所有这些?

到目前为止我尝试过的:

方法 1:将 variable=>value 的映射存储在级别数组中:

variables
{
    level=1 //global variables
    {
        a => 1,
        b => 2
    },
    level=2 //function variables
    {
        a => 3,
        x => 4
    }
}

但这会使变量查找非常慢,因为必须搜索给定变量的所有级别。

方法 2:将(变量,级别)对存储为映射的键:

variables
{
    (a, 1) => 1, //global
    (b, 1) => 2, //global
    (a, 2) => 3, //function
    (x, 2) => 3  //function
}

这与之前有相同的问题,因为我们必须尝试给定变量的所有可能级别的对(变量,级别)。

我应该使用什么方法来获得最佳的内存使用率和最快的访问时间?

附加说明:

我知道如何在其他“真实”语言上在堆栈和堆上管理变量,但我发现在解释语言上执行此操作很棘手。 “Lua 和 Python 不应该是这样做的,”我总是这样想。如果我错了请纠正我。我试图将变量存储在映射和内部 C++ 结构中。

最后,这就是我表示变量的方式。你认为它很大并且可以有更节省内存的表示形式吗? (我也尝试将“级别”作为成员放在这里,但它也有与另一个相同的问题。)

struct Member
{
    uchar type;  //0=num, 1=str, 2=function, 3=array, etc
    uchar flags; //0x80 = read-only, 0x40 = write-only, etc
    union {
        long double value_num;
        char* value_str;
        int value_func;
        //etc
    };
};

I am designing my own experimental scripting language for the purpose of embedding it in my bigger application.

Almost everything I wanted to do was programmed smoothly, but the "simple" act of storing variables in memory appeared to be the hardest part here. I don't know how to store them to allow all type checking, global variables and special flags on them. First look at a sample code:

a = 1
b = 2

someFunction()
  print(a)   --> This should read the global variable and print `1`
  a = 3      --> Now `a` should become a local variable of this function
                 and the global `a` remain unchanged
  x = 4      --> `x` should always be local of this function
end

I call the "locality" of variables their levels so variables in nested blocks have a higher level. In the above code, a and b are level 1 variables. Local variables of someFunction will have level 2. The first line of the function should read the global variable a (level 1) but the second line should create a variable again called a but with level 2 that shadows the global a from that point onwards. The third line should create the variable x with level 2. How to store and keep track of all these in memory?

What I tried so far:

Method 1: Storing maps of variable=>value in array of levels:

variables
{
    level=1 //global variables
    {
        a => 1,
        b => 2
    },
    level=2 //function variables
    {
        a => 3,
        x => 4
    }
}

But that will make variable look-up really slow since one has to search all the levels for a given variable.

Method 2: Storing the (variable, level) pairs as keys of a map:

variables
{
    (a, 1) => 1, //global
    (b, 1) => 2, //global
    (a, 2) => 3, //function
    (x, 2) => 3  //function
}

This has the same problem as before since we have to try the pair (variable, level) with all possible levels for a given variable.

What method should I use for optimal memory usage and fastest access time?

Additional notes:

I know about how variables are managed on stack and heap on other "real" languages, but I find it tricky to do this on an interpreted language. "This mustn't be how Lua and Python do that," I always think. Correct me if I'm wrong. I'm trying to store the variable in maps and internal C++ structures.

And finally, this is how I represent a variable. Do you think it's big and there can be more memory-efficient representations? (I've also tried to put the "Level" as a member here but it had the same problem as the other too.)

struct Member
{
    uchar type;  //0=num, 1=str, 2=function, 3=array, etc
    uchar flags; //0x80 = read-only, 0x40 = write-only, etc
    union {
        long double value_num;
        char* value_str;
        int value_func;
        //etc
    };
};

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

铁轨上的流浪者 2025-01-14 10:37:11

与数组类似,一个简单的事情就是维护一堆地图。每个映射都包含该范围的绑定。要绑定变量,请将其添加到顶部地图;要查找变量,请从堆栈顶部开始,并在到达包含该变量绑定的映射时停止。搜索需要一点时间,但从顶部/末尾开始,您只需搜索直到找到它 - 在大多数情况下,此搜索不会很长。

您还可以通过将此逻辑封装在具有本地绑定和用于解析未知变量的继承环境的 Environment 类中来使堆栈隐式化。需要进入新的范围吗?以当前环境为基础创建一个新环境,使用它,然后在范围完成时丢弃它。根/全局环境只能有一个空继承环境。这就是我可能会做的。

An easy thing to do, similar to your array, is to maintain a stack of maps. Each map contains the bindings for that scope. To bind a variable, add it to the top map; to look up a variable, start at the top of the stack and stop when you reach a map that contains a binding for that variable. Search takes a little bit, but starting from the top/end you only have to search until you find it — in most cases, this search will not be very long.

You can also make the stack implicit by encapsulating this logic in an Environment class that has local bindings and an inherited environment used for resolving unknown variables. Need to go into a new scope? Create a new environment with the current environment as its base, use it, then discard it when the scope is finished. The root/global environment can just have a null inherited environment. This is what I would probably do.

痴情 2025-01-14 10:37:11

值得注意的是,如果在函数内部,您无法访问调用者函数中的任何变量,则会减少您需要查看的级别数。例如:

variable a;

function one() {
    variable b;
    // in this function, we can see the global a, local b
    two();
}

function two() {
    // in this function, we can see the global a, local c
    // we cannot see the local b of our caller
    variable c;
    while (true) {
        variable d;
        // here we can see local d, local c, global a
    }
}

这个想法是函数边界限制变量的可见性,全局作用域是“特殊的”。

话虽如此,您可以考虑删除全局变量的特殊性,但允许代码指定它们想要访问非局部变量

variable a;

function one() {
    global a; // or upvar #0 a;
    variable b;
    // in this function, we can see the global a, local b
    two();
}

function two() {
    // in this function, we can see the local c
    // and the local b of our caller
    // (since we specifically say we want access to "b" one level up)
    upvar 1 b;
    variable c;
}

一开始看起来很复杂,但是一旦习惯了它就很容易理解(upvar是Tcl 编程语言的构造)。它允许您访问调用者作用域中的变量,但它通过要求您准确指定该变量的来源来避免一些昂贵的查找(1 是调用堆栈的上一级,2 是上两级, #0 是“特殊的”,表示“最上面的调用堆栈,全局的”

Its worth noting that if, inside a function, you don't have access to any variables from the caller function, it lowers the number of levels you need to look at. For example:

variable a;

function one() {
    variable b;
    // in this function, we can see the global a, local b
    two();
}

function two() {
    // in this function, we can see the global a, local c
    // we cannot see the local b of our caller
    variable c;
    while (true) {
        variable d;
        // here we can see local d, local c, global a
    }
}

The idea being that function boundaries limit the visibility of variables, with the global scope being "special".

That being said, you can consider removing the specialness of global variables, but allowing the code to specify that they want access to non-local variables

variable a;

function one() {
    global a; // or upvar #0 a;
    variable b;
    // in this function, we can see the global a, local b
    two();
}

function two() {
    // in this function, we can see the local c
    // and the local b of our caller
    // (since we specifically say we want access to "b" one level up)
    upvar 1 b;
    variable c;
}

It looks complicated at first, but it's really easy to understand once you get used to it (upvar is a construct from the Tcl programming language). What it allows you is access to variables in your caller's scope, but it avoids some of the costly lookup involved by requiring that you specify exactly where that variable comes from (with 1 being one level up the call stack, 2 being two levels up, and #0 being "special" in saying "the uppermost call stack, the global)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文