最适合动态语言字段访问的数据结构

发布于 2024-10-13 06:13:01 字数 825 浏览 2 评论 0原文

我正在实现一种将编译为 C# 的动态语言，并且它正在实现自己的反射 API（.NET 太慢，并且 DLR 仅限于更新且资源丰富的实现）。

为此，我实现了一个简单的 .GetField(string f) 和 .SetField(string f, object val) 接口。直到最近，实现只是切换所有可能的字段字符串值并执行相应的操作。此外，这种动态语言还可以定义匿名对象。对于那些匿名对象，首先，我实现了一个简单的哈希算法。

到目前为止，我正在寻找优化该语言的动态部分的方法，并且我发现匿名对象的哈希算法是多余的。这是因为物体通常很小。我想说这些对象通常包含 2 或 3 个字段。它们很少包含超过 15 个字段。与测试它们之间的相等性相比，实际散列字符串并执行查找需要更多时间。（这没有经过测试，只是理论上的）。

我做的第一件事是——在编译时——为每个匿名对象声明创建一个红黑树，并将其放置到一个数组上，以便该对象可以以非常优化的方式查找它。

不过，如果这是最好的方法的话，我仍然存在分歧。我可以寻求完美的哈希函数。更激进的是，我正在考虑放弃对字符串的需求，而实际使用 2 个 long 的结构。

这两个长整型将被编码为每个支持 10 个字符 (A-za-z0-9_)，这主要是对字段大小的良好预测。对于大于此的字段，还将提供接收字符串的特殊函数（较慢）。

结果将是字符串将被内联（而不是引用），并且它们的比较将与长比较一样便宜。

不管怎样，找到关于这种优化的好信息有点困难，因为这通常是在虚拟机级别上考虑的，而不是静态语言编译实现。

有人对处理动态调用的最佳数据结构有任何想法或技巧吗？

编辑： 现在，我真的将字符串作为长表示和线性二叉树查找。

原文

I'm implementing a dynamic language that will compile to C#, and it's implementing its own reflection API (.NET's is too slow, and the DLR is limited only to more recent and resourceful implementations).

For this, I've implemented a simple .GetField(string f) and .SetField(string f, object val) interface. Until recently, the implementation just switches over all possible field string values and makes the corresponding action.
Also, this dynamic language has the possibility to define anonymous objects. For those anonymous objects, at first, I had implemented a simple hash algorithm.

By now, I am looking for ways to optimize the dynamic parts of the language, and I have come across the fact that a hash algorithm for anonymous objects would be overkill. This is because the objects are usually small. I'd say the objects contain 2 or 3 fields, normally. Very rarely, they would contain more than 15 fields. It would take more time to actually hash the string and perform the lookup than if I would test for equality between them all. (This is not tested, just theoretical).

The first thing I did was to -- at compile-time -- create a red-black tree for each anonymous object declaration and have it laid onto an array so that the object can look for it in a very optimized way.

I am still divided, though, if that's the best way to do this. I could go for a perfect hashing function. Even more radically, I'm thinking about dropping the need for strings and actually work with a struct of 2 longs.

Those two longs will be encoded to support 10 chars (A-za-z0-9_) each, which is mostly a good prediction of the size of the fields. For fields larger than this, a special function (slower) receiving a string will also be provided.

The result will be that strings will be inlined (not references), and their comparisons will be as cheap as a long comparison.

Anyway, it's a little hard to find good information about this kind of optimization, since this is normally thought on a vm-level, not a static language compilation implementation.

Does anyone have any thoughts or tips about the best data structure to handle dynamic calls?

Edit:
For now, I'm really going with the string as long representation and a linear binary tree lookup.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

你与清晨阳光 2024-10-20 06:13:01

我不知道这是否有帮助，但以防万一我会放弃它；

如果这是编译为 C#，您知道编译时的完整字段列表吗？因此，作为一个想法，如果您的代码

// dynamic
myObject.foo = "some value";
myObject.bar = 32;

在解析期间读取 then，您的符号表可以为每个字段名称构建一个 int ；

// parsing code
symbols[0] == "foo"
symbols[1] == "bar"

然后使用数组或列表生成代码；

// generated c#
runtimeObject[0] = "some value"; // assign myobject.foo
runtimeObject[1] = 32; // assign myobject.bar

并将反射构建为单独的数组；

runtimeObject.FieldNames[0] == "foo"; // Dictionary<int, string>
runtimeObject.FieldIds["foo"] === 0;  // Dictionary<string, int>

正如我所说，扔掉是希望它有用。不知道会不会！

I don't know if this is helpful, but I'll chuck it out in case;

If this is compiling to C#, do you know the complete list of fields at compile time? So as an idea, if your code reads

// dynamic
myObject.foo = "some value";
myObject.bar = 32;

then during the parse, your symbol table can build an int for each field name;

// parsing code
symbols[0] == "foo"
symbols[1] == "bar"

then generate code using arrays or lists;

// generated c#
runtimeObject[0] = "some value"; // assign myobject.foo
runtimeObject[1] = 32; // assign myobject.bar

and build up reflection as a separate array;

runtimeObject.FieldNames[0] == "foo"; // Dictionary<int, string>
runtimeObject.FieldIds["foo"] === 0;  // Dictionary<string, int>

As I say, thrown out in the hope it'll be useful. No idea if it will!

回复收藏 0 原文

玉环 2024-10-20 06:13:01

由于您可能会重复使用相同的字段和方法名称，因此字符串驻留之类的东西会起作用很好地快速为您的哈希表生成键。它还将使字符串相等比较的时间恒定。

回复收藏 0 原文

深府石板幽径 2024-10-20 06:13:01

对于如此小的数据集（预计上限为 15），我认为几乎任何散列都会比树甚至列表查找更昂贵，但这实际上取决于您的散列算法。

如果您想使用字典/哈希，那么您需要确保用作键的对象快速返回哈希代码（可能是构建一次的单个常量哈希代码）。如果您可以防止对象内部发生冲突（听起来相当可行），那么您将获得哈希表的速度和可扩展性（适用于任何实际的对象/类大小）。

我想到的是 Ruby 的符号和消息传递。我相信 Ruby 的符号只是充当内存引用的常量。所以比较是恒定的，它们非常精简，并且您可以使用变量之类的符号（我对此有点模糊，并且这台机器上没有 Ruby 解释器）。 Ruby 的方法“调用”实际上变成了消息传递。类似于： obj.func(arg) 变成 obj.send(:func, arg) （“:func”是符号）。我想这个符号使得在对象内查找消息处理程序（我将这样称呼它）非常有效，因为它的哈希码很可能不需要像大多数对象一样进行计算。

也许.NET 中也可以完成类似的事情。

回复收藏 0 原文

~没有更多了~