I want to store a Lua table whose keys are other Lua tables. I know that this is possible, but I want to be able to do lookups in the table using copies of those tables. Specifically, I want to be able to do:
t = {}
key = { a = "a" }
t[key] = 4
key2 = { a = "a" }
and then I want to be able to look up:
t[key2]
and get 4.
I know that I can turn `key` into a string and put it into table `t`. I've also thought about writing a custom hash function or doing this by nesting tables. Is there a best way for me to get this type of functionality? What other options do I have?
In Lua, two tables created separately are considered "different". But if you create a table once, you can assign it to any variables you want, and when you compare them, Lua will tell you that they are equal. In other words:
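A short sketch of that behaviour, using the tables from the question (the variable names `a`, `b` and `c` are only illustrative):

```lua
-- Two tables built separately are distinct objects, even with equal contents.
local a = { a = "a" }
local b = { a = "a" }
local c = a            -- c refers to the very same table as a

print(a == b)          --> false
print(a == c)          --> true

local t = {}
t[a] = 4
print(t[c])            --> 4   (same table, so same key)
print(t[b])            --> nil (different table, so different key)
```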
So, that's the simple, clean way of doing what you want. Store `key` somewhere, so you can retrieve the `4` back by using it. This is also very fast.
If you really don't want to do that ... well, there is a way. But it is kind of inefficient and ugly.
The first part is making a function that compares two separate tables. It should return true if two tables are "equivalent", and false if they are not. Let's call it equivalent. It should work like this:
The function must be recursive, to handle tables that contain tables themselves. It also must not be fooled if one of the tables "contains" the other, but has more elements. I came up with this implementation; there are probably better ones out there.
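One way such a function might look is sketched below. It is a minimal version written for illustration, not necessarily the listing the answer refers to. It assumes keys are compared by ordinary lookup (so nested table keys are not deep-compared) and that no table refers back to itself.

```lua
-- Returns true when a and b are equal, or are tables whose contents are
-- pairwise equivalent. Assumes there are no circular references.
local function equivalent(a, b)
  if a == b then return true end
  if type(a) ~= "table" or type(b) ~= "table" then return false end
  -- every entry of a must have an equivalent entry in b
  for k, v in pairs(a) do
    if not equivalent(v, b[k]) then return false end
  end
  -- b must not have any key that a lacks
  for k in pairs(b) do
    if a[k] == nil then return false end
  end
  return true
end

print(equivalent({ a = "a" }, { a = "a" }))            --> true
print(equivalent({ a = "a" }, { a = "a", b = "b" }))   --> false
print(equivalent({ a = { 1, 2 } }, { a = { 1, 2 } }))  --> true
```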
I'm not going to explain that function here. I hope it is clear enough what it does.
The other part of the puzzle consists of making `t` use the `equivalent` function when comparing keys. This can be done with careful metatable manipulation and an extra "storage" table.
We basically transform `t` into an impostor. When our code tells it to store a value under a key, it doesn't save it in itself; instead it gives it to the extra table (we'll call that `store`). When the code asks `t` for a value, it searches for it in `store`, but using the `equivalent` function to get it.
This is the code:
Usage example:
kikito's answer is good, but has some flaws:
If you perform `t[{a=1}] = true` two times, `store` will contain two tables (leaking memory for the lifetime of the hash table).
(Also note that kikito's "equivalent" function will cause an infinite loop if any table has a circular reference.)
If you never need to change/remove any information in the table, then kikito's answer will suffice as it stands. Otherwise, the metatable must be changed so that the __newindex makes sure that the table doesn't already exist:
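A hedged sketch of such a metatable is shown below. The names `newEquivalentKeyTable` and `findKey` are hypothetical, and `equivalent` is a simple deep comparison without cycle detection. The point is that `__newindex` first looks for an equivalent key already in `store` and reuses it.

```lua
-- Deep comparison; assumes no circular references.
local function equivalent(a, b)
  if a == b then return true end
  if type(a) ~= "table" or type(b) ~= "table" then return false end
  for k, v in pairs(a) do
    if not equivalent(v, b[k]) then return false end
  end
  for k in pairs(b) do
    if a[k] == nil then return false end
  end
  return true
end

local function newEquivalentKeyTable()
  local store = {}

  -- Returns the key already in store that is equivalent to `key`, if any.
  local function findKey(key)
    for k in pairs(store) do
      if equivalent(k, key) then return k end
    end
    return nil
  end

  return setmetatable({}, {
    __index = function(_, key)
      local existing = findKey(key)
      if existing ~= nil then return store[existing] end
      return nil
    end,
    __newindex = function(_, key, value)
      local existing = findKey(key)
      if existing ~= nil then
        store[existing] = value  -- overwrite, or remove when value is nil
      else
        store[key] = value
      end
    end,
  })
end

local t = newEquivalentKeyTable()
t[{ a = 1 }] = true
t[{ a = 1 }] = "changed"   -- reuses the existing key instead of adding one
print(t[{ a = 1 }])        --> changed
t[{ a = 1 }] = nil         -- removes the entry
print(t[{ a = 1 }])        --> nil
```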
As you've suggested, a completely different option is to write a custom hashing function. Here's a HashTable that can make use of that:
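The following is a minimal sketch of a hash table with caller-supplied hash and equality functions. The method names `new`, `set` and `get`, and the deliberately simplistic `hash` and `equals` helpers at the end, are assumptions made for illustration. Keys that hash to the same value share a bucket, and `equals` resolves collisions within it.

```lua
local HashTable = {}
HashTable.__index = HashTable

-- hash(key) must return the same non-nil value for keys that equals() treats as equal.
function HashTable.new(hash, equals)
  return setmetatable({ hash = hash, equals = equals, buckets = {} }, HashTable)
end

function HashTable:set(key, value)
  local h = self.hash(key)
  local bucket = self.buckets[h]
  if not bucket then
    if value == nil then return end      -- nothing to remove
    bucket = {}
    self.buckets[h] = bucket
  end
  for i, entry in ipairs(bucket) do
    if self.equals(entry.key, key) then
      if value == nil then
        table.remove(bucket, i)          -- delete the entry
        if #bucket == 0 then self.buckets[h] = nil end
      else
        entry.value = value              -- overwrite in place
      end
      return
    end
  end
  if value ~= nil then
    bucket[#bucket + 1] = { key = key, value = value }
  end
end

function HashTable:get(key)
  local bucket = self.buckets[self.hash(key)]
  if bucket then
    for _, entry in ipairs(bucket) do
      if self.equals(entry.key, key) then return entry.value end
    end
  end
  return nil
end

-- Simplistic helpers that only suit tables of the form { a = <string> }.
local function hash(key) return key.a end
local function equals(x, y) return x.a == y.a end

local t = HashTable.new(hash, equals)
t:set({ a = "a" }, 4)
print(t:get({ a = "a" }))  --> 4
t:set({ a = "a" }, 5)
print(t:get({ a = "a" }))  --> 5
t:set({ a = "a" }, nil)
print(t:get({ a = "a" }))  --> nil
```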
Usage example:
Naturally, you'll want to get better Hash/Equals functions.
So long as the hashes of your keys rarely collide, the performance of this class should be O(1).
(Note: I'd have put the top half of this answer as a comment to kikito, but I don't have the reputation at this time to do so.)
This is not possible in Lua. If you use tables as keys, the key is that specific "instance" of the table. Even if you make a different table with the same contents, the instance is different, therefore it is a different key.
If you want to do something like this, you can create a kind of hash function, which traverses the table that is to serve as a key (maybe even recursively if needed) and constructs a string representation of the table content. It does not need to be human-readable, as long as it is different for different content and equal for tables with the same content. Apart from using `pairs()` to traverse the table, you would also need to insert the keys into a table and sort them using `table.sort()`, because `pairs()` returns them in an arbitrary order, and you want the same string for "equal" tables.
Once you have constructed such a string, you can use it as a key:
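A minimal sketch of such a function is given below. The name `tableToKey` is hypothetical. It assumes keys are strings or numbers and values are strings, numbers, booleans or nested tables without cycles.

```lua
-- Builds a string that is identical for tables with identical contents.
local function tableToKey(value)
  if type(value) == "string" then
    return string.format("%q", value)    -- quoted, so "1" and 1 differ
  elseif type(value) ~= "table" then
    return tostring(value)
  end

  -- collect the keys and sort them, because pairs() has no defined order
  local keys = {}
  for k in pairs(value) do keys[#keys + 1] = k end
  table.sort(keys, function(x, y)
    local tx, ty = type(x), type(y)
    if tx ~= ty then return tx < ty end  -- numbers before strings
    return x < y
  end)

  local parts = {}
  for _, k in ipairs(keys) do
    parts[#parts + 1] = "[" .. tableToKey(k) .. "]=" .. tableToKey(value[k])
  end
  return "{" .. table.concat(parts, ",") .. "}"
end

local t = {}
t[tableToKey({ a = "a", b = 2 })] = 4
print(t[tableToKey({ b = 2, a = "a" })])  --> 4
print(t[tableToKey({ a = "a" })])         --> nil
```

The string is built once per lookup, so the cost grows with the size of the key table, not with the number of entries in `t`.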
In my opinion, all of this is too complicated for the simple task of indexing, and you may want to rethink your wish to index using copies of tables. Why would you want such functionality?
Update
If you only need to work with phrases, I think that concatenating them is easier than creating such a generic hash function. If you need it for sequences of phrases, you won't actually need to iterate through the tables and sort the keys; just collect the main information from each phrase. You would still need to use a helper function, which can create a suitable key for you:
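A possible helper is sketched below, under the assumption that a phrase is an array of word strings and that the separator byte never occurs inside a word. The names `phraseKey`, `countPhrase` and `SEP` are hypothetical.

```lua
-- Assumption: a phrase is an array of words, e.g. { "the", "cat" }.
local SEP = "\0"  -- separator assumed never to appear inside a word

-- Joins the words into one string that can be used as an ordinary table key.
local function phraseKey(phrase)
  return table.concat(phrase, SEP)
end

local counts = {}
local function countPhrase(phrase)
  local key = phraseKey(phrase)
  counts[key] = (counts[key] or 0) + 1
end

countPhrase({ "the", "cat" })
countPhrase({ "the", "cat" })
countPhrase({ "the", "dog" })
print(counts[phraseKey({ "the", "cat" })])  --> 2
print(counts[phraseKey({ "the", "dog" })])  --> 1
```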
I don't know a lot about language processing or about the goal you want to reach with your program, but what about collecting tokens like this: use a nested table structure in which the index table stores only tables indexed by the first token of a phrase, each subtable then contains values indexed by the second token, and so on, until you reach the final token of a phrase, which indexes a number giving the occurrence count of that phrase.
Maybe it will be clearer with an example. If you have the following two phrases:
Your index would have the following structure:
That way you can count frequencies in a single traversal step, counting occurrences as you index. But as I said before, it depends on what your goal is, and it implies re-splitting your phrases in order to find occurrences through your index.
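The nested index described above could be sketched as follows. The phrases are made up for illustration, and the function names and the `COUNT` sentinel are hypothetical. Unlike the description, this sketch keeps the count under a sentinel key instead of directly under the final token, so that one phrase can be a prefix of another.

```lua
-- Unique table used as a key, so it cannot clash with any token string.
local COUNT = {}

-- Walks (and creates) one nested table per token, then bumps the counter.
local function addPhrase(index, phrase)
  local node = index
  for token in phrase:gmatch("%S+") do   -- split on whitespace
    node[token] = node[token] or {}
    node = node[token]
  end
  node[COUNT] = (node[COUNT] or 0) + 1
end

-- Returns how many times exactly this phrase was added.
local function countPhrase(index, phrase)
  local node = index
  for token in phrase:gmatch("%S+") do
    node = node[token]
    if not node then return 0 end
  end
  return node[COUNT] or 0
end

local index = {}
addPhrase(index, "the cat sat")
addPhrase(index, "the cat sat")
addPhrase(index, "the cat ran")
print(countPhrase(index, "the cat sat"))  --> 2
print(countPhrase(index, "the cat ran"))  --> 1
print(countPhrase(index, "the cat"))      --> 0
```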
I'm not sure you can do this. You can define equality for tables using the metatable, but there's no way to define a hash function, and I doubt defining equality alone will do what you need. You could obviously define equality, then iterate over the table using `pairs()` and compare the keys yourself, but that will turn what should be an O(1) lookup into O(n).
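A small sketch of the point about equality: the `__eq` metamethod changes what `==` returns, but table indexing still compares keys by identity. The `find` helper is a hypothetical name for the O(n) scan.

```lua
local mt = {
  -- simplistic equality for tables of the form { a = <value> }
  __eq = function(x, y) return x.a == y.a end,
}
local key  = setmetatable({ a = "a" }, mt)
local key2 = setmetatable({ a = "a" }, mt)

local t = {}
t[key] = 4

print(key == key2)  --> true (== consults __eq)
print(t[key2])      --> nil  (table lookup ignores __eq)

-- O(n) fallback: compare every key with ==, which does use __eq.
local function find(tbl, wanted)
  for k, v in pairs(tbl) do
    if k == wanted then return v end
  end
  return nil
end

print(find(t, key2))  --> 4
```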
kikito's answer has the beginnings of a solution but, as chess123mate's answer notes, it's write-only (among other flaws). This solution doesn't fix all of them, but it's a start. (It's also very, very slow.)
Lua Playground link (or GitHub Gist link for copying and pasting into the playground, if your browser hates me for that last one).
I think the simplest way to do something like this would be to have a `tableKey` factory which returns read-only tables for the inputted table.
You would want to assign a `__key` (or similar) value to use as the keys. Something like:
This will ensure that the table being used as the key is read-only and that its id will always match if it is equal.