当前位置：文江博客话题详情

Lua hashtable lua-table

Lua：如何在键为表（或对象）的表中查找

发布于 2025-01-03 18:41:33 字数 343 浏览 1 评论 0 原文

我想存储一个 lua 表，其中键是其他 lua 表。我知道这是可能的，但我希望能够使用这些表的副本在表中进行查找。具体来说，我希望能够执行：

t = {}
key = { a = "a" }
t[key] = 4
key2 = { a = "a" }

然后我希望能够查找：

t[key2]

并得到 4。

我知道我可以将 key 转换为字符串并将其放入表 t。我还考虑过编写自定义哈希函数或通过嵌套表来完成此操作。有没有最好的方法让我获得此类功能？我还有什么其他选择？

原文

I want to store a lua table where the keys are other lua tables. I know that this is possible BUT I want to be able to do look ups in the table using copies of those tables. Specifically, I want to be able to do:

t = {}
key = { a = "a" }
t[key] = 4
key2 = { a = "a" }

and then I want to be able to look up:

t[key2]

and get 4.

I know that I can turn key into a string and put it into table t. I've also thought about writing a custom hash function or doing this by nesting tables. Is there a best way for me to get this type of functionality? What other options do I have?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一个人的旅程 2025-01-10 18:41:33

在Lua中，单独创建的两个表被认为是“不同的”。但是如果你创建了一个表，你就可以将它分配给任何你想要的变量，当你比较它们时，Lua会告诉你它们是相等的。换句话说：

t = {}
key = { a = "a" }
t[key] = 4
key2 = key
...
t[key2] -- returns 4

所以，这就是做你想做的事情的简单、干净的方法。将 key 存储在某处，以便您可以使用它取回 4。这也非常快。

如果您真的不想这样做......那么，有一个办法。但这有点低效且丑陋。

第一部分是创建一个比较两个单独表的函数。如果两个表“等价”，则应返回 true，否则返回 false。我们称之为等价。它应该像这样工作：

equivalent({a=1},{a=1})          -- true
equivalent({a=1,b=2}, {a=1})     -- false
equivalent({a={b=1}}, {a={b=2}}) -- false

该函数必须是递归的，才能处理包含表本身的表。如果其中一个表“包含”另一个表，但具有更多元素，也不能被欺骗。我提出了这个实现；可能还有更好的。

local function equivalent(a,b)
  if type(a) ~= 'table' then return a == b end

  local counta, countb = 0, 0

  for k,va in pairs(a) do
    if not equivalent(va, b[k]) then return false end
    counta = counta + 1
  end

  for _,_ in pairs(b) do countb = countb + 1 end

  return counta == countb
end

我不打算在这里解释这个函数。我希望它的作用足够清楚。

难题的另一部分在于使t 在比较键时使用equivalent 函数。这可以通过仔细的元表操作和额外的“存储”表来完成。

我们基本上将 t 转变为冒名顶替者。当我们的代码告诉它在键下存储值时，它不会将值保存在自身中；而是将值存储在键下。相反，它将它提供给额外的表（我们称之为store）。当代码要求 t 提供值时，它会在 store 中搜索该值，但使用 equivalent 函数来获取它。

这是代码：

local function equivalent(a,b)
... -- same code as before
end

local store = {} -- this is the table that stores the values

t = setmetatable({}, {
  __newindex = store,
  __index = function(tbl, key)
    for k,v in pairs(store) do
      if equivalent(k,key) then return v end
    end
  end
})

使用示例：

t[{a = 1}] = 4

print(t[{a = 1}]) -- 4
print(t[{a = 1, b = 2}]) -- nil

In Lua, two tables created separately are considered "different". But if you create a table once, you can assign it to any variables you want, and when you compare them, Lua will tell you that they are equal. In other words:

t = {}
key = { a = "a" }
t[key] = 4
key2 = key
...
t[key2] -- returns 4

So, that's the simple, clean way of doing what you want. Store key somewhere, so you can retrieve the 4 back by using it. This is also very fast.

If you really don't want to do that ... well, there is a way. But it is kindof inefficient and ugly.

The first part is making a function that compares two separate tables. It should return true if two tables are "equivalent", and false if they are not. Let's call it equivalent. It should work like this:

equivalent({a=1},{a=1})          -- true
equivalent({a=1,b=2}, {a=1})     -- false
equivalent({a={b=1}}, {a={b=2}}) -- false

The function must be recursive, to handle tables that contain tables themselves. It also must not be fooled if one of the tables "contains" the other, but has more elements. I came out with this implementation; probably there are better ones out there.

local function equivalent(a,b)
  if type(a) ~= 'table' then return a == b end

  local counta, countb = 0, 0

  for k,va in pairs(a) do
    if not equivalent(va, b[k]) then return false end
    counta = counta + 1
  end

  for _,_ in pairs(b) do countb = countb + 1 end

  return counta == countb
end

I'm not going to explain that function here. I hope it is clear enough what it does.

The other part of the puzzle consist on making t use the equivalent function when comparing keys. This can be done with careful metatable manipulation, and an extra "storage" table.

We basically transform t into an impostor. When our code tells it to store a value under a key, it doesn't save it in itself; instead it gives it to the extra table (we'll call that store). When the code asks t for a value, it searches for it in store, but using the equivalent function to get it.

This is the code:

local function equivalent(a,b)
... -- same code as before
end

local store = {} -- this is the table that stores the values

t = setmetatable({}, {
  __newindex = store,
  __index = function(tbl, key)
    for k,v in pairs(store) do
      if equivalent(k,key) then return v end
    end
  end
})

Usage example:

t[{a = 1}] = 4

print(t[{a = 1}]) -- 4
print(t[{a = 1, b = 2}]) -- nil

回复收藏 0 原文

傲性难收 2025-01-10 18:41:33

kikito 的答案很好，但有一些缺陷：

如果执行 t[{a=1}] = true 两次，store 将包含两个表（泄漏内存哈希表的生命周期）
一旦存储了值就无法修改该值，也无法将其删除。尝试更改它可能会导致检索返回您过去分配给该键的任何值。
访问性能为 O(n)（n 是存储条目的数量，假设 lua 从表中检索值的时间为 O(1)）；结合第一点，这个哈希表的性能会随着使用而降低

（另请注意，如果任何表具有循环引用，kikito 的“等效”函数将导致无限循环。）

如果您从不需要更改/删除中的任何信息表，那么 kikito 的答案就足够了。否则，必须更改元表，以便 __newindex 确保该表尚不存在：

t = setmetatable({}, {
    __newindex = function(tbl, key, value)
        for k,v in pairs(store) do
            if equivalent(k,key) then
                tbl[k] = value
                return
            end
        end
        store[key] = value
    end,
    __index = function(tbl, key)
        for k,v in pairs(store) do
            if equivalent(k, key) then return v end
        end
    end
})

正如您所建议的，一个完全不同的选项是编写自定义哈希函数。这是一个可以利用它的哈希表：

local function HashTable(Hash, Equals)
    --Hash is an optional function that takes in any key and returns a key that lua can use (string or number). If you return false/nil, it will be assumed that you don't know how to hash that value.
    --    If Hash is not provided, table-keys should have a GetHash function or a .Hash field
    --Equals is an optional function that takes two keys and specify whether they are equal or not. This will be used when the same hash is returned from two keys.
    --    If Equals is not provided, items should have a Equals function; items are in this case assumed to not be equal if they are different types.
    local items = {} --Dict<hash, Dict<key, value>>
    local function GetHash(item)
        return Hash and Hash(item) or type(item) == "table" and (item.GetHash and item:GetHash() or item.Hash) or item
    end
    local function GetEquals(item1, item2)
        if Equals then return Equals(item1, item2) end
        local t1, t2 = type(item1), type(item2)
        if t1 ~= t2 then return false end
        if t1 == "table" and item1.Equals then
            return item1:Equals(item2)
        elseif t2 == "table" and item2.Equals then
            return item2:Equals(item1)
        end
        return false
    end
    return setmetatable({}, {
        __newindex = function(_, key, value)
            local hash = GetHash(key)
            local dict = items[hash]
            if not dict then
                if value ~= nil then --Only generate a table if it will be non-empty after assignment
                    items[hash] = {[key] = value}
                end
                return
            end
            for k, v in pairs(dict) do
                if GetEquals(key, k) then --Found the entry; update it
                    dict[k] = value
                    if value == nil then --check to see if dict is empty
                        if next(dict) == nil then
                            items[hash] = nil
                        end
                    end
                    return
                end
            end
            --This is a unique entry
            dict[key] = value
        end,
        __index = function(_, key)
            local hash = GetHash(key)
            local dict = items[hash]
            if not dict then return nil end
            for k, v in pairs(dict) do
                if GetEquals(key, k) then
                    return v
                end
            end
        end
    })
end

使用示例：

local h = HashTable(
    function(t) return t.a or 0 end, --Hash
    function(t1, t2) return t1.a == t2.a end) --Equals
h[{a=1}] = 1
print(h[{a=1}]) -- 1
h[{a=1}] = 2
print(h[{a=1}]) -- 2
print(h[{a=1,b=2}]) -- 2 because Hash/Equals only look at 'a'

当然，您会想要获得更好的哈希/等于函数。

只要密钥的哈希值很少发生冲突，此类的性能就应该是 O(1)。

（注：我本来想把这个答案的上半部分作为对 kikito 的评论，但目前我还没有这样做的声誉。）

kikito's answer is good, but has some flaws:

If you perform t[{a=1}] = true two times, store will contain two tables (leaking memory for the lifetime of the hash table)
Modifying the value once you've already stored it doesn't work, nor can you remove it. Attempting to change it will result in the retrieval potentailly returning any value you've assigned to that key in the past.
Access performance is O(n) (n being the number of stored entries and assuming that lua's value retrieval from a table is O(1)); combined with the first point, performance of this hash table will degrade with use

(Also note that kikito's "equivalent" function will cause an infinite loop if any table has a circular reference.)

If you never need to change/remove any information in the table, then kikito's answer will suffice as it stands. Otherwise, the metatable must be changed so that the __newindex makes sure that the table doesn't already exist:

t = setmetatable({}, {
    __newindex = function(tbl, key, value)
        for k,v in pairs(store) do
            if equivalent(k,key) then
                tbl[k] = value
                return
            end
        end
        store[key] = value
    end,
    __index = function(tbl, key)
        for k,v in pairs(store) do
            if equivalent(k, key) then return v end
        end
    end
})

As you've suggested, a completely different option is to write a custom hashing function. Here's a HashTable that can make use of that:

local function HashTable(Hash, Equals)
    --Hash is an optional function that takes in any key and returns a key that lua can use (string or number). If you return false/nil, it will be assumed that you don't know how to hash that value.
    --    If Hash is not provided, table-keys should have a GetHash function or a .Hash field
    --Equals is an optional function that takes two keys and specify whether they are equal or not. This will be used when the same hash is returned from two keys.
    --    If Equals is not provided, items should have a Equals function; items are in this case assumed to not be equal if they are different types.
    local items = {} --Dict<hash, Dict<key, value>>
    local function GetHash(item)
        return Hash and Hash(item) or type(item) == "table" and (item.GetHash and item:GetHash() or item.Hash) or item
    end
    local function GetEquals(item1, item2)
        if Equals then return Equals(item1, item2) end
        local t1, t2 = type(item1), type(item2)
        if t1 ~= t2 then return false end
        if t1 == "table" and item1.Equals then
            return item1:Equals(item2)
        elseif t2 == "table" and item2.Equals then
            return item2:Equals(item1)
        end
        return false
    end
    return setmetatable({}, {
        __newindex = function(_, key, value)
            local hash = GetHash(key)
            local dict = items[hash]
            if not dict then
                if value ~= nil then --Only generate a table if it will be non-empty after assignment
                    items[hash] = {[key] = value}
                end
                return
            end
            for k, v in pairs(dict) do
                if GetEquals(key, k) then --Found the entry; update it
                    dict[k] = value
                    if value == nil then --check to see if dict is empty
                        if next(dict) == nil then
                            items[hash] = nil
                        end
                    end
                    return
                end
            end
            --This is a unique entry
            dict[key] = value
        end,
        __index = function(_, key)
            local hash = GetHash(key)
            local dict = items[hash]
            if not dict then return nil end
            for k, v in pairs(dict) do
                if GetEquals(key, k) then
                    return v
                end
            end
        end
    })
end

Usage example:

local h = HashTable(
    function(t) return t.a or 0 end, --Hash
    function(t1, t2) return t1.a == t2.a end) --Equals
h[{a=1}] = 1
print(h[{a=1}]) -- 1
h[{a=1}] = 2
print(h[{a=1}]) -- 2
print(h[{a=1,b=2}]) -- 2 because Hash/Equals only look at 'a'

Naturally, you'll want to get better Hash/Equals functions.

So long as the hashes of your keys rarely collide, this performance of this class should be O(1).

(Note: I'd have put the top half of this answer as a comment to kikito, but I don't have the reputation at this time to do so.)

回复收藏 0 原文

oО清风挽发oО 2025-01-10 18:41:33

这在 Lua 中是不可能的。如果您使用表作为键，则键是表的特定“实例”。即使您制作具有相同内容的不同表，实例也是不同的，因此它是不同的键。

如果您想做这样的事情，您可以创建一种哈希函数，它遍历表作为键（如果需要甚至可以递归）并构造表内容的字符串表示形式。它不需要是人类可读的，只要它对于不同的内容是不同的并且对于具有相同内容的表格是相等的即可。除了使用 pairs() 遍历表之外，您还需要将键插入表中并使用 table.sort() 对它们进行排序，因为 pairs() 以任意顺序返回它们，并且您希望“相等”表具有相同的字符串。

一旦构建了这样的字符串，您就可以将其用作键：

function hash(t) ... end
t = {}
key1 = { a = "a", b = "b" }
t[hash(key1)] = 4
key2 = { a = "a", b = "b" }
print(t[hash(key2)]) -- should print "4" if the hash function works correctly

在我看来，这对于简单的索引任务来说太复杂了，您可能需要重新考虑使用表副本进行索引的愿望。为什么你想要这样的功能？

更新

如果您只需要使用短语，我认为连接它们比创建此类通用哈希函数更容易。如果您需要它来表示短语序列，则实际上不需要遍历表并对键进行排序，只需收集每个短语的主要信息即可。您仍然需要使用辅助函数，它可以为您创建合适的密钥：

function pkey(...)
    local n, args = select('#', ...), { ... }
    for i=1,n do args[i] = args[i].value end -- extract your info here
    return table.concat(args, ' ') -- space or other separator, such as ':'          
end
tab[pkey(phrase1, phrase2, phrase3)] = "value"

This is not possible in Lua. If you use tables as keys, the key is that specific "instance" of the table. Even if you make a different table with the same contents, the instance is different, therefore it is a different key.

If you want to do something like this, you can create a kind of hash function, which traverses the table to serve as a key (maybe even recursively if needed) and construct a string representation of the table content. It does not need to be human-readable, as long as it is different for different content and equal for tables with the same content. Apart from using pairs() to traverse the table, you would also need to insert the keys into a table and sort them using table.sort(), because pairs() returns them in an arbitrary order, and you want the same string for "equal" tables.

Once you have constructed such string, you can use it as a key:

function hash(t) ... end
t = {}
key1 = { a = "a", b = "b" }
t[hash(key1)] = 4
key2 = { a = "a", b = "b" }
print(t[hash(key2)]) -- should print "4" if the hash function works correctly

In my opinion, this all is too complicated for the simple task of indexing, and you may want to re-think your wish for indexing using copies of tables. Why would you want such functionality?

Update

If you only need to work with phrases, I think that concatenating them is easier than creating such generic hash function. If you need it for sequences of phrases, you won't actually need to iterate through the tables and sort the keys, just collect the main information from each phrase. You would still need to use a helper function, which can create a suitable key for you:

function pkey(...)
    local n, args = select('#', ...), { ... }
    for i=1,n do args[i] = args[i].value end -- extract your info here
    return table.concat(args, ' ') -- space or other separator, such as ':'          
end
tab[pkey(phrase1, phrase2, phrase3)] = "value"

回复收藏 0 原文

笨笨の傻瓜 2025-01-10 18:41:33

我对语言处理以及您想要通过程序达到的目标了解不多，但是像这样收集标记怎么样：使用嵌套表结构，例如索引表仅存储由第一个短语标记索引的表，然后每个子表包含由第二个短语标记索引的值...等等...直到到达短语最后一个标记，将索引与该短语的出现相对应的数字值。

也许举个例子会更清楚，如果你有以下两个短语：

我喜欢香蕉。
我喜欢辣妹。

您的索引将具有以下结构：

index["I"] = {
    ["like"] = {
        ["banana"] = 1,
        ["hot"] = {
            ["chick"] = 1
        }
    }    
}

通过这种方式，您可以通过单个遍历步骤来计算频率，并在建立索引的同时计算出现次数，但正如我之前所说，这取决于您的目标是什么，这意味着重新分割您的短语，以便通过索引查找出现的情况。

I don't know a lot about Language Processing, and about the goal you want to reach with your program, but what about collecting token like this : use a nested table structure such has the index table store only tables indexed by first phrase token, then each subtables contains value indexed by second phrase token ... etc ... until you reach a phrase final token, will index an number value corresponding to he occurence of the phrase.

Maybe it will be more clear with a exemple, if you have the two following phrase :

I like banana.
I like hot chick.

Your index would have the following structure :

index["I"] = {
    ["like"] = {
        ["banana"] = 1,
        ["hot"] = {
            ["chick"] = 1
        }
    }    
}

In that way you can count frenquencies with a single traversal step, and count occurences at the same time you indexing, but as i said before, it depends of what is your goal, and it will imply to re - split you phrase so as to find occurences through your index.

回复收藏 0 原文

悲欢浪云 2025-01-10 18:41:33

我不确定你能做到这一点。您可以使用元表定义表的相等性，但无法定义哈希函数，而且我怀疑单独定义相等性是否能满足您的需要。显然，您可以定义相等性，然后使用pairs()迭代表并自己比较键，但这会将应该O(1)的查找变成O(n)。

回复收藏 0 原文

可可 2025-01-10 18:41:33

kikito 的答案有一个解决方案的开始，但是，chess123mate的回答指出，它是只写的（以及其他缺陷）。此解决方案并不能解决所有问题，但它是一个开始。（它也非常非常慢。）

local function equivalent(stack)
    local a, b = stack.a, stack.b

    if type(a) ~= 'table' then return a == b end
    if a == b then return true end

    local top = stack.next
    while top ~= nil do
        if stack.a == a and stack.b == b then return true end
        top = stack.next
    end

    local counta, countb = 0, 0

    for k,va in pairs(a) do
        if not equivalent({a = va, b = b[k], next = stack}) then return false end
        counta = counta + 1
    end

    for _,_ in pairs(b) do countb = countb + 1 end

    return counta == countb
end

t = setmetatable({}, {
    __newindex = function(tbl, key, value)
        for k, _ in pairs(tbl) do
            if equivalent({a = k, b = key}) then rawset(tbl, k, value); return end
        end
        rawset(tbl, key, value)
    end,
    __index = function(tbl, key)
        for k, v in pairs(tbl) do
            if equivalent({a = k, b = key}) then return v end
        end
    end
})

Lua Playground 链接（或 GitHub Gist 链接用于复制和粘贴到游乐场，如果你的浏览器讨厌我最后一个）。

kikito's answer has the beginnings of a solution but, as chess123mate's answer notes, it's write-only (among other flaws). This solution doesn't fix all of them, but it's a start. (It's also very, very slow.)

local function equivalent(stack)
    local a, b = stack.a, stack.b

    if type(a) ~= 'table' then return a == b end
    if a == b then return true end

    local top = stack.next
    while top ~= nil do
        if stack.a == a and stack.b == b then return true end
        top = stack.next
    end

    local counta, countb = 0, 0

    for k,va in pairs(a) do
        if not equivalent({a = va, b = b[k], next = stack}) then return false end
        counta = counta + 1
    end

    for _,_ in pairs(b) do countb = countb + 1 end

    return counta == countb
end

t = setmetatable({}, {
    __newindex = function(tbl, key, value)
        for k, _ in pairs(tbl) do
            if equivalent({a = k, b = key}) then rawset(tbl, k, value); return end
        end
        rawset(tbl, key, value)
    end,
    __index = function(tbl, key)
        for k, v in pairs(tbl) do
            if equivalent({a = k, b = key}) then return v end
        end
    end
})

Lua Playground link (or GitHub Gist link for copying and pasting into the playground, if your browser hates me for that last one).

回复收藏 0 原文

紧拥背影 2025-01-10 18:41:33

我认为做这样的事情最简单的方法是拥有一个返回 tableKey 工厂="nofollow noreferrer">输入表的只读表。

您可能希望分配一个 __key （或类似的）值用作键。类似于：

_cachedTableKeys = {}
function tableKey (t)
  if(not t.__key) t.__key = calculateKey(t)
  local existing = _cachedTableKeys[t.__key]
  if(existing) return existing
  existing = readOnly(t) -- https://www.lua.org/pil/13.4.5.html
  _cachedTableKeys[t.__key] = existing
  return existing 
end

这将确保用作键的表是只读的，并且如果相等，它的 id 将始终匹配。

注意：calculateKey 函数将对表键递归执行哈希函数。您希望使用足够大的哈希值（例如 16 个字节左右），以便几乎不可能发生冲突。您还可以在按键查找表时实现某种碰撞算法。

I think the simplest way to do something like this would be to have a tableKey factory which returns readonly tables for the inputted table.

You would want to assign a __key (or similar) value to use as the keys. Something like:

_cachedTableKeys = {}
function tableKey (t)
  if(not t.__key) t.__key = calculateKey(t)
  local existing = _cachedTableKeys[t.__key]
  if(existing) return existing
  existing = readOnly(t) -- https://www.lua.org/pil/13.4.5.html
  _cachedTableKeys[t.__key] = existing
  return existing 
end

This will ensure that the table being used as the key is readonly and that it's id will always match if it is equal.

Note: the calculateKey function would perform a hash function recursively on the table keys. You want to use a large enough hash (like 16 bytes or so) so that collision is near impossible. You could also implement some kind of collision algorithm when looking up the table by key.