通过键和索引进行高效操作和检索的数据结构

发布于 2024-12-09 19:32:12 字数 435 浏览 1 评论 0原文

我正在寻找具有例如功能的数据结构。 .NET 中的 OrderedDictionary,即一个关联集合(即将关联的集合),它维护元素顺序(就像普通的 List 一样)。

它必须能够通过索引和键进行快速查找。它还应该具有快速的“追加”操作(在末尾插入新项目),以及快速删除具有任何索引的项目(基于索引或键)。

如果我没记错的话,.NET 中的 OrderedDictionary 使用哈希表和数组来存储其项目。因此,基于键检索索引(反之亦然)的时间复杂度为 O(n),当然,从数组中间删除项目的时间复杂度为 O(n) > 首先,如果按键删除,则添加从键中添加索引的查找。

我的问题是是否存在更有效的数据结构来满足我的条件,或者这确实是我的最佳选择?

I'm looking for a data structure with the functionality of eg. the OrderedDictionary in .NET, that is to say an associative collection (i.e. one that associates a key with a value) that maintains element order (just like a normal List does).

It must have fast lookup by both index and key. It should also have a fast "append" operation (inserting a new item at the end), and fast removal of items with any index (based on either index or key).

The OrderedDictionary in .NET uses both a hash table and an array to store its items if I'm not mistaken. Retreiving an index based on a key (or vice versa) is therefore O(n), and of course removal of an item from the middle of an array is O(n) to start with, plus the added lookup of the index from the key if removing by key.

My question is if there exists a more efficient data structure that satisfies my conditions, or if this is indeed my best option here?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

彡翼 2024-12-16 19:32:12

我认为你可以用两棵红黑树来做到这一点:一个键查找树用于存储按比较函数排序的键,另一个是索引查找树,其中键按任意顺序排列,就像在列表中一样。每个索引查找节点必须有一个“大小”字段 - 如果每个节点中都包含“大小”字段,则红黑树可以通过索引进行查找。例如,请参阅 C5 通用集合库中的 RedBlackTreeSet 实现。

键查找树中的每个条目都需要一个指向索引查找树中相应条目的指针。除了左节点和右节点指针外,索引查找树还需要一个父指针字段以允许从下到上- 顶部导航以及从上到下的导航。

总共,每个键需要六个指针:两个节点中通常的左指针和右指针,加上从键查找节点到索引查找节点的指针,加上每个索引查找节点中的父指针-节点。您还需要每个节点中都有一个指针来指向存储的值。

操作:

追加 - 追加操作会将键插入到两棵树中 - 一次插入键查找树中由比较函数确定的位置,然后再次插入索引查找树的最右侧位置。插入红黑树是一个对数时间操作。

按键查找 - 这是在键查找树上完成的,使用比较函数找到正确的位置 - O(log(n))

按索引查找 - 这可以在索引查找字段上完成,如上所述 - O(log(n))

从键获取索引 - 首先在键查找树中查找键 O(log(n))。跟随指针到达索引查找树。沿着父指针向上到达根节点(平衡树的 O(log(n)))。使用向上的“大小”字段来确定键的索引。 - 总体 O(log(n))。

按索引删除 - 在索引查找树中查找项目。从索引查找树中删除。在键查找树中查找找到的键。从键查找树中删除。所有操作都是 O(log(n)) ,因此删除总体上是 O(log(n)) 。

按键删除 - 使用“从键获取索引”来获取键的索引。从索引查找树中按索引删除。从键查找树中按键删除。总体而言,O(log(n))。

该结构还支持在任意位置插入 O(log(n)),而不仅仅是在末尾。

存储开销显然相当大,但仍然是 O(n)。时间复杂度满足所有要求。

不幸的是我不知道这个结构的任何实现。

更新:我觉得你可以将树与哈希表结合起来以获得 O(1) 按键查找。正如我上面建议的那样,不要使用两棵树,而是使用哈希表进行按键查找,使用平衡顺序统计树进行按位置查找,如上所述,但哈希表的槽包含指向用于执行按键获取列表位置查找的平衡树的节点。按键查找现在为 O(1),其他所有内容平均保持为 O(ln(n))。当然,与任何哈希表一样,您现在偶尔会遇到 O(n) 重新哈希惩罚。

I think you can do this with two red-black trees: A key-lookup tree to store the keys ordered by a compare function, and an index-lookup tree, with the keys in arbitrary ordering, as in a list. Each index-lookup node must have a 'size' field - a Red-Black tree can do lookup by index if a 'size' field is included in each node. See for example the RedBlackTreeSet implementation in the C5 Generic Collection Library.

Each entry in the key-lookup tree needs a pointer across to its corresponding entry in the index-lookup tree.As well as left- and right-node pointers, the index-lookup tree will require a parent pointer field to allow bottom-to-top navigation as well as top-to-bottom.

In all, six pointers are required for each key: the usual left and right pointers in both nodes, plus the pointer from the key-lookup-node to the index-lookup-node, plus the parent pointer in each of the index-lookup-nodes. You would also need a pointer in each node to point to the stored value.

Operations:

Append - An append operation would insert the key into both trees - once in the key-lookup tree, at a position determined by the compare function, and again in the rightmost position of the index-lookup tree. Insertion into a red-black tree is a logarithmic time operation.

Lookup by key - this is done on the key-lookup tree, using the compare function to find the correct position - O(log(n))

Lookup by index - this can be done on the index-lookup field, as mentioned above - O(log(n))

Get index from key - first lookup the key in the key-lookup tree O(log(n)). Follow the pointer across to the index-lookup tree. Follow the parent pointers up to the root node, (O(log(n)) for a balanced tree). Use the 'size' fields on the way up to determine the index of the key. - O(log(n)) overall.

Delete by index - lookup the item in the index-lookup tree. Delete from index-lookup tree. Lookup the located key in the key-lookup tree. Delete from the key-lookup tree. All operations are O(log(n)) , so delete is O(log(n)) overall.

Delete by key - use 'Get index from key' to get the index of the key. Delete by index from the index-lookup tree. Delete by key from the key-lookup tree. O(log(n)) overall.

This structure also supports O(log(n)) insertion at any arbitrary position, not just at the end.

Storage overhead is obviously considerable, but remains O(n). Time complexity meets all the requirements.

Unfortunately I am not aware of any implementation of this structure.

Update: It occurs to me you can combine a tree with a hash table to get O(1) by-key lookup. Instead of having two trees, as I suggest above, use a hash table for the by-key lookup, and a balanced order-statistics tree for the by-position lookup, as above, but have the slots of the hash table contain pointers to the nodes of the balanced tree for doing the get-list-position-by-key lookup. By-key lookups are now O(1), with everything else staying O(ln(n)) on average. Of course, you now get the occasional O(n) re-hash penalty, as with any hash-table.

冷清清 2024-12-16 19:32:12

OrderedDictionary 实际上满足了您的要求。

您对 OrderedDictionary 的分析不正确。根据这个

即使是简单的分析也可以通过键或索引进行 O(1) 查找。数组提供 O(1) 访问,哈希表提供有效的 O(1) 访问。

插入/删除稍微复杂一些,但考虑到摊销分析仍然是 O(1)

文章声称插入和删除是 O(n) 。这至少不适用于插入,因为摊销分析允许您简单地将插入给定元素的“成本”从 1 增加到 2。当插入需要调整数组大小的元素时,成本的后半部分用于支付复制的成本。最终插入将花费更长的时间,但它仍然是 O(1) 摊销的,并且只有当您导致数组大小调整时才会出现差异,而这种情况不太可能发生。

OrderedDictionary meats your requirements actually.

Your analysis of the OrderedDictionary is incorrect. It is actually O(1) for key based look up and O(1) for index according to this.

Even simple analysis gives you O(1) lookup either by key or index. Arrays provide O(1) access and hash tables provide effectively O(1) access.

Insert/ Delete is a litte more complicated, but given amortized analysis is still O(1)

The article claims that it is O(n) for insert and delete. This at least does not hold for insert since amortized analysis allows you to simply up the "cost" of inserting a given element from 1 to 2. When inserting an element that requires an array resize, the second half of the cost is used to pay the cost of copying. The final insert will take longer, but its still O(1) amortized and the discrepancy only shows up if you cause an array resize which is unlikely.

本王不退位尔等都是臣 2024-12-16 19:32:12

Maybe you'll find something interesting here in The C5 Generic Collection Library for C# (from page 233)

泪冰清 2024-12-16 19:32:12

您可以像链接一样使用平衡二叉搜索树 ,只是在 TreeNode 的定义中你应该添加你的键,但问题是通过键和索引查找元素不是 O(1),而是 O(log(n)) (事实上索引不是 TreeNode 的一部分,相对而言)可以找到)但所有操作是 O(log(n)),并且是基于比较方法的已知最快方法。

You can use Balanced binary search tree like the link, just for in definition of TreeNode you should add your keys, but the problem is finding element is not O(1), it's O(log(n)) both by keys and index (in fact index is not part of TreeNode, relatively can be found) but all operations are O(log(n)) and is fastest known way based on comparison methods.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文