是否有 C++ 的版本? STL 的关联数据结构针对大量部分副本进行了优化?
我有一棵大树,它随着我的算法的进展而生长。每个节点都包含集合,我认为它是作为平衡二叉搜索树实现的。每个节点的集合在该节点创建之后、用于创建该节点的子节点之前应保持固定。
然而,我担心复制每套的成本会高得令人望而却步。相反,我希望每个新创建的节点集都利用父节点集的所有适当部分。简而言之,我很高兴复制该集合的 O(log n),但不是 O(n)。
STL 关联数据结构是否有任何变体可以提供此类部分复制优化?也许在Boost中?当然,这样的数据结构在 Haskell 或 OCaML 中实现起来很简单,但在 C++ 中则需要更多的努力。
I have a large tree that grows as my algorithm progresses. Each node contains set, which I suppose is implemented as balanced binary search tree. Each node's set shall remain fixed after that node's creation, before its use in creating that node's children.
I fear however that copying each set is prohibitively expensive. Instead, I would prefer that each newly created node's set utilize all appropriate portions of the parent node's set. In short, I'm happy copying O(log n) of the set but not O(n).
Are there any variants of the STL's associative data structures that offer such an partial copy optimization? Perhaps in Boost? Such a data structure would be trivial to implement in Haskell or OCaML of course, but it'd require more effort in C++.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我知道建议一种不同的语言通常不会有成效,但 Haskell 的标准容器库正是这样做的。我记得看过一个视频(是 Simon Peyton Jones?),讨论了这个确切的问题,以及对于给定程序员的努力,Haskell 解决方案最终如何比 C++ 解决方案快得多。当然,这是针对一个特定问题,该问题有很多包含很多共享元素的集合。
关于这个主题有相当多的研究。如果您正在寻找关键字,我建议搜索“函数式数据结构”而不是“不可变数据结构”,因为大多数函数式范例通常受益于不变性。像手指树这样的结构正是为了解决这个问题而开发的。
据我所知,没有 C++ 库可以实现这些数据结构。没有什么可以阻止您阅读相关论文(或 Haskell 源代码,
Data.Set
大约有 1k 行,包括测试)并自己实现它,但我知道这不是您想要的我想听。您还需要对共享节点进行某种引用计数,对于如此深的结构,这可能比简单的垃圾收集器具有更高的开销。I know it's not generally productive to suggest a different language, but Haskell's standard container libraries do exactly this. I remember seeing a video (was it Simon Peyton Jones?) talking about this exact problem, and how a Haskell solution ended up being much faster than a C++ solution for the given programmer effort. Of course, this was for a specific problem that had a lot of sets with a lot of shared elements.
There is a fair amount of research into this subject. If you are looking for keywords, I suggest searching for "functional data structures" instead of "immutable data structures", since most functional paradigms benefit from immutability in general. Structures such as finger tree were developed to solve exactly this problem.
I know of no C++ library that implements these data structures. There is nothing stopping you from reading the relevant papers (or the Haskell source code, which is about 1k lines for
Data.Set
including tests) and implementing it yourself, but I know that is not what you'd want to hear. You'd also need do some kind of reference counting for the shared nodes, which for such deep structures can have a higher overhead than even simple garbage collectors.这在 C++ 中几乎是不可能的,因为不可变的概念
容器不存在。你可能知道你不会做任何改变,
并且某种形式的共享表示会更好,但是
编译器和库没有,并且没有办法传达这一点
给他们。
It's practically impossible in C++, since the notion of an immutable
container doesn't exist. You may know that you'll be making no changes,
and that some sort of shared representation would be preferable, but the
compiler and the library don't, and there's no way of communicating this
to them.
这是一个非常独特的案例。我建议使用 std::vector 代替。 (不是真的!)代码创建的节点仍然可以使用
set
,并在最后一秒切换到vector
。然而,向量
更小,并且只需要少量的内存分配(如果您使用reserve
,则一次),使得算法快得多 。您甚至可以使用 std::lower_bound 和 std::upper_bound 仍然获得 O(log(n)) 内存访问,因为它的排序方式仍然与集合相同,这样你就不会损失任何效率。只要您不需要频繁插入/移除,这就是纯粹的增益。
如果这种恐惧是由于每个集合包含与其父节点几乎相同的节点而引起的,并且数据成本高昂(复制或在内存中,以两者为准),并且仅更改了少数节点,则使子树包含 std::shared_pointers 而不是数据本身。这意味着数据本身不会被复制,只会复制指针。
我意识到这不是你提出这个问题的目的,但正如 JamesKanze 所说,我知道没有这样的容器。除了可能对 STL 的
rope
类进行奇怪且危险的使用之外。请注意,我说的是 STL,而不是标准 C++ 库。他们是不同的。That's a pretty unique case. I would recommend using std::vector instead. (No really!) The code is creating the node can still use a
set
, and switching to avector
at the last second. However, thevector
is smaller, and takes only a tiny number of memory allocations (one if you usereserve
), making the algorithm much faster.You can even use
std::lower_bound
andstd::upper_bound
to still get O(log(n)) memory accesses since it's still sorted the same as the set was, so you won't lose any efficiency. It's pure gain as long as you don't need to insert/remove frequently.If this fear is caused because each set contains mostly the same nodes as it's parents, and the data is costly (to copy or in memory, whichever), with only a few nodes changed, make the subtrees contain
std::shared_pointers
instead of the data themselves. This means the data itself will not get copied, only the pointers.I realize this isn't what you were aiming at with the question, but as JamesKanze said, I know of no such container. Other than possibly a bizarre and dangerous use of the STL's
rope
class. Note that I said and meant STL, not the standard C++ library. They're different.