c++ 中的持久数据结构
C++ 中是否有类似于 Clojure 中的持久数据结构实现?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
C++ 中是否有类似于 Clojure 中的持久数据结构实现?
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
接受
或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
发布评论
评论(3)
我自己推出了,但有
immer
库 作为一个相当全面的示例,它是特别受到 Clojure 的启发。几年前,在听了约翰·卡马克 (John Carmack) 的演讲后,我非常兴奋,并推出了自己的产品,当时他在函数式编程潮流中大放异彩。他似乎能够想象一个围绕不可变数据结构旋转的游戏引擎。虽然他没有透露具体细节,而且这在他的脑海中似乎只是一个模糊的想法,但事实上他正在认真考虑它并且似乎并不认为开销会急剧降低帧速率,这足以让我兴奋关于探索这个想法。我实际上将它用作某种优化细节,这可能看起来很矛盾(不变性有开销),但我的意思是在特定的上下文中。如果我绝对想这样做:
...并且我绝对不希望该函数产生副作用,以便它可以是线程安全的并且永远不会被误用,那么我别无选择,只能复制巨大的数据结构(可能跨越千兆字节)。
我发现在这种情况下拥有一个小型的不可变数据结构库非常有帮助,因为它通过浅层复制和引用未更改的部分使上述场景相对便宜。也就是说,我大多只使用一个不可变的数据结构,它基本上是一个随机访问序列,如下所示:
正如其他人提到的,它确实需要一些小心、调整和全面的测试以及许多 VTune 会话才能使其线程安全且高效,但是在我投入了苦力之后,它确实使事情变得更加容易。
每当我们使用这些结构来编写没有副作用的函数时,除了自动线程安全之外,您还可以获得诸如非破坏性编辑、简单的撤消系统、简单的异常安全(不需要通过作用域防护来回滚副作用)之类的东西在一个不会导致异常路径的函数中),并让用户复制和粘贴数据并实例化它,而无需占用太多内存,直到/除非他们修改了粘贴的内容作为奖励。事实上,我发现这些好处在日常工作中比线程安全更有用。
我使用“瞬态”(又名“构建器”)作为表达数据结构更改的方式,如下所示:
我什至有一个不可变的图像库,用于图像编辑以简化非破坏性编辑。它使用与上述结构类似的策略,将图像视为图块,如下所示:
当修改瞬态并且我们获得新的不可变值时,只有更改的部分才会变得唯一。其余图块是浅复制的(仅 32 位索引):
我确实在性能相当关键的领域(例如网格和视频处理)中使用了它们。对于每个块应存储多少数据进行了一些微调(太多,我们浪费处理和内存深复制太多数据,太少,我们浪费处理和内存浅复制太多带有更频繁线程锁的指针)。
我不将它们用于光线追踪,因为这是可以想象到的最极端的性能关键领域之一,并且用户可以注意到最微小的开销(他们实际上进行基准测试并注意到 2% 范围内的性能差异),但大多数时间,它们足够高效,并且当您可以将这些巨大的数据结构作为一个整体左右复制以简化线程安全、撤消系统、非破坏性编辑等时,这是一个相当棒的好处,而不必担心爆炸性的内存使用并且明显的延迟花费在深度复制所有内容上。
I rolled my own but there's the
immer
library as a fairly comprehensive example and it is specifically inspired by clojure. I got all excited and rolled my own a few years back after listening to a speech by John Carmack where he was jumping all over the functional programming bandwagon. He seemed to be able to imagine a game engine revolving around immutable data structures. While he didn't get into specifics and while it just seemed like a hazy idea in his head, the fact that he was seriously considering it and didn't seem to think the overhead would cut steeply into frame rates was enough to get me excited about exploring that idea.I actually use it as somewhat of an optimization detail which might seem paradoxical (there is overhead to immutability), but I mean in a specific context. If I absolutely want to do this:
... and I absolutely don't want the function to cause side effects so that it can be thread-safe and never prone to misuse, then I have no choice but to copy the huge data structure (which might span a gigabyte).
And there I find having a small library of immutable data structures immensely helpful in such a context, since it makes the above scenario relatively cheap by just shallow copying and referencing unchanged parts. That said, I mostly just use one immutable data structure which is basically a random-access sequence, like so:
And as others mentioned, it did require some care and tuning and comprehensive testing and many VTune sessions to make it thread-safe and efficient, but after I put in the elbow grease, it definitely made things a whole, whole lot easier.
On top of automatic thread safety whenever we use these structures to write functions to be free of side effects, you also get things like non-destructive editing, trivial undo systems, trivial exception-safety (no need to roll back side effects through scope guards in a function which causes none in exceptional paths), and letting users copy and paste data and instance it without taking much memory until/unless they modify what they paste as a bonus. I actually found these bonuses to be even more useful on a daily basis than the thread safety.
I use 'transients' (aka 'builders') as a way to express changes to the data structure, like so:
I even have an immutable image library which I use for image editing to trivialize non-destructing editing. It uses a similar strategy to the above structure by treating images as tiles like so:
When a transient is modified and we get a new immutable, only the changed parts are made unique. The rest of the tiles are shallow copied (just 32-bit indices):
I do use these in fairly performance-critical areas like mesh and video processing. There was a bit of fine-tuning as to how much data each block should store (too much and we waste processing and memory deep copying too much data, too little and we waste processing and memory shallow copying too many pointers with more frequent thread locks).
I don't use these for raytracing since that's one of the most extreme performance-critical areas imaginable and the tiniest bit of overhead is noticeable to users (they actually benchmark and notice performance differences in the range of 2%), but most of the time, they're efficient enough, and it's a pretty awesome benefit when you can copy these huge data structures around as a whole left and right to simplify thread safety, undo systems, non-destructive editing, etc, without worrying about explosive memory usage and noticeable delays spent deep copying everything.
获得持久数据结构的主要困难确实是缺乏垃圾收集。如果您没有适当的垃圾收集方案,那么您可能会得到一个糟糕的方案(即参考计数),但这意味着您需要格外小心,不要创建循环引用。它改变了结构的核心。例如,考虑二叉树。如果您创建节点的新版本,则需要其父节点的新版本才能访问它(等等...)。现在,如果关系是双向的(子<->父),那么您实际上将复制整个结构。这意味着你要么有一个父母 ->子女关系,或相反(不太常见)。
我可以考虑实现二叉树,或者 B 树。例如,我几乎不知道如何获得正确的数组。
另一方面,我同意在多线程环境中拥有高效的环境会很棒。
The main difficulty to get a persistent data-structure is indeed the lack of garbage collection.If you don't have a proper garbage collection scheme, then you can get a poor one (namely reference counting), but this mean that you need to take extra care NOT to create cyclic references.It changes the very core of the structure. For example, think binary tree. If you create a new version of a node, then you need a new version of its parent to access it (etc...). Now, if the relation is two-way (child <-> parent) then you will in fact duplicate the whole structure. This means that you will either have a parent -> child relation, or the opposite (less common).
I can think of implementing a binary tree, or a B-Tree. I hardly see how to get a proper array for example.
On the other hand, I agree it would be great to have efficient ones in multi-threaded environments.
如果我正确地理解了这个问题,那么您所寻求的是复制对象的能力,而无需在完成时实际支付复制费用,而仅在需要时才进行。对任一对象的更改都可以在不损坏另一个对象的情况下完成。
这称为“写入时复制”。
如果这就是您正在寻找的,那么可以使用共享指针在 C++ 中相当轻松地实现(请参阅 Boost 中的 shared_ptr,作为一种实现)。
最初,副本将与源共享所有内容,但是一旦进行更改,对象共享指针的相关部分就会被指向新创建的深度复制对象的其他共享指针所取代。
(我意识到这个解释是模糊的 - 如果这确实是你的意思,答案可以详细说明)。
If I understand the question correctly, what you seek is the ability to duplicate an object without actually paying for the duplication when it is done, only when it is needed. Changes to either object can be done without damaging the other one.
This is known as "copy on write".
If this is what you are looking for, this can fairly easily be implemented in C++ using shared pointers (see shared_ptr from Boost, as one implementation).
Initially the copy would share everything with the source, but once changes are made the relevant parts of the object shared pointers are replaced by other shared pointers pointing to newly created, deep-copied objects.
(I realize that this explanation is vague - if this is indeed what you mean, the answer can be elaborated).