具有写时复制功能的纯函数式数据结构?
我希望拥有功能数据结构(可以共享结构的多个版本的数据)的优势,但能够以命令式方式修改它。
我正在考虑的(以及可能的用途):一款存储整个游戏历史的 RPG 游戏(例如,允许时光倒流)。使用写时复制,我可以简单地克隆保存游戏状态的结构,并通过引入新回合来修改它 - 但可以访问较早的回合(不一定是所有回合,可能只是选定的游戏状态快照),而无需每次都必须复制所有内容的惩罚。
假设 foo
是一张地图。
bar = foo.clone()
foo
的结构(例如树)尚未被复制。然而, 从现在开始 bar
被视为副本,不允许传播任何更改 回到“foo”。
baz = bar[someKey]
baz.modifyInSomeWay()
现在
- 创建了一个新对象,它是
baz
的修改副本。 bar
被替换为新的地图,包含新的baz
(可能保留 一些foo
的结构)。foo
不受影响。
但如果我们这样做...
baz.modifyAgain()
...baz
就可以修改,因为我们有它的最新版本。 栏
不需要改变。
这一切都需要保存一些关于 foo 和 bar 的版本信息, 在 foo.clone()
上增加它,并以某种方式将其传递给 baz
(通过使其 代理对象?)。
此外,被克隆的结构的任何部分都将成为“历史的一部分” 并且不能再更改,这可以在运行时强制执行。
这有点类似于 JavaScript 的原型,但我更喜欢它,因为它允许更改 向上传播。我认为这就像一个版本控制系统。
- 这已经做到了吗?做到什么程度了?
- 这是个好主意吗?如果没有的话有办法挽救吗?
- 如何实施?我正在考虑将其构建在某种高级 GC 语言之上,例如 Python。
I want to have the advantage of functional data structures (multiple versions of data that can share structure) but be able to modify it in an imperative style.
What I'm thinking about (and a possible use): a RPG game in which whole game history is stored (for example, to allow for travelling back in time). Using copy-on-write, I could simply clone the structure holding game state and modify it by introducing a new turn - but have access to the earlier turns (not necessarily all of them, maybe just selected snapshots of game state), without the penalty of having to copy everything every time.
Let's say foo
is a map.
bar = foo.clone()
Nothing of foo
's structure (for example, tree) gets copied yet. However,
from now on bar
is treated as a copy and no changes are allowed to propagate
back to `foo'.
baz = bar[someKey]
baz.modifyInSomeWay()
Now
- a new object gets created, that is a modified copy of
baz
. bar
is replaced with a new map, holding newbaz
(possibly retaining
some offoo
's structure).foo
is unaffected.
But if we then do...
baz.modifyAgain()
...baz
can be just modified, because we have a recent version of it. bar
doesn't need to be changed.
All this requires holding some version information about foo
and bar
,
increasing it on foo.clone()
, and passing it to baz
somehow (by making it
a proxy object?).
Also, any part of structure that has been cloned becomes a 'part of history'
and can't be changed anymore, which could be enforced at run-time.
This resembles JavaScript's prototypes a bit, but I it's more as it allows for changes
to propagate upwards. I think it would be something like a version control system.
- Has this been done, and to what extent?
- Is this a good idea? If not, is there a way to save it?
- How could it be implemented? I was thinking about building it on top of some high-level GC-ed language, like Python.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
功能(“持久”)数据结构通常是由不可变节点递归构建的(例如共享公共后缀的单链表、搜索树或堆,其中仅需要复制从根到更新项的路径上的树结构部分)。
每次修改都必须复制整个集合的任何做法都是不好的。在这些情况下,您倾向于通过递归到以前的版本来覆盖检查的小“差异”(需要额外的时间)。通常,当差异变得太大时,您可以进行深度复制/重建(因此摊销成本并没有那么糟糕)。
持久性数据结构可能会产生很大的开销,但是如果您对小对象进行非常有效的分配(JVM GC 合格),那么它们可能非常实用 - 在最好的情况下,与可变等效结构一样快,仅以牺牲持久性为代价使用的内存 - 可能比每次更新时的完整副本少得多。
根据您的语言,您可能会发现如果不重载赋值操作符,您想要的语法将很难实现。 C++ 中的左值(可变引用)肯定需要非持久语义。
Functional ("persistent") data structures are typically recursively built up out of immutable nodes (e.g. singly linked list where common suffixes are shared, search tree or heap where only the parts of the tree structure that are on the path from the root to the updated item need copying).
Anything where the entire set has to be copied for every modification is bad. In those cases, you'd tend to overlay small "diffs" that are checked (taking extra time) with recursion to previous versions. Every so often, when the diffs become too large, you could do a deep copy/rebuild (so the amortized cost is not as bad).
Persistent data structures can have significant overhead, but if you have very efficient allocation of small objects (JVM GC qualifies), they can be very practical - in the best case, equally as fast as the mutable equivalent, giving persistence only at the cost of the memory used - which can be much less than a full copy on every update.
Depending on your language, you'll probably find the syntax you want difficult to achieve without operator overloading for assignment. Lvalues (mutable references) in C++ definitely require non-persistent semantics.
1.这已经完成了吗?达到什么程度?
是的,请参阅例如 qt5
2.这是个好主意吗?如果没有,有办法保存吗?
是的,这是一个好主意。提出的替代方案之一是使用完全不可变的数据结构(包装在命令式接口中),但问题是即使对象是唯一指向数据的对象,修改操作也会创建数据的副本(没有就地更新),这样效率很低。使用写时复制方法,仅在第一次修改时进行复制,后续修改会就地更改复制的数据(当然,如果未创建对相同数据的另一个引用)。
3.如何实施?我正在考虑在一些高级 GC 语言之上构建它,例如 Python。
一种方法是使用引用计数,例如在 qt 中(请参阅描述 此处)。要实现它需要赋值运算符重载或显式方法调用(例如 bar = foo.clone() ),但它可能很脆弱,如果有人忘记调用clone 会发生什么code> 并执行
bar = foo
?),这样就可以保留计数。正如您所说,创建代理对象的其他可能性。例如,请参阅 pycow (Python 实现)。
1. Has this been done, and to what extent?
Yes, see for example qt5 implicit sharing.
2. Is this a good idea? If not, is there a way to save it?
Yes, it is a good idea. One of the proposed alternatives is to use a fully immutable data structure (wrapped in a imperative interface), but the problem is that even if a object is the only one to point to a data, a modification operation will create a copy of the data (there is no in place update), this is inefficient. Using the copy on write approach, a copy is made only in the first modification, subsequent modifications alters the copied data in place (if another reference to the same data wasn't created, off course).
3. How could it be implemented? I was thinking about building it on top of some high-level GC-ed language, like Python.
One way is to use reference counting, like in qt (see a description here). To implement it will requires either assignment operator overloading or a explicit method call (like
bar = foo.clone()
, but it may be fragile, what happens if someone forget to callclone
and just dobar = foo
?), so the counting can be kept.Other possibility in tho create a proxy object, like you said. See for example a pycow (a python implementation).
与仅拥有一个完全不可变的数据结构,然后使用一个包装器来保存对其的引用并公开一个通过更新包装版本来工作的命令式接口相比,这听起来非常复杂且容易出错。
例如,在 Scala 中,
这当然意味着您必须制作两个版本的界面,但语义要清晰得多。
This sounds awfully convoluted and error prone compared to just having a fully immutable data structure and then using a wrapper which holds a reference to it and exposes an imperative interface which works by updating the wrapped version.
e.g. in Scala
Sure it means you have to make two versions of the interface, but the semantics are a lot saner.