What is copy-on-write?
I would like to know what copy-on-write is and what it is used for. The term is mentioned several times in the Sun JDK tutorials.
Comments (9)
It's also used in Ruby Enterprise Edition as a neat way of saving memory.
I was going to write up my own explanation, but this Wikipedia article pretty much sums it up.
Here is the basic concept:
Also, here is a common application of COW:
"Copy on write" means more or less what it sounds like: everyone shares a single copy of the same data until someone writes to it, and only then is a copy made. Copy-on-write is usually used to resolve concurrency problems. In ZFS, for example, data blocks on disk are allocated copy-on-write: as long as nothing changes, you keep the original blocks, and a change copies only the affected blocks. This means the minimum number of new blocks is allocated.
These changes are also usually implemented to be transactional, i.e., they have the ACID properties. This eliminates some concurrency issues, because you are then guaranteed that all updates are atomic.
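To make the "shared until written" idea concrete, here is a minimal sketch in Python. The class `CowList` and its method names are made up for this illustration; this is not how ZFS or any particular library implements it.

```python
class CowList:
    """Hypothetical copy-on-write wrapper around a list (illustration only)."""

    def __init__(self, items):
        self._items = items   # reference to possibly shared storage, no copy yet
        self._owned = False   # True once this handle holds a private copy

    def share(self):
        # Handing out another handle is cheap: both point at the same storage.
        # Our storage is shared again, so our own next write must copy too.
        self._owned = False
        return CowList(self._items)

    def get(self, index):
        return self._items[index]

    def set(self, index, value):
        if not self._owned:
            # First write through this handle: copy now, then modify the copy.
            self._items = list(self._items)
            self._owned = True
        self._items[index] = value

    def shares_storage_with(self, other):
        return self._items is other._items


a = CowList([1, 2, 3])
b = a.share()
print(a.shares_storage_with(b))  # True
b.set(0, 99)                     # triggers the copy
print(a.shares_storage_with(b))  # False
print(a.get(0), b.get(0))        # 1 99
```

Reads never copy anything; the cost of duplication is paid only by a handle that actually writes.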
I shall not repeat the same answer on copy-on-write; I think Andrew's answer and Charlie's answer have already made it very clear. I will give you an example from the OS world, just to show how widely this concept is used.
We can use fork() or vfork() to create a new process. vfork is closely related to the idea behind copy-on-write: the child process it creates shares the data and code segments with the parent instead of receiving a copy, which speeds up process creation. vfork is meant for the case where the child calls exec immediately afterwards. The child shares the parent's data and code segments only until exec loads the image of a new executable into the child's address space. A modern fork() typically gets a similar saving through copy-on-write itself: parent and child share pages until one of them writes.
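As a rough illustration of the semantics from the process side, here is a small Python sketch for Unix-like systems. The helper name `fork_demo` is made up for this example. It only shows the observable behaviour (a write in the child is not seen by the parent); whether the kernel achieves this with copy-on-write pages depends on the platform, and the script cannot observe that directly.

```python
import os


def fork_demo():
    """Hypothetical demo: return (value seen by child, value seen by parent)."""
    data = ["original"]
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: modify its view of the data and report it through the pipe.
        os.close(read_fd)
        data[0] = "changed in child"
        os.write(write_fd, data[0].encode())
        os.close(write_fd)
        os._exit(0)

    # Parent: read what the child saw, then look at its own, untouched data.
    os.close(write_fd)
    child_view = os.read(read_fd, 1024).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_view, data[0]


child_view, parent_view = fork_demo()
print(child_view, "|", parent_view)  # changed in child | original
```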
Just to provide another example, Mercurial uses copy-on-write to make cloning local repositories a really "cheap" operation.
The principle is the same as in the other examples, except that you're talking about physical files instead of objects in memory. Initially, a cloned file is not a duplicate but a hard link to the original. As you change files in the clone, copies are written to represent the new version.
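The hard-link idea can be sketched in a few lines of Python. This is not Mercurial's code; `clone`, `write_cow`, and `clone_demo` are hypothetical helpers, and the sketch assumes a filesystem that supports hard links.

```python
import os
import tempfile


def clone(src_dir, dst_dir):
    """Hypothetical helper: 'clone' a flat directory by hard-linking each file."""
    os.makedirs(dst_dir)
    for name in os.listdir(src_dir):
        os.link(os.path.join(src_dir, name), os.path.join(dst_dir, name))


def write_cow(path, text):
    """Write a fresh file and swap it in, so the other hard link is untouched."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
    os.replace(tmp, path)  # 'path' now names a new file


def clone_demo():
    with tempfile.TemporaryDirectory() as root:
        repo = os.path.join(root, "repo")
        copy = os.path.join(root, "clone")
        os.makedirs(repo)
        with open(os.path.join(repo, "a.txt"), "w") as f:
            f.write("v1")

        clone(repo, copy)
        original = os.path.join(repo, "a.txt")
        cloned = os.path.join(copy, "a.txt")

        shared_before = os.path.samefile(original, cloned)
        write_cow(cloned, "v2")
        shared_after = os.path.samefile(original, cloned)

        with open(original) as f:
            original_text = f.read()
    return shared_before, shared_after, original_text


print(clone_demo())  # (True, False, 'v1')
```

Until the write, both names refer to the same file on disk, so the clone costs almost no extra space.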
The book Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma et al. clearly describes the copy-on-write optimization (section ‘Consequences’, chapter ‘Proxy’):
What follows is a Python implementation of the copy-on-write optimization using the Proxy pattern. The intent of this design pattern is to provide a surrogate for another object to control access to it.
Class diagram of the Proxy pattern:
Object diagram of the Proxy pattern:
First, we define the interface of the subject:
Next, we define the real subject, which implements the subject interface:
Finally, we define the proxy, which implements the subject interface and references the real subject:
The client can then benefit from the copy-on-write optimization by using the proxy as a stand-in for the real subject:
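A minimal sketch of how those three pieces might fit together is given here. All names (`Subject`, `RealSubject`, `Proxy`, `ref_count`, `read`, `write`, `clone`) are assumptions made for this illustration and are not taken from the book or from the original listings. The proxy shares one real subject between clones and only copies it when a write arrives while it is still shared.

```python
import copy
from abc import ABC, abstractmethod


class Subject(ABC):
    """Interface shared by the real subject and its proxy."""

    @abstractmethod
    def read(self, key):
        ...

    @abstractmethod
    def write(self, key, value):
        ...

    @abstractmethod
    def clone(self):
        ...


class RealSubject(Subject):
    def __init__(self, data):
        self.data = dict(data)
        self.ref_count = 0  # number of proxies currently sharing this object

    def read(self, key):
        return self.data[key]

    def write(self, key, value):
        self.data[key] = value

    def clone(self):
        return RealSubject(copy.deepcopy(self.data))  # the expensive copy


class Proxy(Subject):
    def __init__(self, real_subject):
        self._real = real_subject
        self._real.ref_count += 1

    def read(self, key):
        return self._real.read(key)

    def write(self, key, value):
        if self._real.ref_count > 1:
            # Still shared: detach from the shared subject and copy it first.
            self._real.ref_count -= 1
            self._real = self._real.clone()
            self._real.ref_count = 1
        self._real.write(key, value)

    def clone(self):
        return Proxy(self._real)  # cheap: only bumps the reference count


p1 = Proxy(RealSubject({"x": 1}))
p2 = p1.clone()   # no data copied yet
p2.write("x", 2)  # the real copy happens here
print(p1.read("x"), p2.read("x"))  # 1 2
```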
I found this good article about zval in PHP, which mentions COW too:
A good example is Git, which stores blobs addressed by the hash of their content. Why does it use hashes? Partly because they make diffs easier to perform, but also because they make it simpler to optimise a COW strategy. When you make a new commit that changes only a few files, the vast majority of objects and trees do not change. The commit therefore references, through pointers made of hashes, a bunch of objects that already exist, which makes the storage space required for the entire history much smaller.
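Here is a toy sketch of that effect in Python. It is a deliberate simplification and not Git's actual object format (real Git hashes a header together with the content and also stores tree and commit objects); `objects`, `put`, and `commit` are hypothetical names.

```python
import hashlib

objects = {}  # toy object store: hash -> content


def put(content: bytes) -> str:
    """Store content under its hash; identical content is stored only once."""
    key = hashlib.sha1(content).hexdigest()
    objects.setdefault(key, content)
    return key


def commit(files: dict) -> dict:
    """Return a toy 'tree': file name -> hash of that file's content."""
    return {name: put(content) for name, content in files.items()}


first = commit({"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"})
second = commit({"a.txt": b"alpha", "b.txt": b"beta v2", "c.txt": b"gamma"})

print(first["a.txt"] == second["a.txt"])  # True: unchanged file, same object
print(len(objects))                       # 4: only the changed file added an object
```

Two commits of three files each end up with four stored objects instead of six, because unchanged content hashes to the same key.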
It is a memory protection concept. When a child process modifies data, the operating system creates an extra copy for it, so the updated data is not reflected in the parent's data.