What is copy-on-write?

Posted on 2024-07-14 08:53:30

I would like to know what copy-on-write is and what it is used for. The term is mentioned several times in the Sun JDK tutorials.


Comments (9)

帝王念 2024-07-21 08:53:31

It's also used in Ruby Enterprise Edition as a neat way of saving memory.

夏末的微笑 2024-07-21 08:53:30

I was going to write up my own explanation, but this Wikipedia article pretty much sums it up.

Here is the basic concept:

Copy-on-write (sometimes referred to as "COW") is an optimization strategy used in computer programming. The fundamental idea is that if multiple callers ask for resources which are initially indistinguishable, you can give them pointers to the same resource. This arrangement can be maintained until a caller tries to modify its "copy" of the resource, at which point a true private copy is created to prevent the changes becoming visible to everyone else. All of this happens transparently to the callers. The primary advantage is that if a caller never makes any modifications, no private copy need ever be created.

Here is also a common application of COW:

The COW concept is also used to maintain instant snapshots on database servers such as Microsoft SQL Server 2005. Instant snapshots preserve a static view of a database by storing a pre-modification copy of data when the underlying data are updated. Instant snapshots are used for testing or moment-dependent reports and should not be used to replace backups.
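To make the snapshot idea concrete, here is a minimal Python sketch. It is not how SQL Server implements snapshots; the PagedStore class and its method names are made up for illustration. It only shows the principle quoted above: a snapshot stores a pre-modification copy of a page at the moment that page is first updated.

```python
class PagedStore:
    """Toy page store with copy-on-write snapshots (hypothetical, for illustration)."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.snapshots = []

    def snapshot(self):
        # A snapshot starts empty: it costs nothing until pages are updated.
        snap = {}
        self.snapshots.append(snap)
        return snap

    def write(self, index, value):
        # Before overwriting, save the old page into every snapshot
        # that has not already preserved its own copy of that page.
        for snap in self.snapshots:
            snap.setdefault(index, self.pages[index])
        self.pages[index] = value

    def read_snapshot(self, snap, index):
        # Preserved pages come from the snapshot, untouched pages from live data.
        return snap.get(index, self.pages[index])


store = PagedStore(["a", "b", "c"])
snap = store.snapshot()
store.write(1, "B")
store.write(1, "B2")

live = store.pages
frozen = [store.read_snapshot(snap, i) for i in range(3)]
print(live)    # ['a', 'B2', 'c']
print(frozen)  # ['a', 'b', 'c']
```

Only the one page that changed was copied; the snapshot still reads the two untouched pages straight from the live data.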

久隐师 2024-07-21 08:53:30

"Copy on write" means more or less what it sounds like: everyone has a single shared copy of the same data until it's written, and then a copy is made. Copy-on-write is often used to resolve concurrency problems. In ZFS, for example, data blocks on disk are allocated copy-on-write; as long as there are no changes, you keep the original blocks, and a change modifies only the affected blocks. This means that the minimum number of new blocks is allocated.

These changes are also usually implemented to be transactional, i.e., they have the ACID properties. This eliminates some concurrency issues, because you're then guaranteed that all updates are atomic.
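The block-level idea can be sketched in a few lines of Python. This is only an illustration of the principle, not of ZFS itself; the Volume class and write_block method are hypothetical names. Each version of the data is an immutable tuple of block references, and a write produces a new version in which only the affected slot points at a new block.

```python
class Volume:
    """Toy file made of immutable blocks; versions share unchanged blocks."""

    def __init__(self, blocks):
        self.blocks = tuple(blocks)  # references to block objects, never mutated

    def write_block(self, index, data):
        # Return a new version; only the affected slot points at a new block.
        new_blocks = list(self.blocks)
        new_blocks[index] = data
        return Volume(new_blocks)


v1 = Volume([bytes(4096), bytes(4096), bytes(4096)])
v2 = v1.write_block(1, b"x" * 4096)

# Count the slots where both versions reference the very same block object.
shared = sum(a is b for a, b in zip(v1.blocks, v2.blocks))
print(shared)  # 2
```

Because the old version is never modified, a reader holding v1 sees a consistent state, and switching readers from v1 to v2 is a single reference update, which is one way to get the atomicity mentioned above.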

温柔嚣张 2024-07-21 08:53:30

I shall not repeat the same answer on copy-on-write. I think Andrew's answer and Charlie's answer have already made it very clear. I will give you an example from the OS world, just to show how widely this concept is used.

We can use fork() or vfork() to create a new process. On modern Unix-like systems fork() is typically implemented with copy-on-write: the child initially shares the parent's memory pages, and a page is copied only when one of the two processes writes to it. This speeds up forking. vfork() goes a step further and lets the child share the parent's data and code segments without any copying, so it is meant to be used when vfork is immediately followed by exec. When the child calls exec, the image of a new executable is loaded into the address space of the child process.
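A small Python sketch of what this looks like from a program's point of view, assuming a Unix-like system where os.fork is available. The code can only show the visible effect (the child's write stays private to the child); the page sharing and copying happen inside the kernel and cannot be observed here.

```python
import os

data = ["parent value"]

pid = os.fork()
if pid == 0:
    # Child: this write lands in the child's private copy of the memory.
    data[0] = "child value"
    os._exit(0)

# Parent: wait for the child, then check that our data is unchanged.
os.waitpid(pid, 0)
print(data[0])  # parent value
```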

怀中猫帐中妖 2024-07-21 08:53:30

Just to provide another example, Mercurial uses copy-on-write to make cloning local repositories a really "cheap" operation.

The principle is the same as in the other examples, except that you're talking about physical files instead of objects in memory. Initially, a clone is not a duplicate but a hard link to the original. As you change files in the clone, copies are written to represent the new version.
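The hard-link idea can be tried out with the Python standard library. This is a sketch of the general technique, not of Mercurial's actual code; write_cow is a hypothetical helper, and the example assumes a filesystem that supports hard links.

```python
import os
import tempfile


def write_cow(path, text):
    """Write a fresh copy and swap it in, so other hard links keep the old data."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
    os.replace(tmp, path)  # path now names a new file; other links are untouched


with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "original.txt")
    clone = os.path.join(root, "clone.txt")
    with open(original, "w") as f:
        f.write("v1")

    os.link(original, clone)  # "clone" without copying any data
    linked = os.path.samefile(original, clone)

    write_cow(clone, "v2")  # the copy is made only now
    still_linked = os.path.samefile(original, clone)
    with open(original) as f:
        original_text = f.read()
    with open(clone) as f:
        clone_text = f.read()

print(linked, still_linked, original_text, clone_text)  # True False v1 v2
```

Writing to the clone in place (opening it with mode "w") would modify the shared file and change the original too, which is why the write goes to a new file that then replaces the link.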

仙女 2024-07-21 08:53:30

The book Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma et al. clearly describes the copy-on-write optimization (section ‘Consequences’, chapter ‘Proxy’):

The Proxy pattern introduces a level of indirection when accessing an
object. The additional indirection has many uses, depending on the
kind of proxy:

  1. A remote proxy can hide the fact that an object resides in a different address space.
  2. A virtual proxy can perform optimizations such as creating an object on demand.
  3. Both protection proxies and smart references allow additional housekeeping tasks when an object is accessed.

There’s another optimization that the Proxy pattern can hide from the
client. It’s called copy-on-write, and it’s related to creation on
demand. Copying a large and complicated object can be an expensive
operation. If the copy is never modified, then there’s no need to
incur this cost. By using a proxy to postpone the copying process, we
ensure that we pay the price of copying the object only if it’s
modified.

To make copy-on-write work, the subject must be reference counted.
Copying the proxy will do nothing more than increment this reference
count. Only when the client requests an operation that modifies the
subject does the proxy actually copy it. In that case the proxy must
also decrement the subject’s reference count. When the reference count
goes to zero, the subject gets deleted.

Copy-on-write can reduce the cost of copying heavyweight subjects
significantly.

Below is a Python implementation of the copy-on-write optimization using the Proxy pattern. The intent of this design pattern is to provide a surrogate for another object to control access to it.

[Figure: class diagram of the Proxy pattern]

[Figure: object diagram of the Proxy pattern]

First we define the interface of the subject:

import abc


class Subject(abc.ABC):

    @abc.abstractmethod
    def clone(self):
        raise NotImplementedError

    @abc.abstractmethod
    def read(self):
        raise NotImplementedError

    @abc.abstractmethod
    def write(self, data):
        raise NotImplementedError

Next we define the real subject implementing the subject interface:

import copy


class RealSubject(Subject):

    def __init__(self, data):
        self.data = data

    def clone(self):
        return copy.deepcopy(self)

    def read(self):
        return self.data

    def write(self, data):
        self.data = data

Finally we define the proxy implementing the subject interface and referencing the real subject:

class Proxy(Subject):
    """Stand-in for a subject; shares it with other proxies until the first write."""

    def __init__(self, subject):
        self.subject = subject
        # Reference count stored on the subject: how many proxies share it.
        try:
            self.subject.counter += 1
        except AttributeError:
            self.subject.counter = 1

    def clone(self):
        return Proxy(self.subject)  # attribute sharing (shallow copy)

    def read(self):
        return self.subject.read()

    def write(self, data):
        if self.subject.counter > 1:
            # The subject is shared: detach from it and work on a private copy.
            self.subject.counter -= 1
            self.subject = self.subject.clone()  # attribute copying (deep copy)
            self.subject.counter = 1
        self.subject.write(data)

The client can then benefit from the copy-on-write optimization by using the proxy as a stand-in for the real subject:

if __name__ == '__main__':
    x = Proxy(RealSubject('foo'))
    x.write('bar')
    y = x.clone()  # the real subject is shared instead of being copied
    print(x.read(), y.read())  # bar bar
    assert x.subject is y.subject
    x.write('baz')  # the real subject is copied on write because it was shared
    print(x.read(), y.read())  # baz bar
    assert x.subject is not y.subject
时光匆匆的小流年 2024-07-21 08:53:30

I found this good article about zval in PHP, which mentions COW too:

Copy On Write (abbreviated as 'COW') is a trick designed to save memory. It is used more generally in software engineering. It means that PHP will copy the memory (or allocate a new memory region) when you write to a symbol, if this one was already pointing to a zval.

断肠人 2024-07-21 08:53:30

A good example is Git, which stores blobs keyed by their hashes. Why does it use hashes? Partly because they make diffs easier to perform, but also because they make it simpler to optimise a COW strategy. When you make a new commit that changes only a few files, the vast majority of objects and trees do not change. The commit therefore references, through pointers made of hashes, a bunch of objects that already exist, which makes the storage space required for the entire history much smaller.
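Here is a rough Python sketch of why hash-keyed storage shares unchanged objects between commits. It is a simplification and not Git's real object format (Git hashes a header together with the content and also stores trees and commits as objects); the objects dictionary and the put and commit helpers are hypothetical names.

```python
import hashlib

objects = {}  # hypothetical object store: hash -> content


def put(content: bytes) -> str:
    key = hashlib.sha1(content).hexdigest()
    objects.setdefault(key, content)  # identical content is stored only once
    return key


def commit(files: dict) -> dict:
    # A "tree" here is simply a mapping of file name -> blob hash.
    return {name: put(data) for name, data in files.items()}


files = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}
tree1 = commit(files)

files["b.txt"] = b"beta v2"
tree2 = commit(files)

unchanged = [name for name in tree1 if tree1[name] == tree2[name]]
print(unchanged)     # ['a.txt', 'c.txt']
print(len(objects))  # 4
```

Two commits of three files each end up storing four blobs rather than six, because the second commit simply points at the two blobs that already exist.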

别念他 2024-07-21 08:53:30

It is a memory protection concept. With copy-on-write, an extra copy is created when the child modifies data, so the updated data is not reflected in the parent's data.
