
What is copy-on-write?

Posted 2024-07-14 08:53:30

I would like to know what copy-on-write is and what it is used for. The term is mentioned several times in the Sun JDK tutorials.


帝王念 2024-07-21 08:53:31

It is also used in Ruby "Enterprise Edition" as a neat way of saving memory.


夏末的微笑 2024-07-21 08:53:30

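I was going to write up my own explanation, but this Wikipedia article pretty much sums it up.

Here is the basic concept:

Copy-on-write (sometimes referred to as "COW") is an optimization strategy used in computer programming. The fundamental idea is that if multiple callers ask for resources which are initially indistinguishable, you can give them pointers to the same resource. This arrangement can be maintained until a caller tries to modify its "copy" of the resource, at which point a true private copy is created to prevent the changes from becoming visible to everyone else. All of this happens transparently to the callers. The primary advantage is that if a caller never makes any modifications, no private copy need ever be created.

Here is also a common use of COW:

The COW concept is also used to maintain instant snapshots on database servers such as Microsoft SQL Server 2005. Instant snapshots preserve a static view of a database by storing a pre-modification copy of the data when the underlying data is updated. Instant snapshots are used for testing or for moment-dependent reports and should not be used to replace backups.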
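The snapshot idea can be sketched roughly as follows. This is only a toy model: `PageStore` and `Snapshot` are hypothetical names, not SQL Server's API. The snapshot stores nothing when it is created and receives a page's pre-modification contents only the first time that page is updated afterwards.

```python
class Snapshot:
    """Static view of a PageStore as of the moment the snapshot was taken."""

    def __init__(self, store):
        self._store = store
        self._saved = {}  # page number -> contents before its first later update

    def preserve(self, page_no, old_data):
        # Keep only the first pre-modification copy of each page.
        self._saved.setdefault(page_no, old_data)

    def read(self, page_no):
        if page_no in self._saved:
            return self._saved[page_no]
        return self._store.read(page_no)  # unchanged page: share the live data


class PageStore:
    """Hypothetical page store with a fixed set of pages."""

    def __init__(self, pages):
        self._pages = dict(pages)
        self._snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self._snapshots.append(snap)
        return snap

    def read(self, page_no):
        return self._pages[page_no]

    def write(self, page_no, data):
        old = self._pages[page_no]
        for snap in self._snapshots:
            snap.preserve(page_no, old)  # copy on write
        self._pages[page_no] = data


store = PageStore({0: "alice", 1: "bob"})
snap = store.snapshot()
store.write(1, "carol")
print(store.read(1), snap.read(1))  # carol bob
```

Page 0 is never copied because it is never written, which is the saving the quoted paragraph describes.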


久隐师 2024-07-21 08:53:30

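"Copy-on-write" means more or less what it sounds like: everyone shares a single copy of the same data until it is written to, and then a copy is made. Copy-on-write is commonly used to address concurrency-type problems. In ZFS, for example, data blocks on disk are allocated copy-on-write: as long as nothing changes, the original blocks are kept, and a change rewrites only the affected blocks. This means the minimum number of new blocks is allocated.

These changes are also usually implemented as transactions, i.e. they have the ACID properties. This eliminates some concurrency issues, because you are then guaranteed that all updates are atomic.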
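A rough sketch of block-level copy-on-write with an atomic update, under heavy simplification: `CowVolume` is a hypothetical name and this is not ZFS's on-disk format. An update allocates new blocks only for the indices that change and then publishes them by swapping a single root reference.

```python
class CowVolume:
    """Toy block store: an update never overwrites a block in place."""

    def __init__(self, blocks):
        self._root = tuple(blocks)  # current block table (immutable)

    def snapshot(self):
        return self._root  # just a reference, nothing is copied

    def update(self, changes):
        """Apply {index: new_block} as one all-or-nothing step."""
        table = list(self._root)  # copies references, not block contents
        for index, block in changes.items():
            table[index] = block  # newly allocated block
        self._root = tuple(table)  # one pointer swap publishes every change


vol = CowVolume([b"aaaa", b"bbbb", b"cccc"])
before = vol.snapshot()
vol.update({1: b"BBBB"})
after = vol.snapshot()

print(before[1], after[1])    # b'bbbb' b'BBBB'
print(before[0] is after[0])  # True: the untouched block is shared
```

If any index in `changes` is invalid, the exception is raised before the root is swapped, so readers never observe a half-applied update.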


温柔嚣张 2024-07-21 08:53:30

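I shall not repeat the same answer on copy-on-write. I think Andrew's answer and Charlie's answer have already made it very clear. I will give you an example from the operating system world, just to show how widely this concept is used.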

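We can use fork() or vfork() to create a new process. On modern Unix-like systems fork() follows the copy-on-write concept: the child process initially shares the parent's data and code pages, and a page is copied only when one of the two processes writes to it. This speeds up forking. vfork() shares the parent's address space without copying anything and suspends the parent until the child calls exec or exits, so it is meant for the case where vfork is immediately followed by exec. When exec is called, it loads the image of the new executable into the child's address space.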
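A small sketch of the process-creation case using Python's `os.fork` (Unix only), assuming the kernel implements fork with copy-on-write pages, as Linux does. It can only show the visible semantics: the child's write does not reach the parent. Whether pages are physically shared until written is decided by the kernel and cannot be observed from this code.

```python
import os

data = ["parent value"]

read_fd, write_fd = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: this write touches only the child's view of memory.
    os.close(read_fd)
    data[0] = "child value"
    os.write(write_fd, data[0].encode())
    os._exit(0)

# Parent: read what the child saw, then check that its own data is unchanged.
os.close(write_fd)
child_saw = os.read(read_fd, 1024).decode()
os.close(read_fd)
os.waitpid(pid, 0)
print(child_saw, "/", data[0])  # child value / parent value
```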


怀中猫帐中妖 2024-07-21 08:53:30

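Just to provide another example, Mercurial uses copy-on-write to make cloning local repositories a really "cheap" operation.

The principle is the same as in the other examples, except that you are talking about physical files instead of objects in memory. Initially, a clone is not a duplicate but a hard link to the original. As you change files in the clone, copies are written to represent the new version.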
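The hard-link approach can be sketched like this. It is not Mercurial's actual code: `cheap_clone` and `write_clone` are hypothetical helpers, and the sketch assumes a filesystem that supports hard links.

```python
import os
import tempfile


def cheap_clone(src, dst):
    """'Clone' a file by hard-linking it: no data is copied."""
    os.link(src, dst)


def write_clone(path, text):
    """Break the link before writing so the other name keeps the old data."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
    os.replace(tmp, path)  # path now names a new file (new inode)


with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "original.txt")
    clone = os.path.join(d, "clone.txt")
    with open(original, "w") as f:
        f.write("v1")

    cheap_clone(original, clone)
    shared = os.path.samefile(original, clone)

    write_clone(clone, "v2")
    with open(original) as f:
        original_text = f.read()
    with open(clone) as f:
        clone_text = f.read()
    still_shared = os.path.samefile(original, clone)

print(shared, still_shared, original_text, clone_text)  # True False v1 v2
```

Writing through the link in place would change both names at once, which is why the write goes to a fresh file that then replaces the clone's name.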


仙女 2024-07-21 08:53:30

The book Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma et al. clearly describes the copy-on-write optimization (section ‘Consequences’, chapter ‘Proxy’):

The Proxy pattern introduces a level of indirection when accessing an
object. The additional indirection has many uses, depending on the
kind of proxy:
A remote proxy can hide the fact that an object resides in a different address space.
A virtual proxy can perform optimizations such as creating an object on demand.
Both protection proxies and smart references allow additional housekeeping tasks when an object is accessed.
There’s another optimization that the Proxy pattern can hide from the
client. It’s called copy-on-write, and it’s related to creation on
demand. Copying a large and complicated object can be an expensive
operation. If the copy is never modified, then there’s no need to
incur this cost. By using a proxy to postpone the copying process, we
ensure that we pay the price of copying the object only if it’s
modified.
To make copy-on-write work, the subject must be reference counted.
Copying the proxy will do nothing more than increment this reference
count. Only when the client requests an operation that modifies the
subject does the proxy actually copy it. In that case the proxy must
also decrement the subject’s reference count. When the reference count
goes to zero, the subject gets deleted.
Copy-on-write can reduce the cost of copying heavyweight subjects
significantly.

Below is a Python implementation of the copy-on-write optimization using the Proxy pattern. The intent of this design pattern is to provide a surrogate for another object to control access to it.

[Figure: class diagram of the Proxy pattern]

[Figure: object diagram of the Proxy pattern]

First we define the interface of the subject:

import abc


class Subject(abc.ABC):

    @abc.abstractmethod
    def clone(self):
        raise NotImplementedError

    @abc.abstractmethod
    def read(self):
        raise NotImplementedError

    @abc.abstractmethod
    def write(self, data):
        raise NotImplementedError

Next we define the real subject implementing the subject interface:

import copy


class RealSubject(Subject):

    def __init__(self, data):
        self.data = data

    def clone(self):
        return copy.deepcopy(self)

    def read(self):
        return self.data

    def write(self, data):
        self.data = data

Finally we define the proxy implementing the subject interface and referencing the real subject:

class Proxy(Subject):

    def __init__(self, subject):
        self.subject = subject
        # The reference count lives on the shared subject, so every proxy
        # pointing at the same subject sees the same counter.
        try:
            self.subject.counter += 1
        except AttributeError:
            self.subject.counter = 1

    def clone(self):
        return Proxy(self.subject)  # attribute sharing (shallow copy)

    def read(self):
        return self.subject.read()

    def write(self, data):
        # Copy only if the subject is still shared with another proxy.
        if self.subject.counter > 1:
            self.subject.counter -= 1
            self.subject = self.subject.clone()  # attribute copying (deep copy)
            self.subject.counter = 1
        self.subject.write(data)

The client can then benefit from the copy-on-write optimization by using the proxy as a stand-in for the real subject:

if __name__ == '__main__':
    x = Proxy(RealSubject('foo'))
    x.write('bar')
    y = x.clone()  # the real subject is shared instead of being copied
    print(x.read(), y.read())  # bar bar
    assert x.subject is y.subject
    x.write('baz')  # the real subject is copied on write because it was shared
    print(x.read(), y.read())  # baz bar
    assert x.subject is not y.subject


时光匆匆的小流年 2024-07-21 08:53:30

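I found this good article about zval in PHP, which also mentions COW:

Copy-on-write (abbreviated as "COW") is a trick designed to save memory. It is used more generally in software engineering. It means that PHP will copy the memory (or allocate a new memory region) when you write to a symbol, if that symbol was already pointing to a zval.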


断肠人 2024-07-21 08:53:30

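A good example is Git, which uses this kind of strategy to store blobs. Why does it use hashes? Partly because they make diffs easier to perform, but also because they make a COW strategy simpler to optimise. When you make a new commit that changes only a few files, the vast majority of objects and trees do not change. The commit therefore references, through various pointers made of hashes, a bunch of objects that already exist, which makes the storage needed for the entire history much smaller.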
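A minimal sketch of hash-addressed storage, assuming a plain in-memory dictionary as the object database. It is deliberately simplified: real Git hashes a type-and-size header along with the content and stores trees in a binary format, and `store` and `commit` are hypothetical helpers.

```python
import hashlib

objects = {}  # hash -> content; stands in for the object database


def store(content: bytes) -> str:
    """Store content under its SHA-1; identical content is stored once."""
    key = hashlib.sha1(content).hexdigest()
    objects.setdefault(key, content)
    return key


def commit(files: dict) -> str:
    """Store each file as a blob, then a tree listing name -> blob hash."""
    entries = sorted((name, store(data)) for name, data in files.items())
    tree = "\n".join(f"{name} {key}" for name, key in entries).encode()
    return store(tree)


first = commit({"a.txt": b"hello", "b.txt": b"world", "c.txt": b"!"})
count_after_first = len(objects)   # 3 blobs + 1 tree

second = commit({"a.txt": b"hello", "b.txt": b"WORLD", "c.txt": b"!"})
count_after_second = len(objects)  # + 1 changed blob + 1 new tree

print(count_after_first, count_after_second)  # 4 6
```

The second commit adds only two objects even though it describes three files, because the unchanged files hash to keys that are already present.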
