当前位置：文江博客话题详情

如何实现写时复制？

发布于 2024-08-09 08:09:41 字数 73 浏览 9 评论 0原文

我想在我的自定义 C++ String 类上实现写时复制，但我不知道如何实现。

我尝试实施一些选项，但结果都非常低效。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

↘紸啶 2024-08-16 08:09:41

在多线程环境（现在是大多数环境）中，CoW 通常会带来巨大的性能损失，而不是带来收益。仔细使用 const 引用，即使在单线程环境中也不会带来太大的性能提升。

这篇旧的 DDJ 文章解释了CoW 在多线程环境中有多糟糕，即使只有一个线程。

此外，正如其他人指出的那样，CoW 字符串的实现非常棘手，而且很容易出错。再加上它们在线程情况下表现不佳，让我真的质疑它们的总体用途。一旦您开始使用 C++11 移动构造和移动赋值，这一点就变得更加正确。

但是，回答你的问题......

这里有一些可能有助于提高性能的实现技术。

首先，将长度存储在字符串本身中。长度的访问非常频繁，消除指针取消引用可能会有所帮助。为了保持一致性，我也会将分配的长度也放在那里。这会让你的字符串对象变得更大一些，但是空间和复制时间的开销非常小，特别是因为这些值将变得更容易让编译器发挥有趣的优化技巧。

这给您留下了一个如下所示的字符串类：

class MyString {
   ...
 private:
   class Buf {
      ...
    private:
      ::std::size_t refct_;
      char *data_;
   };

   ::std::size_t len_;
   ::std::size_t alloclen_;
   Buf *data_;
};

现在，您可以执行进一步的优化。那里的 Buf 类看起来并没有真正包含或做太多事情，这是事实。此外，它还需要分配一个 Buf 实例和一个缓冲区来保存字符。这看起来相当浪费。因此，我们将转向一种常见的 C 实现技术，即弹性缓冲区：

class MyString {
   ...
 private:
   struct Buf {
      ::std::size_t refct_;
      char data_[1];
   };

   void resizeBufTo(::std::size_t newsize);
   void dereferenceBuf();

   ::std::size_t len_;
   ::std::size_t alloclen_;
   Buf *data_;
};

void MyString::resizeBufTo(::std::size_t newsize)
{
   assert((data_ == 0) || (data_->refct_ == 1));
   if (newsize != 0) {
      // Yes, I'm using C's allocation functions on purpose.
      // C++'s new is a poor match for stretchy buffers.
      Buf *newbuf = ::std::realloc(data_, sizeof(*newbuf) + (newsize - 1));
      if (newbuf == 0) {
         throw ::std::bad_alloc();
      } else {
         data_ = newbuf_;
      }
   } else { // newsize is 0
      if (data_ != 0) {
         ::std::free(data_);
         data_ = 0;
      }
   }
   alloclen_ = newsize;
}

当您以这种方式执行操作时，您可以将 data_->data_ 视为包含 alloclen_ 字节而不仅仅是 1。

请记住，在所有这些情况下，您必须确保永远不会在多线程环境中使用它，或者确保 refct_是一种既具有原子增量又具有原子减量和测试指令的类型。

有一种更高级的优化技术，涉及使用联合将短字符串存储在用于描述较长字符串的数据位内。但这甚至更复杂，我认为我不会愿意编辑它以在稍后放一个简化的示例，但你永远无法判断。

In a multi-threaded environemnt (which is most of them nowadays) CoW is frequently a huge performance hit rather than a gain. And with careful use of const references, it's not much of a performance gain even in a single threaded environment.

This old DDJ article explains just how bad CoW can be in a multithreaded environment, even if there's only one thread.

Additionally, as other people have pointed out, CoW strings are really tricky to implement, and it's easy to make mistakes. That coupled with their poor performance in threading situations makes me really question their usefulness in general. This becomes even more true once you start using C++11 move construction and move assignment.

But, to answer your question....

Here are a couple of implementation techniques that may help with performance.

First, store the length in the string itself. The length is accessed quite frequently and eliminating the pointer dereference would probably help. I would, just for consistency put the allocated length there too. This will cost you in terms of your string objects being a bit bigger, but the overhead there in space and copying time is very small, especially since these values will then become easier for the compiler to play interesting optimization tricks with.

This leaves you with a string class that looks like this:

class MyString {
   ...
 private:
   class Buf {
      ...
    private:
      ::std::size_t refct_;
      char *data_;
   };

   ::std::size_t len_;
   ::std::size_t alloclen_;
   Buf *data_;
};

Now, there are further optimizations you can perform. The Buf class there looks like it doesn't really contain or do much, and this is true. Additionally, it requires allocating both an instance of Buf and a buffer to hold the characters. This seems rather wasteful. So, we'll turn to a common C implementation technique, stretchy buffers:

class MyString {
   ...
 private:
   struct Buf {
      ::std::size_t refct_;
      char data_[1];
   };

   void resizeBufTo(::std::size_t newsize);
   void dereferenceBuf();

   ::std::size_t len_;
   ::std::size_t alloclen_;
   Buf *data_;
};

void MyString::resizeBufTo(::std::size_t newsize)
{
   assert((data_ == 0) || (data_->refct_ == 1));
   if (newsize != 0) {
      // Yes, I'm using C's allocation functions on purpose.
      // C++'s new is a poor match for stretchy buffers.
      Buf *newbuf = ::std::realloc(data_, sizeof(*newbuf) + (newsize - 1));
      if (newbuf == 0) {
         throw ::std::bad_alloc();
      } else {
         data_ = newbuf_;
      }
   } else { // newsize is 0
      if (data_ != 0) {
         ::std::free(data_);
         data_ = 0;
      }
   }
   alloclen_ = newsize;
}

When you do things this way, you can then treat data_->data_ as if it contained alloclen_ bytes instead of just 1.

Keep in mind that in all of these cases you will have to make sure that you either never ever use this in a multi-threaded environment, or that you make sure that refct_ is a type that you have both an atomic increment, and an atomic decrement and test instruction for.

There is an even more advanced optimization technique that involves using a union to store short strings right inside the bits of data that you would use to describe a longer string. But that's even more complex, and I don't think I will feel inclined to edit this to put a simplified example here later, but you never can tell.

回复收藏 0 原文

药祭#氼 2024-08-16 08:09:41

我建议，如果想要有效地实现写时复制（对于字符串或其他任何东西），应该定义一个包装类型，它将表现为可变字符串，并且它将保存对可变字符串的可为空引用（不对该项目的其他引用将永远存在）以及对“不可变”字符串的可为空引用（对它的引用永远不会存在于不会尝试改变它的事物之外）。包装器将始终在创建时至少使用其中一个非空引用；一旦可变项引用被设置为非空值（在构造期间或之后），它将永远引用相同的目标。任何时候两个引用都是非空的，不可变项引用将指向最近完成的突变后一段时间创建的项目的副本（在突变期间，不可变项引用可能会也可能不会持有引用到突变前的值）。

要读取对象，请检查“mutable-item”引用是否为非空。如果是这样，请使用它。否则，检查“immutable-item”引用是否为非空。如果是这样，请使用它。否则，使用“可变项”引用（现在将是非空的）。

要改变对象，请检查“mutable-item”引用是否为非空。如果不是，则复制“不可变项”引用的目标，并将对新对象的引用 CompareExchange 到“可变项”引用中。然后改变“可变项”引用的目标并使“不可变项”引用无效。

要克隆一个对象，如果预期在突变之前再次克隆该克隆，请检索“immutable-item”引用的值。如果为空，则复制“可变项”目标，并将对该新对象的引用CompareExchange到不可变项引用中。然后创建一个新的包装器，其“mutable-item”引用为 null，其“immutable-item”引用是检索到的值（如果不为 null）或新项目（如果为 null）。

要克隆一个对象，如果预期克隆在克隆之前会发生变化，请检索“immutable-item”引用的值。如果为 null，则检索“mutable-item”引用。复制检索到的引用的目标，并创建一个新的包装器，其“可变项”引用指向新副本，并且其“不可变项”引用为空。

这两种克隆方法在语义上是相同的，但针对给定情况选择错误的方法将导致额外的复制操作。如果始终选择正确的复制操作，则将获得“积极的”写时复制方法的大部分好处，但线程开销要少得多。每个数据保存对象（例如字符串）要么是非共享可变的，要么是共享不可变的，并且没有对象会在这些状态之间切换。因此，如果需要，可以消除所有“线程/同步开销”（用直接存储替换 CompareExchange 操作），前提是没有包装器对象同时在多个线程中使用。两个包装器对象可能保存对同一不可变数据持有者的引用，但它们可能不知道彼此的存在。

请注意，使用此方法时可能比使用“激进”方法时需要更多的复制操作。例如，如果使用新字符串创建一个新包装器，并且该包装器发生变异并复制六次，则原始包装器将保存对原始字符串持有者的引用以及持有数据副本的不可变者的引用。六个复制的包装器将仅保存对不可变字符串的引用（总共两个字符串，尽管如果原始字符串在复制后从未发生变化，则可以使用一个激进的实现）。如果原始包装器以及六个副本中的五个发生了变异，那么除了一个不可变字符串的引用之外的所有引用都将变得无效。此时，如果第六个包装器副本发生变异，激进的写时复制实现可能会意识到它保留了对其字符串的唯一引用，从而决定副本是不必要的。然而，我描述的实现将创建一个新的可变副本并放弃不可变副本。然而，尽管存在一些额外的复制操作，但在大多数情况下，线程开销的减少应该足以抵消成本。如果生成的大多数逻辑副本从未发生变化，则此方法可能比始终制作字符串副本更有效。

I would suggest that if one wants to implement copy-on-write efficiently (for strings or whatever), one should define a wrapper type which will behave as a mutable string, and which will hold both a nullable reference to a mutable string (no other reference to that item will ever exist) and a nullable reference to an "immutable" string (references to which will never exist outside things that won't try to mutate it). Wrappers will always be created with at least one of those references non-null; once the mutable-item reference is ever set to a non-null value (during or after construction) it will forever refer to the same target. Any time both references are non-null, the immutable-item reference will point to a copy of the item that was made some time after the most recent completed mutation (during a mutation, the immutable-item reference may or may not hold a reference to a pre-mutation value).

To read an object, check whether the "mutable-item" reference is non-null. If so, use it. Otherwise, check whether the "immutable-item" reference is non-null. If so, use it. Otherwise, use the "mutable item" reference (which by now will be non-null).

To mutate an object, check whether the "mutable-item" reference is non-null. If not, copy the target of the "immutable item" reference and CompareExchange a reference to the new object into the "mutable item" reference. Then mutate the target of the "mutable item" reference and invalidate the "immutable item" reference.

To clone an object, if the clone is expected to be cloned again before it is mutated, retrieve the value of the "immutable-item" reference. If it is null, make a copy of the "mutable item" target and CompareExchange a reference to that new object into the immutable-item reference. Then create a new wrapper whose "mutable-item" reference is null, and whose "immutable-item" reference is either the retrieved value (if it wasn't null) or the new item (if it was).

To clone an object, if the clone is expected to be mutated before it is cloned, retrieve the value of the "immutable-item" reference. If null, retrieve the "mutable-item" reference. Copy the target of whichever reference was retrieved and create a new wrapper whose "mutable-item" reference points to the new copy, and whose "immutable-item" reference is null.

The two cloning methods will be semantically identical, but picking the wrong one for a given situation will result in an extra copy operation. If one consistently chooses the correct copy operation, one will get most of the benefit of an "aggressive" copy-on-write approach, but with far less threading overhead. Every data holding object (e.g. string) will either be unshared mutable or shared immutable, and no object will ever switch between those states. Consequently, one could if desired eliminate all "threading/synchronization overhead" (replacing the CompareExchange operations with straight stores) provided that no wrapper object is used in more than one thread simultaneously. Two wrapper objects might hold references to the same immutable data holder, but they could be oblivious to each others' existence.

Note that a few more copy operations may be required when using this approach than when using an "aggressive" approach. For example, if a new wrapper is created with a new string, and that wrapper is mutated, and copied six times, the original wrapper would hold references to the original string holder and an immutable one holding a copy of the data. The six copied wrappers would just hold a reference to the immutable string (two strings total, although if the original string were never mutated after the copy was made, an aggressive implementation could get by with one). If the original wrapper were mutated, along with five of the six copies, then all but one of the references to the immutable string would get invalidated. At that point, if the sixth wrapper copy were mutated, an aggressive copy-on-write implementation might realize that it held the only reference to its string, and thus decide a copy was unnecessary. The implementation I describe, however, would create a new mutable copy and abandon the immutable one. Despite the fact that there are some extra copy operations, however, the reduction in threading overhead should in most cases more than offset the cost. If the majority of logical copies that are produced are never mutated, this approach may be more efficient than always making copies of strings.

回复收藏 0 原文

夏花。依旧 2024-08-16 08:09:41

CoW 没有太多内容。基本上，当您想要更改它时进行复制，并让任何不想更改它的人保留对旧实例的引用。您需要引用计数来跟踪谁仍在引用该对象，并且由于您正在创建新副本，因此需要减少“旧”实例的计数。一个捷径是当该计数为一时不进行复制（您是唯一的参考）。

除此之外，没有什么可说的，除非您面临特定的问题。

回复收藏 0 原文

猫腻 2024-08-16 08:09:41

您可能想模拟其他语言具有的“不可变”字符串（据我所知，Python、C#）。

这个想法是每个字符串都是不可变的，因此对字符串的任何工作都会创建一个新的不可变字符串......或者这是基本思想，为了避免爆炸，如果有类似的字符串，则不需要创建另一个字符串。

回复收藏 0 原文

扮仙女 2024-08-16 08:09:41

template <class T> struct cow {
  typedef boost::shared_ptr<T> ptr_t;
  ptr_t _data;

  ptr_t get()
  {
    return boost::atomic_load(&_data);
  }

  void set(ptr_t const& data)
  {
    boost::atomic_store(&_data, data);
  }
}

template <class T> struct cow {
  typedef boost::shared_ptr<T> ptr_t;
  ptr_t _data;

  ptr_t get()
  {
    return boost::atomic_load(&_data);
  }

  void set(ptr_t const& data)
  {
    boost::atomic_store(&_data, data);
  }
}

回复收藏 0 原文

~没有更多了~