Do I need to make a type a POD to persist it with a memory-mapped file?

Posted 2024-12-03 04:23:48


Pointers cannot be persisted directly to file, because they point to absolute addresses. To address this issue I wrote a relative_ptr template that holds an offset instead of an absolute address.

Based on the fact that only trivially copyable types can be safely copied bit-by-bit, I made the assumption that this type needed to be trivially copyable to be safely persisted in a memory-mapped file and retrieved later on.

This restriction turned out to be a bit problematic, because the compiler-generated copy constructor does not behave in a meaningful way. I found nothing that forbids me from defaulting the copy constructor and making it private, so I made it private to avoid accidental copies that would lead to undefined behaviour.

Later on, I found boost::interprocess::offset_ptr, whose creation was driven by the same needs. However, it turns out that offset_ptr is not trivially copyable because it implements its own custom copy constructor.

Is my assumption that the smart pointer needs to be trivially copyable to be persisted safely wrong?

If there's no such restriction, I wonder if I can safely do the following as well. If not, exactly what are the requirements a type must fulfill to be usable in the scenario I described above?

#include <iostream>
#include <new>
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>

struct base {
    int x;
    virtual void f() = 0;
    virtual ~base() {} // virtual members!
};

struct derived : virtual base {
    int x;
    void f() { std::cout << x; }
};

using namespace boost::interprocess;

void persist() {
    file_mapping file("blah", read_write);
    mapped_region region(file, read_write, 128, sizeof(derived));
    // create object on a memory-mapped file
    derived* d = new (region.get_address()) derived();
    d->x = 42;
    d->f();
    region.flush();
}

void retrieve() {
    file_mapping file("blah", read_write);
    mapped_region region(file, read_write, 128, sizeof(derived));
    // reuse the bytes written by persist() as if the object were still alive
    derived* d = static_cast<derived*>(region.get_address());
    d->f();
}

int main() {
    persist();
    retrieve();
}

Thanks to all those who provided alternatives. It's unlikely that I will be using something else any time soon because, as I explained, I already have a working solution. And as you can see from the use of question marks above, I'm really interested in knowing why Boost can get away without a trivially copyable type, and how far you can go with it: it's quite obvious that classes with virtual members will not work, but where do you draw the line?


浅忆流年 2024-12-10 04:23:48

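To avoid confusion, let me restate the problem.

You want to create an object in mapped memory in such a way that, after the application has been closed and reopened, the file can be mapped again and the object used without any further deserialization.

POD is a red herring for what you are trying to do. You don't need to be binary copyable (which is what POD means); you need to be address independent.

Address independence requires you to:

Avoid all absolute pointers.
Only use offset pointers to addresses within the mapped memory.

A few corollaries follow from these rules.

You can't use virtual anything. C++ virtual functions are implemented with a hidden vtable pointer in the class instance. The vtable pointer is an absolute pointer over which you have no control.
You need to be very careful about the other C++ objects your address-independent objects use. Basically everything in the standard library may break if you use it. Even if those types don't use new, they may use virtual functions internally, or simply store the address of a pointer.
You can't store references in address-independent objects. Reference members are just syntactic sugar over absolute pointers.

Inheritance is still possible, but of limited usefulness since virtual is outlawed.

Any and all constructors and destructors are fine as long as the rules above are followed.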
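The address-independence rules above can be illustrated with a minimal sketch of a self-relative pointer. Everything here (`rel_ptr`, `Node`, `second_value_after_relocation`) is a hypothetical name made up for illustration; it is not the asker's relative_ptr nor Boost's offset_ptr, only the same general idea. The raw byte copy stands in for unmapping a file and mapping it again at a different address, which is formally outside what the C++ standard guarantees for non-trivially-copyable types, even though it is what this technique relies on in practice.

```cpp
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical self-relative pointer: stores the distance from its own
// address to the target, so it stays valid when the whole block of memory
// is mapped at a different base address.
template <typename T>
class rel_ptr {
    std::intptr_t off_ = 0;  // 0 encodes null (a rel_ptr never points at itself)

    void set(const T* p) {
        off_ = p ? static_cast<std::intptr_t>(
                       reinterpret_cast<std::uintptr_t>(p) -
                       reinterpret_cast<std::uintptr_t>(this))
                 : 0;
    }

public:
    rel_ptr() = default;
    rel_ptr(T* p) { set(p); }
    // Copying must recompute the offset relative to the new location,
    // which is why a type like this cannot be trivially copyable.
    rel_ptr(const rel_ptr& o) { set(o.get()); }
    rel_ptr& operator=(const rel_ptr& o) { set(o.get()); return *this; }
    rel_ptr& operator=(T* p) { set(p); return *this; }

    T* get() const {
        if (off_ == 0) return nullptr;
        return reinterpret_cast<T*>(reinterpret_cast<std::uintptr_t>(this) +
                                    static_cast<std::uintptr_t>(off_));
    }
    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }
};

struct Node {
    int value;
    rel_ptr<Node> next;
};

// No vtable pointer, and deliberately not trivially copyable.
static_assert(!std::is_polymorphic<Node>::value, "no hidden vtable pointer");
static_assert(!std::is_trivially_copyable<Node>::value, "custom copy semantics");

// Builds a two-node list in one buffer, copies the raw bytes to a second
// buffer (simulating a remap at another address), and follows the link there.
inline int second_value_after_relocation() {
    alignas(Node) unsigned char a[2 * sizeof(Node)];
    alignas(Node) unsigned char b[2 * sizeof(Node)];
    Node* n0 = new (a) Node{};
    Node* n1 = new (a + sizeof(Node)) Node{};
    n0->value = 1;
    n1->value = 2;
    n0->next = n1;
    std::memcpy(b, a, sizeof a);  // stands in for unmap + map elsewhere
    Node* m0 = reinterpret_cast<Node*>(b);
    return m0->next->value;       // resolves inside b, not back into a
}
```

The link survives the move because only the distance between the pointer and its target is stored, and that distance is the same in both buffers.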

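Even Boost.Interprocess isn't a perfect fit for what you're trying to do. Boost.Interprocess also needs to manage shared access to the objects, whereas you can assume that you are the only one messing with the memory.

In the end, it may be simpler and saner to just use Google Protobufs and conventional serialization.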


陌路黄昏 2024-12-10 04:23:48

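Yes, but for reasons other than the ones that seem to concern you.

You've got virtual functions and a virtual base class. These cause the compiler to create a host of pointers behind your back. You can't turn them into offsets or anything else.

If you want to do this style of persistence, you need to eschew "virtual". After that, it's all a matter of semantics. Really, just pretend you are doing this in C.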


旧情别恋 2024-12-10 04:23:48

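Even POD has pitfalls if you are interested in interoperating across different systems or across time.

You might look at Google Protocol Buffers for a way to do this in a portable fashion.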


不美如何 2024-12-10 04:23:48


Not as much an answer as a comment that grew too big:

I think it's going to depend on how much safety you're willing to trade for speed/ease of usage. In the case where you have a struct like this:

struct S { char c; double d; };

You have to consider padding and the fact that some architectures might not allow you to access a double unless it is aligned on a proper memory address. Adding accessor functions and fixing the padding tackles this and the structure is still memcpy-able, but now we're entering territory where we're not really gaining much of a benefit from using a memory mapped file.
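One way to "fix the padding" is to spell it out and let the compiler verify the layout. The sketch below is a hypothetical variant of S (the name `PackedS` and the `pad` member are assumptions, not part of the answer), with the wider member first and the trailing padding made explicit. The static_asserts assume an 8-byte double, which holds on common platforms but is not guaranteed by the standard.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical on-disk variant of S: widest member first, padding explicit,
// so every byte of the record is accounted for.
struct PackedS {
    double d;     // offset 0
    char   c;     // offset 8
    char   pad[7]; // explicit padding up to 16 bytes
};

static_assert(std::is_standard_layout<PackedS>::value,
              "offsetof is only well defined for standard-layout types");
static_assert(std::is_trivially_copyable<PackedS>::value,
              "the record should stay memcpy-able");
static_assert(sizeof(double) == 8, "this sketch assumes an 8-byte double");
static_assert(offsetof(PackedS, d) == 0, "d must start the record");
static_assert(offsetof(PackedS, c) == 8, "c must directly follow d");
static_assert(sizeof(PackedS) == 16, "compiler inserted unexpected padding");
```

If a different compiler or target lays the struct out differently, the build fails instead of silently reading a file with the wrong offsets.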

Since it seems like you'll only be using this locally and in a fixed setup, relaxing the requirements a little seems OK, so we're back to using the above struct normally. Now does the type have to be trivially copyable? I don't necessarily think so; consider this (probably broken) class:

#include <cstddef>
#include <utility>

enum Endian { LittleEndian, BigEndian };
template<typename T, Endian e> struct PV {
        union {
                unsigned char b[sizeof(T)];
                T x;
        } val;

        // Converting assignment: copy the value, then reverse the bytes
        // if the source uses the other byte order.
        template<Endian oe> PV& operator=(const PV<T,oe>& rhs) {
                val.x = rhs.val.x;
                if (e != oe) {
                        for (std::size_t i = 0; i < sizeof(T) / 2; i++) {
                                std::swap(val.b[sizeof(T)-1-i], val.b[i]);
                        }
                }
                return *this;
        }
};

It's not trivially copyable and you can't just use memcpy to move it around in general, but I don't see anything immediately wrong with using a class like this in the context of a memory mapped file (especially not if the file matches the native byte order).

Update:
Where do you draw the line?

I think a decent rule of thumb is: if the equivalent C code is acceptable and C++ is just being used as a convenience, to enforce type safety or proper access, it should be fine.

That would make boost::interprocess::offset_ptr OK, since it's just a helpful wrapper around a ptrdiff_t with special semantic rules. In the same vein, struct PV above would be OK, as it's just meant to byte swap automatically, though like in C you have to be careful to keep track of the byte order and assume that the structure can be trivially copied. Virtual functions wouldn't be OK, as the C equivalent, function pointers in the structure, wouldn't work. However, something like the following (untested) code would again be OK:

struct Foo { 
    unsigned char obj_type;
    void vfunc1(int arg0) { vtables[obj_type].vfunc1(this, arg0); }
};
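To make the idea behind Foo concrete, here is a minimal self-contained sketch of such a hand-rolled dispatch table. The `FooVTable` type, the `payload` member, the `add_impl`/`mul_impl` functions and the enumerators are all hypothetical, and vfunc1 returns int here (rather than void) only so that its effect is observable. The key point is that the table of function pointers lives in the program image, while the persisted object stores nothing but a small index.

```cpp
struct Foo;  // forward declaration for the table's function signature

// One entry per concrete "type"; lives in the executable, never in the file.
struct FooVTable {
    int (*vfunc1)(const Foo* self, int arg0);
};

struct Foo {
    unsigned char obj_type;  // index into vtables; the only dispatch data persisted
    int payload;
    int vfunc1(int arg0) const;
};

namespace {
int add_impl(const Foo* self, int arg0) { return self->payload + arg0; }
int mul_impl(const Foo* self, int arg0) { return self->payload * arg0; }

enum : unsigned char { kAdd = 0, kMul = 1, kTypeCount = 2 };
const FooVTable vtables[kTypeCount] = { {add_impl}, {mul_impl} };
}  // namespace

inline int Foo::vfunc1(int arg0) const {
    // obj_type comes from a file, so it is validated before use;
    // returning 0 for an unknown type is an arbitrary choice for this sketch.
    return obj_type < kTypeCount ? vtables[obj_type].vfunc1(this, arg0) : 0;
}
```

Because the index, unlike a compiler-generated vtable pointer, has the same meaning in every run of the program, the object can be reloaded from the file and dispatched on directly, provided the order of the table entries never changes between versions.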


感情废物 2024-12-10 04:23:48

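That is not going to work. Your class derived is not a POD, so what happens depends on how the compiler compiles your code. In other words: don't do it.

By the way, where are you releasing your objects? I see that you create the objects in place, but you never call the destructor.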


山川志 2024-12-10 04:23:48


Absolutely not. Serialisation is a well-established functionality that is used in numerous situations, and it certainly does not require PODs. What it does require is that you specify a well-defined serialisation binary interface (SBI).

Serialisation is needed anytime your objects leave the runtime environment, including shared memory, pipes, sockets, files, and many other persistence and communication mechanisms.

Where PODs help is where you know you are not leaving the processor architecture. If you will never be changing versions between writers of the object (serialisers) and readers (deserialisers) and you have no need for dynamically-sized data, then PODs allow easy memcpy based serialisers.

Commonly, though, you need to store things like strings. Then, you need a way to store and retrieve the dynamic information. Sometimes, 0 terminated strings are used, but that is pretty specific to strings, and doesn't work for vectors, maps, arrays, lists, etc. You will often see strings and other dynamic elements serialized as [size][element 1][element 2]… this is the Pascal array format. Additionally, when dealing with cross machine communications, the SBI must define integral formats to deal with potential endianness issues.
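As a rough illustration of the [size][element 1][element 2]… layout, the sketch below writes and reads a string with a 32-bit length prefix in a fixed little-endian byte order. The function names and the choice of a 32-bit little-endian size are assumptions made for this example, not something the answer prescribes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends a 32-bit length in little-endian byte order, then the raw bytes.
inline void write_string(std::vector<unsigned char>& out, const std::string& s) {
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<unsigned char>((n >> shift) & 0xFFu));
    out.insert(out.end(), s.begin(), s.end());
}

// Reads one string starting at pos and advances pos past it.
// Returns false and leaves pos untouched if the buffer is truncated.
inline bool read_string(const std::vector<unsigned char>& in,
                        std::size_t& pos, std::string& s) {
    if (pos > in.size() || in.size() - pos < 4) return false;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    if (in.size() - pos - 4 < n) return false;
    const auto first = in.begin() + static_cast<std::ptrdiff_t>(pos + 4);
    s.assign(first, first + static_cast<std::ptrdiff_t>(n));
    pos += 4 + static_cast<std::size_t>(n);
    return true;
}
```

Assembling the size byte by byte, instead of copying the integer's memory, is what makes the format independent of the endianness of the machine that reads or writes it.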

Now, pointers are usually implemented by IDs, not offsets. Each object that needs to be serialised can be given an incrementing number as an ID, and that can be the first field in the SBI. The reason you usually don't use offsets is that you may not be able to easily calculate future offsets without going through a sizing step or a second pass. IDs can be calculated inside the serialisation routine on the first pass.

Additional ways to serialize include text based serialisers using some syntax like XML or JSON. These are parsed using standard textual tools that are used to reconstruct the object. These keep the SBI simple at the cost of pessimising performance and bandwidth.

In the end, you typically build an architecture where you build serialisation streams that take your objects and translate them member by member to the format of your SBI. In the case of shared memory, it typically pushes the members directly on to the memory after acquiring the shared mutex.

This often looks like

void MyClass::Serialise(SerialisationStream & stream)
{
  stream & member1;
  stream & member2;
  stream & member3;
  // ...
}

where the & operator is overloaded for your different types. You may take a look at Boost.Serialization for more examples.
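For readers who want to see the shape of such a stream, here is a minimal output-only sketch. The `SerialisationStream` class below is hypothetical and far simpler than Boost.Serialization's archives: it supports only two member types, writes integers in little-endian order, and writes strings with a 32-bit length prefix.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical output-only stream; a real one would also support reading.
class SerialisationStream {
    std::vector<unsigned char> bytes_;

public:
    // Fixed-width integer, written least significant byte first.
    SerialisationStream& operator&(std::uint32_t v) {
        for (int shift = 0; shift < 32; shift += 8)
            bytes_.push_back(static_cast<unsigned char>((v >> shift) & 0xFFu));
        return *this;
    }
    // String, written as a 32-bit length followed by the characters.
    SerialisationStream& operator&(const std::string& s) {
        *this & static_cast<std::uint32_t>(s.size());
        bytes_.insert(bytes_.end(), s.begin(), s.end());
        return *this;
    }
    const std::vector<unsigned char>& bytes() const { return bytes_; }
};

struct MyClass {
    std::uint32_t member1;
    std::string member2;

    // Member-by-member translation into the stream's binary format.
    void Serialise(SerialisationStream& stream) const {
        stream & member1;
        stream & member2;
    }
};
```

Using one operator for every type keeps Serialise a flat list of members, and the same list could drive both writing and reading if the stream had a matching input mode.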
