Using reinterpret_cast to save a struct or class to a file
This is something the professor showed us in his scripts. I have not used this method in any code I have written.
Basically, we take a class, or struct, and reinterpret_cast it and save off the entire struct like so:
#include <fstream>
#include <string>

struct Account
{
    Account()
    { }
    Account(std::string one, std::string two)
    : login_(one), pass_(two)
    { }
private:
    std::string login_;
    std::string pass_;
};

int main()
{
    Account *acc = new Account("Christian", "abc123");
    std::ofstream out("File.txt", std::ios::binary);
    // Dump the in-memory bytes of the Account object straight to the file.
    out.write(reinterpret_cast<char*>(acc), sizeof(Account));
    out.close();
    delete acc;
}
This produces the output (in the file)
ÍÍÍÍChristian ÍÍÍÍÍÍ ÍÍÍÍabc123 ÍÍÍÍÍÍÍÍÍ
I'm confused. Does this method actually work, or does it cause UB because magical things happen within classes and structs that are at the whims of individual compilers?
It doesn't actually work, but it also does not cause undefined behavior.
In C++ it is legal to reinterpret any object as an array of char, so there is no undefined behavior here. The results, however, are usually only usable if the class is POD (effectively, if the class is a simple C-style struct) and self-contained (that is, the struct doesn't have pointer data members).
Here, Account is not POD because it has std::string members. The internals of std::string are implementation-defined, but it is not POD and it usually has pointers that refer to some heap-allocated block where the actual string is stored (in your specific example, the implementation is using a small-string optimization where the value of the string is stored in the std::string object itself).
There are a few issues:
You aren't always going to get the results you expect. If you had a longer string, the std::string would use a buffer allocated on the heap to store the string and so you will end up just serializing the pointer, not the pointed-to string.
You can't actually use the data you've serialized here. You can't just reinterpret the data as an Account and expect it to work, because the std::string constructors would not get called.
In short, you cannot use this approach for serializing complex data structures.
It's not undefined. Rather, it's platform-dependent or implementation-defined behavior. This is, in general, bad code, because differing versions of the same compiler, or even different switches on the same compiler, can break your save file format.
This could work depending on the contents of the struct, and the platform on which the data is read back. This is a risky, non-portable hack which your teacher should not be propagating.
Do you have pointers or ints in the struct? Pointers will be invalid in the new process when read back, and int format is not the same on all machines (to name but two show-stopping problems with this approach). Anything that's pointed to as part of an object graph will not be handled. Structure packing could be different on the target machine (32-bit vs 64-bit) or even due to compiler options changing on the same hardware, making sizeof(Account) unreliable as a read back data size.
For a better solution, look at a serialization library which handles those issues for you. Boost.Serialization is a good example.
Google Protocol Buffers also works well for simple object hierarchies.
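To illustrate the point about int format, here is a minimal sketch of what a serialization layer does for integers: it picks one byte order and one width for the file and converts explicitly, instead of dumping whatever the machine happens to use. The function names encode_u32_le and decode_u32_le are invented for this example, and little-endian is an arbitrary choice.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode a 32-bit value as four bytes, least significant byte first,
// regardless of the byte order of the machine running the code.
std::array<unsigned char, 4> encode_u32_le(std::uint32_t v) {
    return {{
        static_cast<unsigned char>(v & 0xFF),
        static_cast<unsigned char>((v >> 8) & 0xFF),
        static_cast<unsigned char>((v >> 16) & 0xFF),
        static_cast<unsigned char>((v >> 24) & 0xFF),
    }};
}

// Rebuild the value from the same four-byte layout.
std::uint32_t decode_u32_le(const std::array<unsigned char, 4>& b) {
    return static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}
```

A round-trip check such as assert(decode_u32_le(encode_u32_le(x)) == x) holds on any platform, and the bytes written to the file are the same on a big-endian and a little-endian machine, which is exactly what the raw reinterpret_cast dump cannot promise.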
It's no substitute for proper serialization. Consider the case of any complex type that contains pointers - if you save the pointers to a file, when you load them up later, they're not going to point to anything meaningful.
Additionally, it's likely to break if the code changes, or even if it's recompiled with different compiler options.
So it's really only useful for short-term storage of simple types - and in doing so, it takes up way more space than necessary for that task.
This method, if it works at all, is far from robust. It is much better to decide on some "serialized" form, whether it is binary, text, XML, etc., and write that out.
The key here: You need a function/code to reliably convert your class or struct to/from a series of bytes. reinterpret_cast does not do this, as the exact bytes in memory used to represent the class or struct can change for things like padding, order of members, etc.
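As a rough sketch of what such a pair of functions could look like for the Account from the question, the code below writes each string as a 4-byte little-endian length followed by the characters. The format and the names write_string, read_string, save and load are assumptions made for this example, not anything from the original code, and the members are made public here to keep it short.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct Account {
    std::string login;
    std::string pass;
};

// Write a string as a 4-byte little-endian length followed by its characters.
void write_string(std::ostream& out, const std::string& s) {
    assert(s.size() <= 0xFFFFFFFFu);  // the length must fit in 32 bits
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    for (int shift = 0; shift < 32; shift += 8)
        out.put(static_cast<char>((n >> shift) & 0xFF));
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Read back a string written by write_string; returns false on a short read.
bool read_string(std::istream& in, std::string& s) {
    std::uint32_t n = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const int byte = in.get();
        if (byte == std::char_traits<char>::eof()) return false;
        n |= static_cast<std::uint32_t>(static_cast<unsigned char>(byte)) << shift;
    }
    s.resize(n);
    if (n > 0) in.read(&s[0], static_cast<std::streamsize>(n));
    return static_cast<bool>(in);
}

// Serialize the members one by one, in a fixed order.
void save(std::ostream& out, const Account& a) {
    write_string(out, a.login);
    write_string(out, a.pass);
}

// Deserialize in the same order; the strings are constructed normally.
bool load(std::istream& in, Account& a) {
    return read_string(in, a.login) && read_string(in, a.pass);
}
```

Used with a std::ofstream and std::ifstream opened in binary mode (or a std::stringstream for testing), this round-trips strings of any length, because the characters themselves are written rather than whatever std::string keeps inside its object. A production format would also want a sanity limit on the length it reads.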
No.
In order for it to work, the structure must be a POD (plain old data: only simple data members and POD data members, no virtual functions... probably some other restrictions which I can't remember).
So if you wanted to do that, you'd need a struct made up of plain fixed-size char arrays, because std::string is not a POD.
Still, not a good approach for you. Keyword: "serialization" :).
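For illustration only, a POD version along those lines might look like the sketch below. The 32-byte field width and the names PodAccount, make_account, write_raw and read_raw are assumptions for this example. The raw dump works here because the struct contains no pointers, but the file is still only readable by a build with the same layout.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Fixed-size, pointer-free record. The 32-byte field width is an assumption.
struct PodAccount {
    char login[32];
    char pass[32];
};
static_assert(std::is_trivially_copyable<PodAccount>::value,
              "PodAccount must stay a plain C-style struct");

// Copy C strings into the fixed fields, truncating if needed.
// Zero-filling first keeps every field NUL-terminated.
PodAccount make_account(const char* login, const char* pass) {
    PodAccount a{};
    std::strncpy(a.login, login, sizeof(a.login) - 1);
    std::strncpy(a.pass, pass, sizeof(a.pass) - 1);
    return a;
}

// Dump the object's bytes as-is.
void write_raw(std::ostream& out, const PodAccount& a) {
    out.write(reinterpret_cast<const char*>(&a), sizeof a);
}

// Read the bytes back into an existing object; false on a short read.
bool read_raw(std::istream& in, PodAccount& a) {
    in.read(reinterpret_cast<char*>(&a), sizeof a);
    return static_cast<bool>(in);
}
```

The price is visible in the struct itself: every record takes 64 bytes no matter how short the strings are, and anything longer than 31 characters is silently cut off.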
Some versions of std::string don't actually use dynamic memory when the string is small. Instead, they store the characters internally, in the string object itself.
Think of a string object whose members are a few pointers. On a 64-bit system each pointer is 8 bytes, so a string of less than 32 bytes could fit into the same structure without allocating a buffer.
So this is what I believe is happening above.
A very efficient string implementation is being used, which lets you dump the object to a file without worrying about pointers (as the std::string objects are not pointing at external storage).
Obviously this is not portable, as it depends on the implementation details of std::string.
So it's an interesting trick, but not portable (and liable to break easily without some compile-time checks).
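If you are curious whether your implementation does this, a small runtime probe can tell you. This is a sketch for experimentation only: the function name stored_inline is made up, and the result for short strings depends entirely on the standard library in use.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns true when the character buffer lives inside the std::string
// object itself, i.e. when data() points into the object's own bytes.
bool stored_inline(const std::string& s) {
    const auto obj = reinterpret_cast<std::uintptr_t>(&s);
    const auto buf = reinterpret_cast<std::uintptr_t>(s.data());
    return buf >= obj && buf < obj + sizeof(std::string);
}
```

On an implementation with the small-string optimization, stored_inline(std::string("abc123")) is typically true, while a string of a thousand characters cannot fit inside the object and has to live elsewhere. In that second case the raw dump from the question writes a pointer value and none of the text.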