Trouble serializing binary data using ifstream and ofstream
I am trying to serialize a plain-old-data structure using ifstream and ofstream, and I wasn't able to get it to work. I then reduced my problem to an ultra-basic serialization of just a char and an int, and even that didn't work. Clearly I'm missing something at a core fundamental level.
For a basic structure:
    struct SerializeTestStruct
    {
        char mCharVal;
        unsigned int mIntVal;
        void Serialize(std::ofstream& ofs);
    };
With serialize function:
    void SerializeTestStruct::Serialize(std::ofstream& ofs)
    {
        bool isError = (false == ofs.good());
        if (false == isError)
        {
            ofs.write((char*)&mCharVal, sizeof(mCharVal));
            ofs.write((char*)&mIntVal, sizeof(mIntVal));
        }
    }
Why would this fail with the following short program?
    //ultra basic serialization test.
    SerializeTestStruct* testStruct = new SerializeTestStruct();
    testStruct->mCharVal = 'y';
    testStruct->mIntVal = 9;

    //write
    std::string testFileName = "test.bin";
    std::ofstream fileOut(testFileName.data());
    fileOut.open(testFileName.data(), std::ofstream::binary|std::ofstream::out);
    fileOut.clear();
    testStruct->Serialize(fileOut);
    fileOut.flush();
    fileOut.close();
    delete testStruct;

    //read
    char * memblock;
    std::ifstream fileIn (testFileName.data(), std::ifstream::in|std::ifstream::binary);
    if (fileIn.is_open())
    {
        // get length of file:
        fileIn.seekg (0, std::ifstream::end);
        int length = fileIn.tellg();
        fileIn.seekg (0, std::ifstream::beg);

        // allocate memory:
        memblock = new char [length];
        fileIn.read(memblock, length);
        fileIn.close();

        // read data as a block:
        SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();
        delete[] testStruct2;
    }
When I run through the code I notice that memblock has a "y" at the top, so maybe it is working and it's just a problem with the placement new at the very end? After that placement new I end up with a SerializeTestStruct with values: 0, 0.
Comments (5)
Here is how your stuff should read:
Style issues:
- Don't use new to create things if you can help it. Stack-allocated objects are usually what you want and are significantly easier to manage than the arbitrary-lifetime objects you allocate from the heap. If you do use new, consider using a smart pointer type of some kind to help manage the lifetime for you.

Mistakes:
- Don't use the data member function of ::std::string.
- Using placement new and that memory block thing is a really bad idea because it's ridiculously complex. And if you did use it, you would not use array delete in the way you did. And lastly, it won't work anyway, for a reason explained later.
- Don't use ofstream in the type taken by your Serialize function, as it is a derived class whose features you don't need. You should always use the most base class in a hierarchy that has the features you need unless you have a very specific reason not to. Serialize works fine with the features of the base ostream class, so use that type instead.
- If you have a serialize function, you need a matching deserialize function.

Here is a further explanation of your memory layout issue. The structure, in memory, on an x86_64 based Linux box looks like this:

    Byte 0      mCharVal
    Bytes 1-3   padding
    Bytes 4-7   mIntVal
The contents of the padding section are undefined, but generally 0. It doesn't matter though, because that space is never used and merely exists so that access to the following int lies on an efficient 4-byte boundary.

The size of your structure on disk is 5 bytes, and it is completely missing the padding section. So when you read it into memory it won't line up properly with the in-memory structure at all, and accessing it is likely to cause some kind of horrible problem.
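The mismatch can be checked directly with sizeof and offsetof. This is a small sketch assuming the struct from the question; the constant names and printLayout are hypothetical, and the exact numbers depend on the platform and compiler (8, 4, and 5 are typical for x86_64 Linux).

```cpp
#include <cstddef>
#include <iostream>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;
};

// Bytes the struct occupies in memory, padding included.
constexpr std::size_t inMemorySize = sizeof(SerializeTestStruct);
// Where mIntVal starts inside the struct.
constexpr std::size_t intOffset = offsetof(SerializeTestStruct, mIntVal);
// Bytes produced by writing the two fields one after the other.
constexpr std::size_t onDiskSize = sizeof(char) + sizeof(unsigned int);
// Bytes skipped between mCharVal and mIntVal.
constexpr std::size_t paddingBytes = intOffset - sizeof(char);

// Hypothetical helper that prints the layout for the current platform.
void printLayout()
{
    std::cout << "in memory: " << inMemorySize << " bytes\n"
              << "mIntVal offset: " << intOffset << "\n"
              << "padding after mCharVal: " << paddingBytes << " bytes\n"
              << "on disk: " << onDiskSize << " bytes\n";
}
```

Whenever inMemorySize and onDiskSize differ, the file contents cannot simply be overlaid on the struct.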
The first rule: if you need a serialize function, you need a deserialize function. Second rule: unless you really know exactly what you are doing, do not dump raw memory into a file. This will work just fine in many cases, but there are important cases in which it won't work. And unless you are aware of what does and doesn't work, and when it does or doesn't work, you will end up with code that seems to work OK in certain test situations but fails miserably when you try to use it in a real system.

My code still does dump memory into a file. And it should work as long as you read the result back on exactly the same architecture and platform, with code compiled with the same version of the compiler as when you wrote it. As soon as one of those variables changes, all bets are off.
change to
I would do:
The problem is here:

    SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();

This will construct a value-initialized object of type SerializeTestStruct in the previously allocated memory. It will fill memblock with zeros, since value-initialization is zero-initialization for POD types. Here's a fast fix for your code:
In my opinion, you need to allow serialization to a buffer and not directly to a stream. Writing to a buffer allows nested or inherited classes to write to memory, and then the whole buffer can be written to the stream at once. Writing bits and pieces to the stream is not efficient.
Here is something I concocted, before I stopped writing binary data to streams:
I stopped writing binary data to streams in favor of textual data because textual data doesn't have to worry about endianness or which IEEE floating-point format is accepted by the receiving platform.
Am I the only one that finds this totally opaque:
why not: