使用 ifstream 和 ofstream 序列化二进制数据时遇到问题

发布于 2024-11-02 09:54:18 字数 1867 浏览 1 评论 0原文

我正在尝试使用 ifstream 和 ofstream 序列化普通旧数据结构,但无法使其工作。然后,我尝试将问题简化为仅包含 char 和 int 的超基本序列化,但即使这样也行不通。显然我在核心基础层面上遗漏了一些东西。

对于基本结构:

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;

    void Serialize(std::ofstream& ofs);
};

使用序列化函数:

void SerializeTestStruct::Serialize(std::ofstream& ofs)
{
    bool isError = (false == ofs.good());
    if (false == isError)
    {
        ofs.write((char*)&mCharVal, sizeof(mCharVal));
        ofs.write((char*)&mIntVal, sizeof(mIntVal));
    }
}

为什么下面的短程序会失败?

//ultra basic serialization test.
    SerializeTestStruct* testStruct = new SerializeTestStruct();
    testStruct->mCharVal = 'y';
    testStruct->mIntVal = 9;

    //write
    std::string testFileName = "test.bin";
    std::ofstream fileOut(testFileName.data());
    fileOut.open(testFileName.data(), std::ofstream::binary|std::ofstream::out);
    fileOut.clear();
    testStruct->Serialize(fileOut);

    fileOut.flush();
    fileOut.close();

    delete testStruct;

    //read
    char * memblock;
    std::ifstream fileIn (testFileName.data(), std::ifstream::in|std::ifstream::binary);
    if (fileIn.is_open())
    {
        // get length of file:
        fileIn.seekg (0, std::ifstream::end);
        int length = fileIn.tellg();
        fileIn.seekg (0, std::ifstream::beg);

        // allocate memory:
        memblock = new char [length];
        fileIn.read(memblock, length);
        fileIn.close();

        // read data as a block:
        SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();

        delete[] testStruct2;
    }

当我运行代码时,我注意到 memblock 在顶部有一个“y”,所以也许它正在工作,这只是最后的 placement new 的问题?在新的放置之后,我最终得到一个 SerializeTestStruct ,其值为:0, 0。

I am trying to serialize a Plain Old Datastructure using ifstream and ofstream and I wasn't able to get it to work. I then tried to reduce my problem to an ultra basic serialization of just a char and int and even that didn't work. Clearly I'm missing something at a core fundamental level.

For a basic structure:

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;

    void Serialize(std::ofstream& ofs);
};

With serialize function:

void SerializeTestStruct::Serialize(std::ofstream& ofs)
{
    bool isError = (false == ofs.good());
    if (false == isError)
    {
        ofs.write((char*)&mCharVal, sizeof(mCharVal));
        ofs.write((char*)&mIntVal, sizeof(mIntVal));
    }
}

Why would this fail with the following short program?

//ultra basic serialization test.
    SerializeTestStruct* testStruct = new SerializeTestStruct();
    testStruct->mCharVal = 'y';
    testStruct->mIntVal = 9;

    //write
    std::string testFileName = "test.bin";
    std::ofstream fileOut(testFileName.data());
    fileOut.open(testFileName.data(), std::ofstream::binary|std::ofstream::out);
    fileOut.clear();
    testStruct->Serialize(fileOut);

    fileOut.flush();
    fileOut.close();

    delete testStruct;

    //read
    char * memblock;
    std::ifstream fileIn (testFileName.data(), std::ifstream::in|std::ifstream::binary);
    if (fileIn.is_open())
    {
        // get length of file:
        fileIn.seekg (0, std::ifstream::end);
        int length = fileIn.tellg();
        fileIn.seekg (0, std::ifstream::beg);

        // allocate memory:
        memblock = new char [length];
        fileIn.read(memblock, length);
        fileIn.close();

        // read data as a block:
        SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();

        delete[] testStruct2;
    }

When I run through the code I notice that memblock has a "y" at the top so maybe it is working and it's just a problem with the placement new at the very end? After that placement new I end up with a SerializeTestStruct with values: 0, 0.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

寄居者 2024-11-09 09:54:18

您的内容应如下所示:

#include <fstream>
#include <string>
#include <stdexcept>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;

    void Serialize(::std::ostream &os);
    static SerializeTestStruct Deserialize(::std::istream &is);
};

void SerializeTestStruct::Serialize(std::ostream &os)
{
    if (os.good())
    {
        os.write((char*)&mCharVal, sizeof(mCharVal));
        os.write((char*)&mIntVal, sizeof(mIntVal));
    }
}

SerializeTestStruct SerializeTestStruct::Deserialize(std::istream &is)
{
        SerializeTestStruct retval;

    if (is.good())
    {
        is.read((char*)&retval.mCharVal, sizeof(retval.mCharVal));
        is.read((char*)&retval.mIntVal, sizeof(retval.mIntVal));
    }
    if (is.fail()) {
        throw ::std::runtime_error("failed to read full struct");
    }
    return retval;
}

int main(int argc, const char *argv[])
{
//ultra basic serialization test.

    // setup
    const ::std::string testFileName = "test.bin";

    // write
    {
        SerializeTestStruct testStruct;
        testStruct.mCharVal = 'y';
        testStruct.mIntVal = 9;

        ::std::ofstream fileOut(testFileName.c_str());
        fileOut.open(testFileName.c_str(),
                     std::ofstream::binary|std::ofstream::out);
        fileOut.clear();
        testStruct.Serialize(fileOut);
    }

    // read
    {
        ::std::ifstream fileIn (testFileName.c_str(),
                                std::ifstream::in|std::ifstream::binary);
        if (fileIn.is_open())
        {
            SerializeTestStruct testStruct =            \
                SerializeTestStruct::Deserialize(fileIn);

            ::std::cout << "testStruct.mCharVal == '" << testStruct.mCharVal
                        << "' && testStruct.mIntVal == " << testStruct.mIntVal
                        << '\n';
        }
    }
    return 0;
}

风格问题:

  • 如果可以的话,不要使用 new 来创建内容。堆栈分配的对象通常是您想要的,并且比从堆分配的任意生命周期对象更容易管理。如果您确实使用new,请考虑使用某种智能指针类型来帮助您管理生命周期。
  • 序列化和反序列化代码应该匹配,以便可以一起检查和更改它们。这使得此类代码的维护变得更加容易。
  • 依靠 C++ 用析构函数来清理东西,这就是它们的用途。这意味着如果所使用的变量的范围相对有限,则制作包含部分代码的基本块。
  • 不要不必要地使用标志。

错误...

  • 不要使用 ::std::stringdata 成员函数。
  • 使用放置 new 和内存块确实是个坏主意,因为它非常复杂。如果您确实使用了它,那么您就不会像以前那样使用数组删除。最后,由于稍后解释的原因,它无论如何都不会起作用。
  • 不要在 Serialize 函数采用的类型中使用 ofstream,因为它是一个派生类,其功能您不需要。您应该始终使用层次结构中具有所需功能的最基类,除非您有非常具体的理由不这样做。 Serialize 与基 ostream 类的功能配合得很好,因此请改用该类型。
  • 您的结构的磁盘布局和内存布局不匹配,因此您的放置新技术注定会失败。通常,如果您有 serialize 函数,则需要一个匹配的 deserialize 函数。

这是对内存布局问题的进一步解释。基于 x86_64 的 Linux 机器上的内存结构如下所示:

+------------+-----------+
|Byte number | contents  |
+============+===========+
|          0 |     0x79  |
|            | (aka 'y') |
+------------+-----------+
|          1 |   padding |
+------------+-----------+
|          3 |   padding |
+------------+-----------+
|          4 |   padding |
+------------+-----------+
|          5 |         9 |
+------------+-----------+
|          6 |         0 |
+------------+-----------+
|          7 |         0 |
+------------+-----------+
|          8 |         0 |
+------------+-----------+

padding 部分的内容未定义,但通常为 0。但这并不重要,因为该空间从未被使用过,只是存在,因此对以下 int 的访问位于有效的 4 字节边界上。

磁盘上结构的大小为 5 个字节,并且完全缺少填充部分。因此,这意味着当您将其读入内存时,它根本不会与内存结构正确对齐,并且访问它可能会导致某种可怕的问题。

第一条规则,如果您需要一个 serialize 函数,那么您就需要一个 deserialize 函数。第二条规则,除非您确实知道自己在做什么,否则不要将原始内存转储到文件中。这在许多情况下都可以正常工作,但在某些重要情况下它不起作用。除非您知道什么有效,什么无效,以及何时有效或无效,否则您最终会得到在某些测试情况下似乎工作正常的代码,但当您尝试在特定环境中使用它时却会惨败。真实的系统。

我的代码仍然将内存转储到文件中。只要您在与编写代码时使用相同版本的编译器编译的代码在完全相同的架构和平台上读回结果,它就应该可以工作。一旦其中一个变量发生变化,所有的赌注都会被取消。

Here is how your stuff should read:

#include <fstream>
#include <string>
#include <stdexcept>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;

    void Serialize(::std::ostream &os);
    static SerializeTestStruct Deserialize(::std::istream &is);
};

void SerializeTestStruct::Serialize(std::ostream &os)
{
    if (os.good())
    {
        os.write((char*)&mCharVal, sizeof(mCharVal));
        os.write((char*)&mIntVal, sizeof(mIntVal));
    }
}

SerializeTestStruct SerializeTestStruct::Deserialize(std::istream &is)
{
        SerializeTestStruct retval;

    if (is.good())
    {
        is.read((char*)&retval.mCharVal, sizeof(retval.mCharVal));
        is.read((char*)&retval.mIntVal, sizeof(retval.mIntVal));
    }
    if (is.fail()) {
        throw ::std::runtime_error("failed to read full struct");
    }
    return retval;
}

int main(int argc, const char *argv[])
{
//ultra basic serialization test.

    // setup
    const ::std::string testFileName = "test.bin";

    // write
    {
        SerializeTestStruct testStruct;
        testStruct.mCharVal = 'y';
        testStruct.mIntVal = 9;

        ::std::ofstream fileOut(testFileName.c_str());
        fileOut.open(testFileName.c_str(),
                     std::ofstream::binary|std::ofstream::out);
        fileOut.clear();
        testStruct.Serialize(fileOut);
    }

    // read
    {
        ::std::ifstream fileIn (testFileName.c_str(),
                                std::ifstream::in|std::ifstream::binary);
        if (fileIn.is_open())
        {
            SerializeTestStruct testStruct =            \
                SerializeTestStruct::Deserialize(fileIn);

            ::std::cout << "testStruct.mCharVal == '" << testStruct.mCharVal
                        << "' && testStruct.mIntVal == " << testStruct.mIntVal
                        << '\n';
        }
    }
    return 0;
}

Style issues:

  • Don't use new to create things if you can help it. Stack allocated objects are usually what you want and significantly easier to manage than the arbitrary lifetime objects you allocate from the heap. If you do use new, consider using a smart pointer type of some kind to help manage the lifetime for you.
  • Serialization and deserialization code should be matched up so that they can be examined and altered together. This makes maintenance of such code much easier.
  • Rely on C++ to clean things up for you with destructors, that's what they're for. This means making basic blocks containing parts of your code if it the scopes of the variables used is relatively confined.
  • Don't needlessly use flags.

Mistakes...

  • Don't use the data member function of ::std::string.
  • Using placement new and that memory block thing is really bad idea because it's ridiculously complex. And if you did use it, then you do not use array delete in the way you did. And lastly, it won't work anyway for a reason explained later.
  • Do not use ofstream in the type taken by your Serialize function as it is a derived class who's features you don't need. You should always use the most base class in a hierarchy that has the features you need unless you have a very specific reason not to. Serialize works fine with the features of the base ostream class, so use that type instead.
  • The on-disk layout of your structure and the in memory layout do not match, so your placement new technique is doomed to fail. As a rule, if you have a serialize function, you need a matching deserialize function.

Here is a further explanation of your memory layout issue. The structure, in memory, on an x86_64 based Linux box looks like this:

+------------+-----------+
|Byte number | contents  |
+============+===========+
|          0 |     0x79  |
|            | (aka 'y') |
+------------+-----------+
|          1 |   padding |
+------------+-----------+
|          3 |   padding |
+------------+-----------+
|          4 |   padding |
+------------+-----------+
|          5 |         9 |
+------------+-----------+
|          6 |         0 |
+------------+-----------+
|          7 |         0 |
+------------+-----------+
|          8 |         0 |
+------------+-----------+

The contents of the padding section are undefined, but generally 0. It doesn't matter though because that space is never used and merely exists so that access to the following int lies on an efficient 4-byte boundary.

The size of your structure on disk is 5 bytes, and is completely missing the padding sections. So that means when you read it into memory it won't line up properly with the in memory structure at all and accessing it is likely to cause some kind of horrible problem.

The first rule, if you need a serialize function, you need a deserialize function. Second rule, unless you really know exactly what you are doing, do not dump raw memory into a file. This will work just fine in many cases, but there are important cases in which it won't work. And unless you are aware of what does and doesn't work, and when it does or doesn't work, you will end up code that seems to work OK in certain test situations, but fails miserable when you try to use it in a real system.

My code still does dump memory into a file. And it should work as long as you read the result back on exactly the same architecture and platform with code compiled with the same version of the compiler as when you wrote it. As soon as one of those variables changes, all bets are off.

素衣风尘叹 2024-11-09 09:54:18
bool isError = (false == ofs.good());
if (false == isError)
{
    ofs.write((char*)&mCharVal, sizeof(mCharVal));
    ofs.write((char*)&mIntVal, sizeof(mIntVal));
}

更改为

if ( ofs.good() )
{
    ofs.write((char*)&mCharVal, sizeof(mCharVal));
    ofs.write((char*)&mIntVal, sizeof(mIntVal));
}

我会做的:

ostream & operator << ( ostream &os, const SerializeTestStruct &mystruct )
{
  if ( ofs.good() )
  {
    os.write((char*)&mystruct.mCharVal, sizeof(mCharVal));
    os.write((char*)&mystruct.mIntVal, sizeof(mIntVal));
  }
  return os;
}
bool isError = (false == ofs.good());
if (false == isError)
{
    ofs.write((char*)&mCharVal, sizeof(mCharVal));
    ofs.write((char*)&mIntVal, sizeof(mIntVal));
}

change to

if ( ofs.good() )
{
    ofs.write((char*)&mCharVal, sizeof(mCharVal));
    ofs.write((char*)&mIntVal, sizeof(mIntVal));
}

I would do:

ostream & operator << ( ostream &os, const SerializeTestStruct &mystruct )
{
  if ( ofs.good() )
  {
    os.write((char*)&mystruct.mCharVal, sizeof(mCharVal));
    os.write((char*)&mystruct.mIntVal, sizeof(mIntVal));
  }
  return os;
}
木森分化 2024-11-09 09:54:18

问题就在这里:

SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();

这将在先前分配的内存中构造 SerializeTestStruct 类型的值初始化对象。它将用零填充 memblock,因为 POD 类型的值初始化零初始化更多信息)。

这是您的代码的快速修复:

SerializeTestStruct* testStruct2 = new SerializeTestStruct;
fileIn.read( (char*)&testStruct2->mCharVal, sizeof(testStruct2->mCharVal) );
fileIn.read( (char*)&testStruct2->mIntVal, sizeof(testStruct2->mIntVal) );
fileIn.close();
// do some with testStruct2
// ...
delete testStruct2;

The problem is here:

SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();

This will construct value-initialized object of type SerializeTestStruct in previously allocated memory. It will fill the memblock with zeros, since value-initialization is zero-initialization for POD-types (more info).

Here's fast fix for your code:

SerializeTestStruct* testStruct2 = new SerializeTestStruct;
fileIn.read( (char*)&testStruct2->mCharVal, sizeof(testStruct2->mCharVal) );
fileIn.read( (char*)&testStruct2->mIntVal, sizeof(testStruct2->mIntVal) );
fileIn.close();
// do some with testStruct2
// ...
delete testStruct2;
你是我的挚爱i 2024-11-09 09:54:18

在我看来,您需要允许序列化到缓冲区,而不是直接序列化到流。写入缓冲区允许嵌套或继承的类写入内存,然后可以将整个缓冲区写入流。将位和片段写入流的效率不高。

在我停止将二进制数据写入流之前,这是我炮制的内容:

struct Serialization_Interface
{
    //!  Returns size occupied on a stream.
    /*! Note:  size on the platform may be different.
     *  This method is used to allocate memory.
     */
    virtual size_t  size_on_stream(void) const = 0;

    //!  Stores the fields of the object to the given pointer.
    /*!  Pointer is incremented by the size on the stream.
     */
    virtual void    store_to_buffer(unsigned char *& p_buffer) const = 0;

    //!  Loads the object's fields from the buffer, advancing the pointer.
    virtual void    load_from_buffer(const unsigned char *& p_buffer) = 0;
};

struct Serialize_Test_Structure
  : Serialization_Interface
{
    char mCharVal;
    int  mIntVal;

    size_t  size_on_stream(void) const
    {
        return sizeof(mCharVal) + sizeof(mIntVal);
    }

    void  store_to_buffer(unsigned char *& p_buffer) const
    {
        *p_buffer++ = mCharVal;
        ((int&)(*p_buffer)) = mIntVal;
        p_buffer += sizeof(mIntVal);
        return;
    }

    void  load_from_buffer(const unsigned char *& p_buffer)
    {
         mCharVal = *p_buffer++;
         mIntVal = (const int&)(*p_buffer);
         p_buffer += sizeof(mIntVal);
         return;
    }
};


int main(void)
{
   struct Serialize_Test_Struct myStruct;
   myStruct.mCharVal = 'G';
   myStruct.mIntVal = 42;

   // Allocate a buffer:
   unsigned char * buffer = new unsigned char[](myStruct.size_on_stream());

   // Create output file.
   std::ofstream outfile("data.bin");

   // Does your design support this concept?
   unsigned char * p_buffer = buffer;
   myStruct.store_to_buffer(p_buffer);
   outfile.write((char *) buffer, myStruct.size_on_stream());

   outfile.close();
   return 0;
}

我停止将二进制数据写入流,转而使用文本数据,因为文本数据不必担心字节顺序或接收平台接受哪种 IEEE 浮点格式。

In my opinion, you need allow serialization to a buffer and not directly to a stream. Writing to a buffer allows for nested or inherited classes to write to memory, then the whole buffer can be written to the stream. Writing bits and pieces to the stream is not efficient.

Here is something I concocted, before I stopped writing binary data to streams:

struct Serialization_Interface
{
    //!  Returns size occupied on a stream.
    /*! Note:  size on the platform may be different.
     *  This method is used to allocate memory.
     */
    virtual size_t  size_on_stream(void) const = 0;

    //!  Stores the fields of the object to the given pointer.
    /*!  Pointer is incremented by the size on the stream.
     */
    virtual void    store_to_buffer(unsigned char *& p_buffer) const = 0;

    //!  Loads the object's fields from the buffer, advancing the pointer.
    virtual void    load_from_buffer(const unsigned char *& p_buffer) = 0;
};

struct Serialize_Test_Structure
  : Serialization_Interface
{
    char mCharVal;
    int  mIntVal;

    size_t  size_on_stream(void) const
    {
        return sizeof(mCharVal) + sizeof(mIntVal);
    }

    void  store_to_buffer(unsigned char *& p_buffer) const
    {
        *p_buffer++ = mCharVal;
        ((int&)(*p_buffer)) = mIntVal;
        p_buffer += sizeof(mIntVal);
        return;
    }

    void  load_from_buffer(const unsigned char *& p_buffer)
    {
         mCharVal = *p_buffer++;
         mIntVal = (const int&)(*p_buffer);
         p_buffer += sizeof(mIntVal);
         return;
    }
};


int main(void)
{
   struct Serialize_Test_Struct myStruct;
   myStruct.mCharVal = 'G';
   myStruct.mIntVal = 42;

   // Allocate a buffer:
   unsigned char * buffer = new unsigned char[](myStruct.size_on_stream());

   // Create output file.
   std::ofstream outfile("data.bin");

   // Does your design support this concept?
   unsigned char * p_buffer = buffer;
   myStruct.store_to_buffer(p_buffer);
   outfile.write((char *) buffer, myStruct.size_on_stream());

   outfile.close();
   return 0;
}

I stopped writing binary data to streams in favor of textual data because textual data doesn't have to worry about Endianess or which IEEE floating point format is accepted by the receiving platform.

东北女汉子 2024-11-09 09:54:18

我是唯一一个认为这完全不透明的人吗:

bool isError = (false == ofs.good());
if (false == isError) {
    // stuff
}

为什么不呢:

if ( ofs ) {
    // stuff
}

Am I the only one that finds this totally opaque:

bool isError = (false == ofs.good());
if (false == isError) {
    // stuff
}

why not:

if ( ofs ) {
    // stuff
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文