The "proper" way to store binary data with C++/STL

Posted on 2024-07-11 22:18:54

In general, what is the best way of storing binary data in C++? The options, as far as I can tell, pretty much boil down to using std::string or std::vector<char>. (I'll omit the possibility of char* and malloc() since I'm referring specifically to C++.)

Usually I just use a string; however, I'm not sure if there are overheads I'm missing, or conversions that the STL does internally that could mess with the sanity of binary data. Does anyone have any pointers (har) on this? Suggestions or preferences one way or another?

Comments (4)

柠檬 2024-07-18 22:18:54

A vector of char is nice because the memory is contiguous. Therefore you can use it with a lot of C APIs, such as Berkeley sockets or file APIs. You can do the following, for example:

  std::vector<char> vect;
  ...
  // send() takes a flags argument as its last parameter; 0 means no flags
  send(sock, vect.data(), vect.size(), 0);

and it will work fine.

You can essentially treat it just like any other dynamically allocated char buffer. You can scan up and down looking for magic numbers or patterns. You can parse it partially in place. For receiving from a socket, you can very easily resize it to append more data.
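As a rough sketch of the operations just described (appending received bytes, scanning for a magic number, and handing the buffer to a pointer-plus-length C-style function), something like the following could work. The helper names `append_bytes`, `find_magic`, and `c_style_copy` are hypothetical and not from the answer; `c_style_copy` is only a stand-in for a real C API such as a socket or file call.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Append a chunk of raw bytes to the end of the buffer (hypothetical helper).
void append_bytes(std::vector<char>& buf, const char* data, std::size_t len) {
    buf.insert(buf.end(), data, data + len);
}

// Return the offset of the first occurrence of `magic` in `buf`,
// or buf.size() if the pattern is not present (hypothetical helper).
std::size_t find_magic(const std::vector<char>& buf,
                       const char* magic, std::size_t magic_len) {
    auto it = std::search(buf.begin(), buf.end(), magic, magic + magic_len);
    return static_cast<std::size_t>(it - buf.begin());
}

// Stand-in for a C API that takes a pointer and a length.
// The vector's storage is contiguous, so buf.data() can be passed directly.
std::size_t c_style_copy(char* dst, const char* src, std::size_t len) {
    std::memcpy(dst, src, len);
    return len;
}
```

Embedded zero bytes are treated like any other value here, since every call works with an explicit length rather than a terminator.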

The downside is that resizing is not terribly efficient (resize or preallocate prudently), and deletion from the front of the array will also be very inefficient. If you need to, say, pop just one or two chars at a time off the front of the data structure very frequently, copying to a deque before this processing may be an option. This costs you a copy, and deque memory isn't contiguous, so you can't just pass a pointer to a C API.
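A minimal sketch of the deque option mentioned above, assuming the data first arrives in a std::vector<char>. The function names `to_deque` and `pop_front_n` are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Copy a contiguous buffer into a deque so that pops from the front are cheap.
// This is the one-time copy the answer refers to.
std::deque<char> to_deque(const std::vector<char>& buf) {
    return std::deque<char>(buf.begin(), buf.end());
}

// Remove up to `n` chars from the front of the deque and return them.
std::vector<char> pop_front_n(std::deque<char>& q, std::size_t n) {
    std::vector<char> out;
    out.reserve(std::min(n, q.size()));  // preallocate once instead of growing
    for (std::size_t i = 0; i < n && !q.empty(); ++i) {
        out.push_back(q.front());
        q.pop_front();  // constant time, unlike erasing from a vector's front
    }
    return out;
}
```

The returned vector is contiguous again, so it can be passed to a C API even though the deque itself cannot.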

Bottom line: learn about the data structures and their tradeoffs before diving in. That said, a vector of char is typically what I see used in general practice.

聆听风音 2024-07-18 22:18:54

The biggest problem with std::string is that the current standard doesn't guarantee that its underlying storage is contiguous. However, there are no known STL implementations where string is not contiguous, so in practice it probably won't fail. In fact, the new C++0x standard is going to fix this problem by mandating that std::string use a contiguous buffer, as std::vector does.

Another argument against string is that its name suggests that it contains a character string, not a binary buffer, which may cause confusion to those who read the code.

That said, I recommend vector as well.

月下伊人醉 2024-07-18 22:18:54

I use std::string for this too, and have never had a problem with it.

One "pointer," which I just received a sharp reminder of in a piece of code yesterday: when creating a string from a block of binary data, use the std::string(startIter, endIter) constructor form, not the std::string(ptr, offset, length) form -- the latter makes the assumption that the pointer points to a C-style string, and ignores anything after the first zero character (it copies "up to" the specified length, not length characters).
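To make the difference concrete, here is a small sketch comparing the constructor forms on a buffer that contains an embedded zero byte. The wrapper function names are hypothetical. The (pointer, length) form is included for comparison; it is not mentioned in the answer, but it also copies exactly the requested number of bytes.

```cpp
#include <cstddef>
#include <string>

// Iterator-range form: copies every byte in [data, data + len), zeros included.
std::string from_range(const char* data, std::size_t len) {
    return std::string(data, data + len);
}

// (pointer, length) form: also copies exactly `len` bytes.
std::string from_ptr_len(const char* data, std::size_t len) {
    return std::string(data, len);
}

// (pointer, offset, length) form: the pointer is first interpreted as a
// zero-terminated C string, so anything after the first zero byte is lost.
// `offset` must not exceed the length of that C string, or this throws
// std::out_of_range.
std::string from_ptr_offset_len(const char* data, std::size_t offset,
                                std::size_t len) {
    return std::string(data, offset, len);
}
```

For the five bytes 'a', 'b', 0, 'c', 'd', the first two functions return a string of size 5, while `from_ptr_offset_len(raw, 0, 5)` returns only "ab".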

み格子的夏天 2024-07-18 22:18:54

You should certainly be using some container of char, but the container you want to use depends on your application.

Chars have several properties that make them useful for holding binary data: the standard disallows any "padding" for a char datatype, which is important since it means that you won't get garbage in your binary layout. Each char is also guaranteed to be exactly one byte, making it the only plain old datatype (POD) with a set width (all others are specified in terms of upper and/or lower bounds).
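These size guarantees can be checked at compile time. The following is only an illustrative sketch; `bytes_for` is a hypothetical helper, and note that "one byte" means `sizeof(char) == 1`, with the number of bits per byte given by `CHAR_BIT` (at least 8).

```cpp
#include <climits>
#include <cstddef>

// sizeof(char) is 1 by definition, and the same holds for unsigned char.
static_assert(sizeof(char) == 1, "char is exactly one byte");
static_assert(sizeof(unsigned char) == 1, "unsigned char is exactly one byte");

// A byte has at least 8 bits; CHAR_BIT gives the actual count.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// An array of N chars occupies exactly N bytes, with no padding in between.
static_assert(sizeof(char[16]) == 16, "char arrays have no padding");

// Number of bytes needed to hold n chars (hypothetical helper).
constexpr std::size_t bytes_for(std::size_t n) { return n * sizeof(char); }
```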

The discussion of the appropriate STL container in which to store the chars is handled well by Doug above. Which one you need depends entirely on your use case. If you are just holding a block of data that you iterate through, without any special lookup, append/remove, or splice needs, I would prefer vector. It makes your intentions clearer than std::string, which many libraries and functions will assume holds a null-terminated C-style string.
