Simplest way to mix sequences of types with iostreams?

Posted on 2024-08-26 04:54:13

I have a function template <typename T> void write(const T&), which is implemented in terms of writing the T object to an ostream, and a matching function template <typename T> T read() that reads a T from an istream. I am basically using iostreams as a plain-text serialisation format, which obviously works fine for most built-in types, although I'm not sure how to handle std::string effectively just yet.

I'd like to be able to write out a sequence of objects too, e.g. template <typename T> void write(const std::vector<T>&) or an iterator-based equivalent (although in practice, it would always be used with a vector). However, while writing an overload that iterates over the elements and writes them out is easy enough to do, this doesn't add enough information to allow the matching read operation to know how each element is delimited, which is essentially the same problem that I have with a single std::string.

Is there a single approach that can work for all basic types and std::string? Or perhaps I can get away with 2 overloads, one for numerical types, and one for strings? (Either using different delimiters or the string using a delimiter escaping mechanism, perhaps.)

EDIT: I appreciate the often sensible tendency when confronted with questions like this is to say, "you don't want to do that" and to suggest a better approach, but I would really like suggestions that relate directly to what I asked, rather than what you believe I should have asked instead. :)

Comments (4)

一萌ing 2024-09-02 04:54:13

A general-purpose serialisation framework is hard, and the built-in features of the iostream library are really not up to it - even dealing with strings satisfactorily is quite difficult. I suggest you either sit down and design the framework from scratch, ignoring iostreams (which then become an implementation detail), or (more realistically) use an existing library, or at least an existing format, such as XML.

完美的未来在梦里 2024-09-02 04:54:13

Basically, you will have to create a file format. When you're restricted to built-ins, strings, and sequences of those, you could use whitespace as the delimiter, write strings wrapped in " (escaping any " - and then \, too - occurring within the strings themselves), and pick anything that isn't used for streaming built-in types as the sequence delimiter. It might be helpful to store the size of a sequence, too.

For example,

5 1.4 "a string containing \" and \\" { 3 "blah" "blubb" "frgl" } { 2 42 21 }

might be the serialization of an int (5), a float (1.4), a string ("a string containing " and \"), a sequence of 3 strings ("blah", "blubb", and "frgl"), and a sequence of 2 ints (42 and 21).
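As a rough sketch of that format (not the poster's code): since C++14, std::quoted from <iomanip> already does this kind of quoting, wrapping a string in " and escaping embedded " and \ with a backslash. The overloads below are hypothetical. They take the stream explicitly and use an out-parameter for read rather than the question's T read(), purely to keep overloading simple.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>   // std::quoted (C++14)
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Built-in types: operator<< plus a trailing space as the delimiter.
template <typename T>
void write(std::ostream& os, const T& value) {
    os << value << ' ';
}

// Strings: wrapped in ", with embedded " and \ escaped by a backslash.
inline void write(std::ostream& os, const std::string& s) {
    os << std::quoted(s) << ' ';
}

// Sequences: "{ <count> <elem> <elem> ... } ".
template <typename T>
void write(std::ostream& os, const std::vector<T>& v) {
    os << "{ " << v.size() << ' ';
    for (const T& e : v) write(os, e);
    os << "} ";
}

template <typename T>
void read(std::istream& is, T& value) {
    is >> value;
}

inline void read(std::istream& is, std::string& s) {
    is >> std::quoted(s);   // undoes the quoting and escaping
}

template <typename T>
void read(std::istream& is, std::vector<T>& v) {
    char brace = 0;
    std::size_t n = 0;
    is >> brace >> n;
    if (brace != '{') { is.setstate(std::ios::failbit); return; }
    v.clear();
    for (std::size_t i = 0; i < n && is; ++i) {
        T e{};
        read(is, e);
        v.push_back(std::move(e));
    }
    is >> brace;
    if (brace != '}') is.setstate(std::ios::failbit);
}
```

Two caveats with this sketch: a string literal passed to write picks the generic overload and is written unquoted, so wrap it in std::string first; and floating-point values are written with the stream's current precision, so raise it if exact round-trips matter.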

Alternatively you could do as Neil suggests in his comment and treat strings as sequences of characters:

{ 27 'a' ' ' 's' 't' 'r' 'i' 'n' 'g' ' ' 'c' 'o' 'n' 't' 'a' 'i' 'n' 'i' 'n' 'g' ' ' '"' ' ' 'a' 'n' 'd' ' ' '\' }

仅一夜美梦 2024-09-02 04:54:13

If you want to avoid escaping strings, you can look at how ASN.1 does things. It's overkill for your stated requirements: strings, fundamental types and arrays of these things, but the principle is that the stream contains unambiguous length information. Therefore nothing needs to be escaped.

For a very simple equivalent, you could output a uint32_t as "ui4" followed by 4 bytes of data, an int8_t as "si1" followed by 1 byte of data, an IEEE float as "f4", an IEEE double as "f8", and so on. Use some additional modifier for arrays: "a134ui4" followed by 536 bytes of data. Note that arbitrary lengths need to be terminated, whereas bounded lengths like the number of bytes in the following integer can be fixed size (one of the reasons ASN.1 is more than you need is that it uses arbitrary lengths for everything). A string could then either be a<len>ui1 or some abbreviation like s<len>:. The reader is very simple indeed.
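A minimal sketch of the tagged binary idea, for uint32_t only. The function names are made up for illustration, and the little-endian byte order is an assumption (the answer doesn't specify one). The payload is written byte by byte so that the output doesn't depend on the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Payload: 4 bytes, least significant first (an assumed convention).
inline void put_u32_bytes(std::ostream& os, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        os.put(static_cast<char>((v >> shift) & 0xFFu));
}

inline bool get_u32_bytes(std::istream& is, std::uint32_t& v) {
    v = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        char c;
        if (!is.get(c)) return false;
        v |= static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << shift;
    }
    return true;
}

// Consume a literal tag such as "ui4"; fail on any mismatch.
inline bool expect(std::istream& is, const std::string& tag) {
    for (char want : tag) {
        char c;
        if (!is.get(c) || c != want) return false;
    }
    return true;
}

// "ui4" followed by 4 bytes of data.
inline void write_u32(std::ostream& os, std::uint32_t v) {
    os << "ui4";
    put_u32_bytes(os, v);
}

inline bool read_u32(std::istream& is, std::uint32_t& v) {
    return expect(is, "ui4") && get_u32_bytes(is, v);
}

// "a<count>ui4" followed by count * 4 bytes of data.
inline void write_u32_array(std::ostream& os, const std::vector<std::uint32_t>& a) {
    os << 'a' << a.size() << "ui4";
    for (std::uint32_t v : a) put_u32_bytes(os, v);
}

inline bool read_u32_array(std::istream& is, std::vector<std::uint32_t>& a) {
    std::size_t n = 0;
    // The decimal count is terminated by the first non-digit, here 'u'.
    if (!expect(is, "a") || !(is >> n) || !expect(is, "ui4")) return false;
    a.clear();
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t v;
        if (!get_u32_bytes(is, v)) return false;
        a.push_back(v);
    }
    return true;
}
```

When this goes to a file rather than a string stream, the file should be opened with std::ios::binary so the payload bytes aren't translated.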

This has obvious drawbacks: the size and representation of types must be independent of platform, and the output is neither human readable nor particularly compressed.

You can make it mostly human-readable, though, by using ASCII instead of the binary representation of arithmetic types (careful with arrays: you may want to calculate the length of the whole array before outputting any of it, or you may use a separator and a terminator, since there's no need for character escapes), and by optionally adding a big fat human-visible separator that the deserializer ignores. For example, s16:hello, worlds12:||s12:hello, world is considerably easier to read than s16:hello, worlds12:s12:hello, world. Just beware when reading that what looks like a separator sequence might not actually be one, and you have to avoid falling into traps like assuming s5:hello|| in the middle of the code means there's a string 5 chars long: it might be part of s15:hello||s5:hello||.
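Here is a sketch of the ASCII s<len>: variant, with hypothetical function names. The reader skips | only between records and otherwise trusts the length prefix, so it never scans the payload for delimiters. That is what keeps it out of the s15:hello||s5:hello|| trap.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// "s<len>:" followed by exactly len characters, unescaped.
inline void write_string(std::ostream& os, const std::string& s) {
    os << 's' << s.size() << ':' << s;
}

// Purely cosmetic marker between records; the reader ignores it.
inline void write_separator(std::ostream& os) {
    os << "||";
}

inline bool read_string(std::istream& is, std::string& out) {
    // Separators are skipped only here, at a record boundary.
    while (is.peek() == '|') is.get();

    char tag = 0;
    if (!is.get(tag) || tag != 's') return false;

    // Require a digit so that ">>" cannot skip whitespace or accept a sign.
    const int next = is.peek();
    if (next < '0' || next > '9') return false;

    std::size_t len = 0;
    if (!(is >> len)) return false;

    char colon = 0;
    if (!is.get(colon) || colon != ':') return false;

    // Note: len comes from the input, so untrusted data can request a
    // large allocation here; real code would want an upper bound.
    out.assign(len, '\0');
    if (len == 0) return true;
    is.read(&out[0], static_cast<std::streamsize>(len));
    return static_cast<bool>(is);   // false if fewer than len chars were available
}
```

For example, reading s15:hello||s5:hello|| with read_string yields the single 15-character string hello||s5:hello, and the trailing || is then skipped as a separator.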

Unless you have very tight constraints on code size, it's probably easier to use a general-purpose serializer off the shelf than it is to write a specialized one. Reading simple XML with SAX isn't difficult. That said, everyone and his dog has written "finally, the serializer/parser/whatever that will save us ever hand-coding a serializer/parser/whatever ever again", with greater or lesser success.

飘逸的'云 2024-09-02 04:54:13

You may consider using boost::spirit, which simplifies parsing of basic types from arbitrary input streams.
