Cross-platform (de)serialization of char* from a file

Posted 2024-12-01 02:53:29

I need to fwrite a char * to a file and fread it on another platform where the signedness of char differs.

  • Are there ways to solve this other than explicitly serializing an unsigned char*?
  • Is it always safe to cast a char* to an unsigned char*?

Comments (3)

带上头具痛哭 2024-12-08 02:53:29

Those two platforms must agree to some extent on the representation of char, in order for you to have transferred the file from one to the other.

So there's no "completely portable" way to do this. For example, suppose char is 16 bits on the platform that writes and 8 bits on the platform that reads: then clearly you can't in general transfer chars from one to the other. Either it's impossible to do it at all (a 16-bit char suggests a DSP, which might have no file- or stream-based I/O), or there's some agreed rule for how to convert the file when transferring it.

There also needs to be either agreement on what the execution character set is, or else a means of converting the file between (for example) EBCDIC and ASCII. Otherwise writing an 'a' on one side won't result in reading an 'a' on the other.

Once you've established the rules for how closely char corresponds on each side, that tells you what you can read and write. If the only difference is that the signedness of char varies, but they both use the same character set, then just check how the one that's signed represents negative values.

Assuming it does so in the only common way (two's complement), and supposing that both sides convert from unsigned to signed integers in the only common way (re-interpret the bit pattern), then you can just read and write char normally on both sides with effectively the same results as casting between unsigned char and signed char.
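The assumptions listed above can be checked on each platform before relying on plain fread/fwrite. The following is a minimal sketch, not part of the original answer; the function names char_is_signed and signed_char_is_twos_complement are made up for illustration, and it assumes both sides use 8-bit bytes.

```c
#include <limits.h>
#include <string.h>

/* Assumption: both platforms use 8-bit bytes; refuse to compile otherwise. */
#if CHAR_BIT != 8
#error "this sketch assumes 8-bit char"
#endif

/* Hypothetical helper: returns 1 if plain char is signed here, 0 if unsigned. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: returns 1 if signed char stores -1 as the bit
   pattern 0xFF (two's complement), 0 otherwise. */
int signed_char_is_twos_complement(void)
{
    signed char minus_one = -1;
    unsigned char bits;
    memcpy(&bits, &minus_one, 1);   /* inspect the stored bit pattern */
    return bits == 0xFF;
}
```

If signed_char_is_twos_complement() returns 1 on the side where char is signed, the bytes in the file mean the same thing on both sides and no explicit conversion step should be needed.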

心不设防 2024-12-08 02:53:29

In C, it's safe to access an object of any type T as unsigned char[sizeof(T)]; this is called the representation. The question is whether copying this representation between diverse systems will preserve the value. Here are the relevant facts/issues:

  • All positive values of char (and keep in mind, all characters in the basic execution character set must be positive) have the same representation as an unsigned char with the same value. (The same applies to other signed/unsigned integer types too.)
  • On a two's-complement system, signed and unsigned char types are completely compatible (modulo the difference in how the values are interpreted) and it's perfectly safe to access them as either type. Moreover, the C standard makes it difficult if not impossible to produce a valid implementation where plain char is signed and not two's-complement, and I think it's safe to say no such implementation exists or will ever exist.
  • Even if the values as char (these are integers!) are preserved when transporting the file to another system, that doesn't necessarily imply that the character identities will be preserved, since the target system might use a different character encoding (EBCDIC puke..).

This is a lot of mumbo-jumbo, but the result you should take away is that unless your goal is pedantry and language-lawyering, there's nothing to worry about. Just use fwrite and fread directly on strings and don't worry about whether they were unsigned char[] or char[] strings.
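As a rough illustration of "just use fwrite and fread directly", here is a minimal sketch that writes a char buffer and reads it back. The function name roundtrip_bytes is hypothetical, and tmpfile() stands in for the real archive file; in the actual scenario the write and the read would happen on different machines, so this only shows that the calls themselves need no signedness handling.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes len bytes from src to a temporary file,
   reads them back into dst, and returns 1 if the bytes read match the
   bytes written, 0 on any failure. */
int roundtrip_bytes(const char *src, char *dst, size_t len)
{
    FILE *fp = tmpfile();   /* binary update mode; stands in for the archive */
    if (fp == NULL)
        return 0;

    int ok = fwrite(src, 1, len, fp) == len;
    rewind(fp);             /* reposition before switching from write to read */
    ok = ok && fread(dst, 1, len, fp) == len;
    fclose(fp);

    return ok && memcmp(src, dst, len) == 0;
}
```

The buffer is passed as char* on both ends; fwrite and fread take void* and copy raw bytes, so whether the caller thinks of them as char or unsigned char does not change what lands in the file.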

暖伴 2024-12-08 02:53:29

If you are 'serializing' chars that never hold negative values, then it doesn't matter. Otherwise it doesn't make sense (since you won't be able to determine which value was written).
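If the numeric value of a byte does matter after reading, one common approach is to convert it to unsigned char before using it as a number. This is a sketch under the assumption of 8-bit, two's-complement char; byte_value and count_high_bytes are illustrative names, not from the thread.

```c
#include <stddef.h>

/* Hypothetical helper: returns the byte held in c as a number in 0..255,
   whether plain char is signed or unsigned on this platform. */
unsigned int byte_value(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: counts bytes with the high bit set (128..255)
   in a buffer, for example one filled by fread. */
size_t count_high_bytes(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (byte_value(buf[i]) >= 0x80)
            n++;
    return n;
}
```

With this conversion, a byte written as 0xE9 is seen as 233 on the reading side even if plain char is signed there and the raw char value would compare as negative.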
