Cross-platform (de)serialization of char* from a file
I need to `fwrite` a `char *` to a file and `fread` it on another platform where the signedness of `char` varies.

- Are there ways to solve this other than explicitly serializing an `unsigned char *`?
- Is it always safe to cast a `char *` to an `unsigned char *`?
Those two platforms must agree to some extent on the representation of `char` in order for you to have transferred the file from one to the other. So there's no "completely portable" way to do this. For example, suppose `char` is 16 bits on the platform that writes and 8 bits on the platform that reads; then clearly you can't in general transfer `char`s from one to the other. Either it's impossible to do it at all (a 16-bit `char` suggests a DSP, which might have no file- or stream-based I/O), or there's some agreed rule for how to convert the file when transferring it.

There also needs to be either agreement on what the execution character set is, or else a means of converting the file between (for example) EBCDIC and ASCII. Otherwise writing an `a` on one side won't result in reading an `a` on the other.

Once you've established the rules for how closely `char` corresponds on each side, that tells you what you can read and write. If the only difference is that the signedness of `char` varies, but both sides use the same character set, then just check how the signed one represents negative values. Assuming it does so in the only common way (two's complement), and supposing that both sides convert from unsigned to signed integers in the only common way (reinterpreting the bit pattern), you can just read and write `char` normally on both sides, with effectively the same results as casting between `unsigned char` and `signed char`.
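A minimal sketch of the "read and write `char` normally" approach, assuming 8-bit `char`, two's complement and a shared character set on both sides. The helpers `roundtrip_chars` and `byte_value` are hypothetical names, and because the write and the read happen on one machine here, this only illustrates that `fwrite`/`fread` move raw bytes unchanged, and that viewing each byte through `unsigned char` gives the same value whichever signedness plain `char` has.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write n chars to path with fwrite, read them back
 * with fread into out, delete the file, and return 1 if the bytes read back
 * are identical to the bytes written, 0 otherwise (including I/O errors). */
int roundtrip_chars(const char *path, const char *buf, char *out, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return 0;
    size_t written = fwrite(buf, 1, n, f);   /* raw bytes, no conversion */
    int close_failed = fclose(f) != 0;

    size_t got = 0;
    if (!close_failed && written == n) {
        f = fopen(path, "rb");
        if (f != NULL) {
            got = fread(out, 1, n, f);
            fclose(f);
        }
    }
    remove(path);                            /* clean up the temporary file */
    return got == n && memcmp(buf, out, n) == 0;
}

/* Inspect byte i through unsigned char, so the result is the same
 * whether plain char is signed or unsigned on this platform. */
unsigned byte_value(const char *p, size_t i)
{
    return (unsigned char)p[i];
}
```

Under those assumptions, round-tripping `"caf\xE9"` should report identical bytes, and `byte_value` on the last character gives 0xE9 (233) on both kinds of platform, even though a signed `char` would hold that byte as a negative value.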
In C, it's safe to access any object of type `T` as `unsigned char [sizeof T]`; this is called its representation. The question is whether copying this representation between diverse systems will preserve the value. Here are the relevant facts/issues:

- All positive values of `char` (and keep in mind, all characters in the basic execution character set must be positive) have the same representation as an `unsigned char` with the same value. (The same applies to other signed/unsigned integer types too.)
- If plain `char` is signed and uses two's complement, the two `char` types are completely compatible (modulo the difference in how the values are interpreted), and it's perfectly safe to access them as either type. Moreover, the C standard makes it difficult if not impossible to produce a valid implementation where plain `char` is signed and not two's complement, and I think it's safe to say no such implementation exists or will ever exist.
- Even if the values of `char` (these are integers!) are preserved when transporting the file to another system, that doesn't necessarily imply that the character identities will be preserved, since the target system might use a different character encoding (EBCDIC puke...).

This is a lot of mumbo-jumbo, but the result you should take away is that unless your goal is pedantry and language-lawyering, there's nothing to worry about. Just use `fwrite` and `fread` directly on strings and don't worry about whether they were `unsigned char[]` or `char[]` strings.
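To make the "representation" point concrete, here is a small sketch with two hypothetical helpers: `object_bytes` copies any object's bytes through an `unsigned char *` view (the same view `fwrite` uses), and `same_bytes_as_unsigned` checks whether a `char` value is stored with the same byte as the corresponding `unsigned char`. The check is guaranteed for non-negative values; for negative values it assumes two's complement.

```c
#include <stddef.h>
#include <string.h>

/* Copy the object representation of any object into dst.
 * Reading an object through an unsigned char pointer is always allowed
 * in C, which is what makes this (and fwrite) work for every type. */
void object_bytes(const void *obj, size_t size, unsigned char *dst)
{
    const unsigned char *src = obj;   /* void * converts implicitly in C */
    for (size_t i = 0; i < size; i++)
        dst[i] = src[i];
}

/* Return 1 if the char value c is stored with the same byte as
 * (unsigned char)c. Guaranteed for non-negative c; for negative c this
 * assumes two's complement (the only common representation). */
int same_bytes_as_unsigned(char c)
{
    unsigned char u = (unsigned char)c;
    return memcmp(&c, &u, 1) == 0;
}
```

On a two's complement platform `same_bytes_as_unsigned` returns 1 for every `char` value, which is the sense in which the two `char` types are interchangeable for I/O.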
If you are "serializing" `char`s without negative values, then it doesn't matter. Otherwise it doesn't make sense (since you won't be able to determine which value was written).
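A sketch of the check this answer implies, for when you want to know before writing whether signedness can matter at all. The helper `signedness_safe` is hypothetical: it looks at each byte through `unsigned char` and accepts it only if it is at most `SCHAR_MAX`, so the byte reads back as the same non-negative `char` value whether plain `char` is signed or unsigned.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if every byte of buf is at most SCHAR_MAX
 * when viewed as unsigned char. Such bytes read back as the same
 * non-negative char value whether plain char is signed or unsigned,
 * so the reader's char signedness cannot change them. */
int signedness_safe(const char *buf, size_t n)
{
    const unsigned char *u = (const unsigned char *)buf;  /* valid cast */
    for (size_t i = 0; i < n; i++) {
        if (u[i] > SCHAR_MAX)   /* typically negative if read as signed char */
            return 0;
    }
    return 1;
}
```

A writer could call it before `fwrite` and fall back to an explicit `unsigned char` encoding when it returns 0; that fallback policy is an assumption for illustration, not part of the answer.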