Platform-independent storage of signed integers

Posted on 2024-12-12 08:01:06

I want to write signed integer values into a file in a platform independent way.

If they were unsigned, I would just convert them from host byte order to LE (or BE) with the endian(3) family of functions.

I'm not sure how to deal with signed integers though. If I cast them to unsigned values, I lose the sign, since the C standard does not guarantee that

(int) ((unsigned) -1) == -1

The other option would be to cast a pointer to the value (i.e., reinterpret the byte sequence as unsigned), but I'm not convinced that converting endianness after that is going to give anything sensible.

What is the proper way for platform independent signed integer storage?

Update:

  • I know that in practice, almost all architectures use two's complement representation, so that I can losslessly convert between signed and unsigned integers. However, this question is meant to be more theoretical.

  • Just rolling out my own integer representation (be that storing the decimal digits as ASCII characters, or separately storing the sign bit) is of course a solution. However, I'm interested in whether there is a way that works without completely abandoning the native binary representation.

Comments (6)

源来凯始玺欢你 2024-12-19 08:01:06

The simplest solution:

For writing, just convert to unsigned and use your unsigned endian conversion functions.

For reading the values back, first read them into an unsigned variable, and check if the high bit is set, and do some arithmetic to make the conversion well-defined:

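uint32_t temp;   /* value read from the file, already converted to host byte order */
int32_t dest;
/* For temp > INT32_MAX, (-temp-1) lies in [0, INT32_MAX], so the cast is well-defined. */
if (temp > INT32_MAX) dest = -(int32_t)(-temp-1)-1;
else dest = temp;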

As an added bonus, a good compiler on a sane system (i.e. a two's complement system where the implementation-defined conversion from unsigned to signed is "correct") will first optimize -(int32_t)(-temp-1)-1 to (int32_t)temp, then collapse the two branches of the conditional, which now contain identical code, into a single code path with no branch.
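For what it's worth, here is a minimal round-trip sketch of the approach above. It assumes 32-bit values stored big-endian, and it uses plain byte shifts rather than the endian(3) functions so that it does not depend on any particular libc. The helper names put_i32_be and get_i32_be are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: write v as 4 big-endian bytes in two's complement form. */
static void put_i32_be(unsigned char out[4], int32_t v)
{
    uint32_t u = (uint32_t)v;  /* signed -> unsigned is defined modulo 2^32 */
    out[0] = (unsigned char)((u >> 24) & 0xFFu);
    out[1] = (unsigned char)((u >> 16) & 0xFFu);
    out[2] = (unsigned char)((u >> 8) & 0xFFu);
    out[3] = (unsigned char)(u & 0xFFu);
}

/* Hypothetical helper: read 4 big-endian bytes back into an int32_t
   without relying on the implementation-defined unsigned -> signed cast. */
static int32_t get_i32_be(const unsigned char in[4])
{
    uint32_t u = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
               | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    if (u > (uint32_t)INT32_MAX) {
        /* UINT32_MAX - u is in [0, INT32_MAX], so the cast is well-defined;
           the result equals u - 2^32. */
        return -(int32_t)(UINT32_MAX - u) - 1;
    }
    return (int32_t)u;
}
```

The bytes on disk are the same on every host; only the decode step needs the extra arithmetic.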

漫雪独思 2024-12-19 08:01:06

A platform-independent way? If you truly want this, you should consider writing it as text rather than binary (and taking into account that even that is not fully platform-independent since you may want to move it from an ASCII to an EBCDIC platform).

It all depends on how platform-independent you need it to be. C (before C23) allows for three different signed encodings: two's complement, ones' complement and sign/magnitude. But, by far, most machines will use the first one.

Work out first what you actually mean by that term. If you mean you only want to handle two's complement, then casting it to an unsigned is fine.

坏尐絯 2024-12-19 08:01:06

Use the same approach as when sending data over the network. Convert your unsigned or signed values to big-endian and save them by using htonl(). When reading, convert the data back to your machine endianness by using ntohl().

But as always you need to know if the data originally was signed or unsigned. With just a bit sequence, you can't know for sure.
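A small sketch of that point, assuming a POSIX system that provides htonl()/ntohl() in <arpa/inet.h>. The helper names are hypothetical. The same four bytes decode to two different values depending on whether the reader treats the field as unsigned or signed.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store u in network (big-endian) byte order. */
static void store_u32_net(unsigned char out[4], uint32_t u)
{
    uint32_t be = htonl(u);
    memcpy(out, &be, sizeof be);
}

/* Hypothetical helper: load a network-order 32-bit field as unsigned. */
static uint32_t load_u32_net(const unsigned char in[4])
{
    uint32_t be;
    memcpy(&be, in, sizeof be);
    return ntohl(be);
}

/* Reinterpret a loaded field as two's complement signed; only meaningful
   if the writer actually stored a signed value. */
static int32_t as_signed(uint32_t u)
{
    return (u > (uint32_t)INT32_MAX) ? -(int32_t)(UINT32_MAX - u) - 1
                                     : (int32_t)u;
}
```

If -5 is stored with store_u32_net(buf, (uint32_t)-5), then load_u32_net(buf) yields 4294967291 while as_signed(load_u32_net(buf)) yields -5, so the file format has to say which one is meant.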

_失温 2024-12-19 08:01:06

Options:

  • Store numbers as plain text using printf()-like functions for conversion
  • Convert negative numbers to sign + absolute value, store them as unsigned with the extra sign bit
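The first option could look roughly like this. It is only a sketch, assuming 32-bit values written as decimal text; the helper names i32_to_text and text_to_i32 are invented for the example.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format v as decimal text.
   Returns the number of characters written, or -1 if buf is too small. */
static int i32_to_text(char *buf, size_t size, int32_t v)
{
    int n = snprintf(buf, size, "%" PRId32, v);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: parse decimal text back into an int32_t.
   Returns 0 on success, -1 on malformed or out-of-range input. */
static int text_to_i32(const char *s, int32_t *out)
{
    char *end;
    errno = 0;
    long long x = strtoll(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE
        || x < INT32_MIN || x > INT32_MAX) {
        return -1;
    }
    *out = (int32_t)x;
    return 0;
}
```

The text form sidesteps both byte order and sign representation, at the cost of variable-length records and a parse step on reading.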

吃→可爱长大的 2024-12-19 08:01:06

Output a 1 byte sign flag (e.g. 0=positive, 1=negative). If the value is negative make it positive and then write the value in big endian format. If you don't like 0 and 1 you could use '+' and '-'.

奢望 2024-12-19 08:01:06

Store the sign and the absolute value as 2 fields, and recombine them when you read it back.

You said you already know how to convert to/from a well-defined byte order, so all that is left is to determine the sign (hint: < 0 might help here :-)) and take the absolute value (which you could do while determining the sign, or by using abs() or similar).

Something like:

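uint8_t  negative;   /* 1 if num is negative, 0 otherwise; exactly one byte on disk */
uint32_t magnitude;  /* absolute value; unsigned so that INT32_MIN fits too */
if (num < 0) {
    negative  = 1;
    magnitude = -(uint32_t)num;  /* negate in unsigned arithmetic: no overflow */
} else {
    negative  = 0;
    magnitude = (uint32_t)num;
}
write_value = htole32(magnitude);
write(file, &negative, 1);
write(file, &write_value, 4);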

As an optimization you could collect the sign bits for values together and store them in a single word before the absolute values.
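A rough sketch of that optimization, under a few assumptions that are mine rather than the answer's: at most 32 values per group, a 4-byte little-endian sign mask followed by 4-byte little-endian magnitudes, and well-formed input on the reading side. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static void put_u32_le(unsigned char *p, uint32_t u)
{
    p[0] = (unsigned char)(u & 0xFFu);
    p[1] = (unsigned char)((u >> 8) & 0xFFu);
    p[2] = (unsigned char)((u >> 16) & 0xFFu);
    p[3] = (unsigned char)((u >> 24) & 0xFFu);
}

static uint32_t get_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encode n (<= 32) values as a sign mask plus magnitudes.
   out must have room for 4 + 4*n bytes. */
static void pack_signs_first(unsigned char *out, const int32_t *v, size_t n)
{
    uint32_t signs = 0;
    for (size_t i = 0; i < n; i++) {
        /* Negating in unsigned arithmetic also handles INT32_MIN. */
        uint32_t mag = (v[i] < 0) ? -(uint32_t)v[i] : (uint32_t)v[i];
        if (v[i] < 0)
            signs |= (uint32_t)1 << i;  /* bit i = sign of value i */
        put_u32_le(out + 4 + 4 * i, mag);
    }
    put_u32_le(out, signs);
}

/* Decode n (<= 32) values written by pack_signs_first. */
static void unpack_signs_first(int32_t *v, const unsigned char *in, size_t n)
{
    uint32_t signs = get_u32_le(in);
    for (size_t i = 0; i < n; i++) {
        uint32_t mag = get_u32_le(in + 4 + 4 * i);
        if ((signs >> i) & 1u)
            /* -(mag - 1) - 1 == -mag, without overflowing at 2^31 */
            v[i] = (mag == 0) ? 0 : -(int32_t)(mag - 1) - 1;
        else
            v[i] = (int32_t)mag;
    }
}
```

Compared with one sign byte per value, this costs 4 bytes per group of up to 32 values, and the format is independent of the host's signed representation.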
