Writing and reading long int values in C code

Posted 2024-07-26 02:45:58

I'm working on a file format that should be written and read in several different operating systems and computers. Some of those computers should be x86 machines, others x86-64. Some other processors may exist, but I'm not concerned about them yet.

This file format should contain several numbers that would be read like this:

struct LongAsChars{
    char c1, c2, c3, c4;
};

long readLong(FILE* file){
    int b1 = fgetc(file);
    int b2 = fgetc(file);
    int b3 = fgetc(file);
    int b4 = fgetc(file);
    if(b1<0||b2<0||b3<0||b4<0){
        //throwError
    }

    struct LongAsChars lng;
    lng.c1 = (char) b1;
    lng.c2 = (char) b2;
    lng.c3 = (char) b3;
    lng.c4 = (char) b4;

    long* value = (long*) &lng;

    return *value;
}

and written as:

void writeLong(long x, FILE* f){
    long* xptr = &x;
    struct LongAsChars* lng = (struct LongAsChars*) xptr;
    fputc(lng->c1, f);
    fputc(lng->c2, f);
    fputc(lng->c3, f);
    fputc(lng->c4, f);
}

Although this seems to work on my computer, I'm concerned that it may not work on others, or that the file format may end up differing across computers (32-bit vs. 64-bit computers, for example).
Am I doing something wrong? How should I implement my code to use a constant number of bytes per number?

Should I just use fread (which would possibly make my code faster too) instead?

Comments (6)

超可爱的懒熊 2024-08-02 02:45:58

Use the types in stdint.h to ensure you get the same number of bytes in and out.

Then you're just left with dealing with endianness issues, which your code probably doesn't really handle.

Serializing the long through an aliased char* leaves you with different byte orders in the written file on platforms with different endianness.

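Assuming val is a uint32_t, you should decompose the bytes something like so:

unsigned char c1 = (val >>  0) & 0xff;
unsigned char c2 = (val >>  8) & 0xff;
unsigned char c3 = (val >> 16) & 0xff;
unsigned char c4 = (val >> 24) & 0xff;

And recompose them using something like this (the casts stop a byte >= 0x80 from being sign-extended or shifted into the sign bit of an int):

val = ((uint32_t)c4 << 24) |
      ((uint32_t)c3 << 16) |
      ((uint32_t)c2 <<  8) |
      ((uint32_t)c1 <<  0);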
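The same idea, packaged as complete read and write functions. This is a minimal sketch assuming a 4-byte field stored least significant byte first (the order the decomposition above produces); the names write_u32_le and read_u32_le are hypothetical, and the EOF checks are an addition the snippets above leave out.

```c
#include <stdint.h>
#include <stdio.h>

/* Write val as exactly 4 bytes, least significant byte first.
   Returns 1 on success, 0 on a write error. */
int write_u32_le(FILE *f, uint32_t val)
{
    for (int i = 0; i < 4; i++) {
        /* Extract byte i with a shift: no pointer aliasing, so the
           host's own byte order never reaches the file. */
        if (fputc((int)((val >> (8 * i)) & 0xffu), f) == EOF)
            return 0;
    }
    return 1;
}

/* Read 4 bytes written by write_u32_le into *val.
   Returns 1 on success, 0 on EOF or a read error. */
int read_u32_le(FILE *f, uint32_t *val)
{
    uint32_t result = 0;
    for (int i = 0; i < 4; i++) {
        int c = fgetc(f);
        if (c == EOF)
            return 0;
        /* Widen to uint32_t before shifting so bit 31 can be set safely. */
        result |= (uint32_t)c << (8 * i);
    }
    *val = result;
    return 1;
}
```

Because every byte is produced and consumed with shifts, the file contents are identical whether the writer runs on x86, x86-64 or a big-endian machine.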

君勿笑 2024-08-02 02:45:58

You might also run into issues with endianness. Why not just use something like NetCDF or HDF, which take care of any portability issues that may arise?

淡淡の花香 2024-08-02 02:45:58

Rather than using structures with characters in them, consider a more mathematical approach:

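// a real reader must also check each fgetc(file) result for EOF
uint32_t l  = (uint32_t)fgetc(file) << 24;
         l |= (uint32_t)fgetc(file) << 16;
         l |= (uint32_t)fgetc(file) <<  8;
         l |= (uint32_t)fgetc(file) <<  0;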

This is a little more direct and clear about what you are trying to accomplish. It can also be implemented in a loop to handle larger numbers.
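The loop version mentioned above might look like the following sketch. It assumes big-endian order (most significant byte first, as in the four-line reader above) and a width of 1 to 8 bytes; read_be_uint and write_be_uint are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read an n-byte big-endian unsigned integer (1 <= n <= 8) from f.
   Returns 1 on success, 0 on EOF, a read error, or an invalid n. */
int read_be_uint(FILE *f, size_t n, uint64_t *out)
{
    if (n == 0 || n > 8)
        return 0;
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        int c = fgetc(f);
        if (c == EOF)                  /* end of file or read error */
            return 0;
        v = (v << 8) | (uint64_t)c;    /* shift earlier bytes up, append this one */
    }
    *out = v;
    return 1;
}

/* Write the low n bytes of v (1 <= n <= 8) to f, most significant first.
   Returns 1 on success, 0 on a write error or an invalid n. */
int write_be_uint(FILE *f, size_t n, uint64_t v)
{
    if (n == 0 || n > 8)
        return 0;
    for (size_t i = n; i-- > 0; ) {    /* i runs n-1, ..., 1, 0 */
        if (fputc((int)((v >> (8 * i)) & 0xffu), f) == EOF)
            return 0;
    }
    return 1;
}
```

The same pair handles 16-, 32- and 64-bit fields just by changing n, which keeps the byte-order decision in one place.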

浪菊怪哟 2024-08-02 02:45:58

You don't want to use long int. That can be different sizes on different platforms, so is a non-starter for a platform-independent format. You have to decide what range of values needs to be stored in the file. 32 bits is probably easiest.
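As a concrete illustration of why long is a non-starter: long is 32 bits on 32-bit x86 and on 64-bit Windows, but 64 bits on 64-bit Linux. A minimal sketch, assuming the format settles on a 32-bit signed field, is to check each value against that range before writing it; fits_in_int32 is a hypothetical helper name.

```c
#include <limits.h>
#include <stdint.h>

/* The file stores this field as a 32-bit signed integer, whatever
   sizeof(long) happens to be on the machine doing the writing. */
_Static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t must be exactly 32 bits");

/* Returns 1 if v can be stored in a 32-bit signed field without loss, else 0.
   long long is at least 64 bits, so it can hold any long on x86 or x86-64. */
int fits_in_int32(long long v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

A writer would call this on each long before converting it to int32_t, and report an error instead of silently truncating on platforms where long is wider.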

You say you aren't worried about other platforms yet. I'll take that to mean you want to retain the possibility of supporting them, in which case you should define the byte-order of your file format. x86 is little-endian, so you might think that's the best. But big-endian is the "standard" interchange order if anything is, since it's used in networking.

If you go for big-endian ("network byte order"):

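#include <arpa/inet.h>  /* htonl, ntohl (POSIX; <winsock2.h> on Windows) */
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

// The rest runs inside a function; "file" is an open FILE*.

// can't be bothered to support really crazy platforms: it is in
// any case difficult even to exchange files with 9-bit machines,
// so we'll cross that bridge if we come to it.
assert(CHAR_BIT == 8);
assert(sizeof(uint32_t) == 4);

{
    // write value
    uint32_t value = 23;
    const uint32_t networkOrderValue = htonl(value);
    if (fwrite(&networkOrderValue, sizeof(uint32_t), 1, file) != 1) {
        // handle write error
    }
}

{
    // read value
    uint32_t networkOrderValue;
    if (fread(&networkOrderValue, sizeof(uint32_t), 1, file) != 1) {
        // handle EOF or read error
    }
    uint32_t value = ntohl(networkOrderValue);
}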

Strictly speaking, you don't need two variables; it's just a bit confusing to overwrite "value" with its network-order equivalent in the same variable.

It works because "network byte order" is defined to be whatever arrangement of bits results in an interchangeable (big-endian) order in memory. No need to mess with unions because any stored object in C can be treated as a sequence of char. No need to special-case for endianness because that's what ntohl/htonl are for.
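To see that claim concretely, here is a small sketch, assuming a POSIX system where htonl is declared in <arpa/inet.h>. It copies the result of htonl into an unsigned char array, which is exactly the byte sequence fwrite would emit; network_bytes is a hypothetical helper name.

```c
#include <arpa/inet.h>  /* htonl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of htonl(value) into out[0..3].
   C allows any object to be inspected as a sequence of unsigned char,
   so this is precisely what fwrite(&networkOrderValue, 4, 1, file) writes. */
void network_bytes(uint32_t value, unsigned char out[4])
{
    const uint32_t networkOrderValue = htonl(value);
    memcpy(out, &networkOrderValue, sizeof networkOrderValue);
}
```

For 0x01020304 the bytes come out as 01 02 03 04 on any host: htonl swaps them on little-endian x86 and leaves them alone on a big-endian machine.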

If this is too slow, you can start thinking about fiendishly optimised platform-specific byte-swapping, with SIMD or whatever. Or using little-endian, on the assumption that most of your platforms will be little-endian and so it's faster "on average" across them. In that case you'll need to write or find "host to little-endian" and "little-endian to host" functions, which of course on x86 just do nothing.
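One portable way to write those two helpers, sketched under the assumption that uint32_t is available; the names host_to_le32 and le32_to_host are hypothetical. glibc and the BSDs ship similar non-standard functions (htole32 and le32toh), though the header that declares them differs between systems.

```c
#include <stdint.h>
#include <string.h>

/* Return a uint32_t whose in-memory bytes are v in little-endian order. */
uint32_t host_to_le32(uint32_t v)
{
    unsigned char b[4];
    b[0] = (unsigned char)(v & 0xffu);          /* least significant byte first */
    b[1] = (unsigned char)((v >> 8) & 0xffu);
    b[2] = (unsigned char)((v >> 16) & 0xffu);
    b[3] = (unsigned char)((v >> 24) & 0xffu);
    uint32_t out;
    memcpy(&out, b, sizeof out);                /* reinterpret without aliasing */
    return out;
}

/* Inverse: interpret v's in-memory bytes as little-endian, return the host value. */
uint32_t le32_to_host(uint32_t v)
{
    unsigned char b[4];
    memcpy(b, &v, sizeof b);
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Compilers commonly recognise this shift-and-memcpy pattern and reduce it to nothing on little-endian targets such as x86, but that is an optimisation rather than a guarantee.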

小耗子 2024-08-02 02:45:58

I believe the most cross-architecture approach is to use the fixed-width intN_t/uintN_t types defined in stdint.h (see the stdint.h man page). For example, an int32_t gives you a 32-bit integer on both x86 and x86-64.
I now use these by default in all of my code and have had no trouble, as they are part of C99 and fairly standard across all *NIX.

又怨 2024-08-02 02:45:58

Assuming sizeof(uint32_t) == 4, there are 4! = 24 possible byte orders, of which little-endian and big-endian are the most prominent examples, but others have been used as well (e.g. PDP-endian).

Here are functions for reading and writing 32-bit unsigned integers to and from a stream, honoring an arbitrary byte order that is specified by the integer whose in-memory representation is the byte sequence 0,1,2,3: endian.h, endian.c

The header defines these prototypes

_Bool read_uint32(uint32_t * value, FILE * file, uint32_t order);
_Bool write_uint32(uint32_t value, FILE * file, uint32_t order);

and these constants

LITTLE_ENDIAN
BIG_ENDIAN
PDP_ENDIAN
HOST_ORDER
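The linked endian.c is not reproduced here, so the following is only a sketch of how functions with those prototypes could work. The constants are renamed ORDER_* (hypothetical) because glibc and BSD <endian.h> already define LITTLE_ENDIAN, BIG_ENDIAN and PDP_ENDIAN with different meanings, and HOST_ORDER becomes a function. Byte k of an order constant gives the file position of the value's k-th least significant byte; the PDP value assumes the usual PDP-11 layout in which 0x0A0B0C0D is stored as 0B 0A 0D 0C.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical constants: laid out in their own byte order, each reads 0,1,2,3. */
#define ORDER_LITTLE_ENDIAN UINT32_C(0x03020100)
#define ORDER_BIG_ENDIAN    UINT32_C(0x00010203)
#define ORDER_PDP_ENDIAN    UINT32_C(0x01000302)

/* This machine's order: reinterpret the bytes 0,1,2,3 as a uint32_t. */
uint32_t host_order(void)
{
    const unsigned char bytes[4] = {0, 1, 2, 3};
    uint32_t order;
    memcpy(&order, bytes, sizeof order);
    return order;
}

/* True if order maps the 4 value bytes to 4 distinct positions 0..3. */
bool valid_order(uint32_t order)
{
    unsigned seen = 0;
    for (int k = 0; k < 4; k++) {
        unsigned pos = (order >> (8 * k)) & 0xffu;
        if (pos > 3 || (seen & (1u << pos)))
            return false;
        seen |= 1u << pos;
    }
    return true;
}

bool write_uint32(uint32_t value, FILE *file, uint32_t order)
{
    unsigned char buf[4];
    if (!valid_order(order))
        return false;
    for (int k = 0; k < 4; k++)  /* place value byte k at its file position */
        buf[(order >> (8 * k)) & 0xffu] = (unsigned char)((value >> (8 * k)) & 0xffu);
    return fwrite(buf, 1, sizeof buf, file) == sizeof buf;
}

bool read_uint32(uint32_t *value, FILE *file, uint32_t order)
{
    unsigned char buf[4];
    if (!valid_order(order) || fread(buf, 1, sizeof buf, file) != sizeof buf)
        return false;
    uint32_t v = 0;
    for (int k = 0; k < 4; k++)  /* fetch value byte k from its file position */
        v |= (uint32_t)buf[(order >> (8 * k)) & 0xffu] << (8 * k);
    *value = v;
    return true;
}
```

On a little-endian host, host_order() returns 0x03020100, so writing with the host order reproduces the raw memory layout while any other order gives a fixed, host-independent file format.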