Converting/extracting an int from a char array
I got a C string from a call to gzread. I know the data consists of blocks, and each block consists of an unsigned int, a char, an int and an unsigned short int.
So I was wondering what the standard way of splitting this C string into the appropriate variables is.
Say the first 4 bytes are an unsigned int, the next byte is a char, the next 4 bytes are a signed int, and the last 2 bytes are an unsigned short int.
//Some pseudocode below which would work
char buf[11];
unsigned int a;
char b;
int c;
unsigned short int d;
I guess I could memcpy, with appropriate offsets.
memcpy(&a, buf, sizeof(unsigned int));
memcpy(&b, buf+4, sizeof(char));
memcpy(&c, buf+5, sizeof(int));
memcpy(&d, buf+9, sizeof(unsigned short int));
Or is it better to use bitwise operators, like shifting and masking?
Or would it be better to gzread all 11 bytes directly into a struct, or is that even possible? Is the memory layout of a struct fixed, and will this work with gzread?
Comments (4)
If you pack the struct (read up on the __packed__ attribute), you can rely on the member order and on the members being unaligned, so you could read into a struct directly. However, I'm not sure about the portability of this solution. Otherwise, use pointer magic and casting on the buffer.
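The packed-struct idea can be sketched as follows. This is a minimal illustration, not a portable recipe: it assumes GCC or Clang (for __attribute__((packed))), assumes the fields are 32, 8, 32 and 16 bits wide, and assumes the file uses the host's byte order. The names Block and block_from_bytes are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang-specific: packed removes padding so members sit back to back. */
struct __attribute__((packed)) Block {
    uint32_t a;   /* bytes 0..3  */
    char     b;   /* byte  4     */
    int32_t  c;   /* bytes 5..8  */
    uint16_t d;   /* bytes 9..10 */
};

/* Fails to compile if the layout is not the expected 11 bytes. */
_Static_assert(sizeof(struct Block) == 11, "Block must be exactly 11 bytes");

/* Copy one raw 11-byte record into the struct. The bytes are interpreted
   in the host's byte order, so this only matches the file if both agree. */
static struct Block block_from_bytes(const char *buf)
{
    struct Block blk;
    memcpy(&blk, buf, sizeof blk);
    return blk;
}
```

With a struct like this, the 11 bytes could in principle also be read straight into a struct Block object instead of going through a char buffer first; the byte-order caveat stays the same.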
You need to make sure that the byte order of the file matches the processor architecture you're running your code on. If, for instance, the integers are written to the file most significant byte first and your processor uses least significant byte first, you will get garbage results.
If you want to make your code portable from one architecture to another, you should wrap all read and write operations for integers behind macros or inline functions that manage the byte order for you depending on the target processor architecture.
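One possible shape for such wrappers is sketched below. It assumes the file stores integers big-endian (the question does not say), assumes the host is either plain little-endian or plain big-endian, and detects the host order at run time rather than per target. All function names here are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Runtime check: returns 1 on a little-endian host, 0 otherwise. */
static inline int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the lowest-addressed byte */
    return first == 1;
}

/* Reverse the byte order of a 32-bit value. */
static inline uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Reverse the byte order of a 16-bit value. */
static inline uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Read a 32-bit value that the file stores big-endian (assumed). */
static inline uint32_t read_be32(const char *p)
{
    uint32_t raw;
    memcpy(&raw, p, sizeof raw);          /* safe for unaligned offsets */
    return host_is_little_endian() ? swap32(raw) : raw;
}

/* Read a 16-bit value that the file stores big-endian (assumed). */
static inline uint16_t read_be16(const char *p)
{
    uint16_t raw;
    memcpy(&raw, p, sizeof raw);
    return host_is_little_endian() ? swap16(raw) : raw;
}
```

Callers then write read_be32(buf) and read_be16(buf + 9) and never deal with the host's byte order themselves.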
It depends on how the input data is defined. If it's defined to be in host-endian order (that is, the endianness always matches the system on which your code is running), then the memcpy() you have shown is a good, portable method to use.
Alternatively, if the input data is defined to have a particular endianness, then the best portable solution is to load it one unsigned char at a time, using shifts and bitwise-or.
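Loading a value one unsigned char at a time with shifts and bitwise-or might look like the sketch below. The helper names are hypothetical, and which variant applies depends on the byte order the file format actually defines, which the question does not state.

```c
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes stored least significant byte first. */
static uint32_t load_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Assemble a 32-bit value from 4 bytes stored most significant byte first. */
static uint32_t load_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}

/* Assemble a 16-bit value from 2 bytes stored least significant byte first. */
static uint16_t load_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Because each byte is widened before it is shifted, the result is the same on any host, regardless of the host's own byte order or alignment rules.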
You need a specification of the format before you can do anything. Is it text or binary (presumably binary from your description, but one never knows)? What is the representation used for signed values? What is the byte order?
memcpy will only work if your machine architecture corresponds exactly to that of the input format, a rare case today, since almost all network formats are big-endian, and the most widespread architectures are little-endian. (Most formats and architectures today use 2's complement for negative values, so you can often "assume" compatibility there. But there are exceptions.)
Given this, mathematical reconstruction of the value (using masking and shifting, or multiplications) is the only portable solution. Depending on the machine and the quality of the compiler, it could easily result in better performance as well.
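As an illustration of reconstructing a whole 11-byte block this way, here is a sketch. It assumes the format is big-endian and stores the signed field in 2's complement; neither is stated in the question, so both would need checking against the real specification. Record, get_u32_be, to_signed32 and parse_record are hypothetical names.

```c
#include <stdint.h>

struct Record {
    uint32_t a;
    char     b;
    int32_t  c;
    uint16_t d;
};

/* Rebuild a 32-bit unsigned value from 4 big-endian bytes (assumed order). */
static uint32_t get_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Map a 32-bit 2's complement bit pattern to int32_t using arithmetic only,
   so the result does not depend on how the host converts out-of-range values. */
static int32_t to_signed32(uint32_t u)
{
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return -(int32_t)(uint32_t)~u - 1;
}

/* Decode one block: 4-byte unsigned, 1-byte char, 4-byte signed, 2-byte unsigned. */
static struct Record parse_record(const unsigned char *buf)
{
    struct Record r;
    r.a = get_u32_be(buf);
    r.b = (char)buf[4];
    r.c = to_signed32(get_u32_be(buf + 5));
    r.d = (uint16_t)((buf[9] << 8) | buf[10]);
    return r;
}
```

The buffer is taken as unsigned char so that bytes above 0x7F are not sign-extended before shifting; a char buffer filled by gzread can be passed with a cast to const unsigned char *.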