Accessing bitfields when reading/writing binary data structures

Posted on 2024-12-01 17:50:50


I'm writing a parser for a binary format. The format contains many different tables (usually somewhere between 50 and 100 of them), each of which is also binary and has its own field sizes.

Most of these structures have bitfields and look something like this when represented in C:

struct myHeader
{
  unsigned char fieldA : 3;
  unsigned char fieldB : 2;
  unsigned char fieldC : 3;
  unsigned short fieldD : 14;
  unsigned char fieldE : 4;
};

I came across the struct module, but its smallest unit is a byte rather than a bit; otherwise the module would have been a good fit for this work.

I know ctypes supports bitfields, but I'm not sure how to use ctypes structures containing bitfields to parse this data.

My other option is to do the bit manipulation myself: read whole bytes with the struct module and extract the fields with shifts and masks. But since I have close to 50-100 different structures of this kind, writing that code by hand becomes error-prone. I'm also worried about efficiency, since this tool may be used to parse gigabytes of binary data.
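One way to keep the manual approach from becoming error-prone is to describe each structure as a table of (name, width) pairs and generate the shifts and masks from that table once. The sketch below assumes the on-disk layout is big-endian with fields packed back to back, most significant bit first, and zero-padded to a whole byte; the C declaration alone does not guarantee this, so check it against the format's spec. It uses int.from_bytes instead of struct so that records of any byte length work. `BitLayout` and `MY_HEADER` are hypothetical names.

```python
from typing import Dict, List, Tuple


class BitLayout:
    """Decode one fixed-size record of packed, MSB-first bitfields.

    Assumption (not from the question): fields are stored back to back,
    most significant bit first, and the record is zero-padded at the end
    to a whole number of bytes.
    """

    def __init__(self, fields: List[Tuple[str, int]]) -> None:
        self.names = [name for name, _ in fields]
        total_bits = sum(width for _, width in fields)
        self.nbytes = (total_bits + 7) // 8      # round up to whole bytes
        # Precompute (shift, mask) per field once per layout, so decoding
        # a record costs one int conversion plus a shift and mask per field.
        self._ops: List[Tuple[int, int]] = []
        shift = self.nbytes * 8
        for _, width in fields:
            shift -= width
            self._ops.append((shift, (1 << width) - 1))

    def unpack(self, data: bytes, offset: int = 0) -> Dict[str, int]:
        chunk = data[offset:offset + self.nbytes]
        if len(chunk) != self.nbytes:
            raise ValueError("not enough data for one record")
        value = int.from_bytes(chunk, "big")
        return {name: (value >> shift) & mask
                for name, (shift, mask) in zip(self.names, self._ops)}


# Hypothetical table describing the myHeader struct from the question.
MY_HEADER = BitLayout([
    ("fieldA", 3), ("fieldB", 2), ("fieldC", 3), ("fieldD", 14), ("fieldE", 4),
])

print(MY_HEADER.unpack(b"%\x0f\xa0\x80"))
# {'fieldA': 1, 'fieldB': 0, 'fieldC': 5, 'fieldD': 1000, 'fieldE': 2}
```

With this design, adding another of the 50-100 structures means writing one more table, not new bit-twiddling code.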

Thanks.

冰雪梦之恋 2024-12-08 17:50:50

Using bitstring (which you mention you're looking at), it should be easy enough to implement. First, create some data to decode:

>>> import bitstring
>>> myheader = "uint:3, uint:2, uint:3, uint:14, uint:4"
>>> a = bitstring.pack(myheader, 1, 0, 5, 1000, 2)
>>> a.bin
'00100101000011111010000010'
>>> a.tobytes()
b'%\x0f\xa0\x80'

Decoding it again is then just:

>>> a.readlist(myheader)
[1, 0, 5, 1000, 2]
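For comparison, the packing half can be sketched with only the standard library. This assumes unsigned fields written MSB-first with zero bits padding the final byte, which is what the a.bin and a.tobytes() output shows for this example; `pack_bits` is a hypothetical helper, not part of bitstring.

```python
def pack_bits(widths, values):
    """Pack unsigned ints MSB-first into bytes, zero-padding the last byte."""
    if len(widths) != len(values):
        raise ValueError("need exactly one value per field")
    acc = 0      # all fields so far, as one big integer
    total = 0    # number of bits used so far
    for width, value in zip(widths, values):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} unsigned bits")
        acc = (acc << width) | value
        total += width
    pad = -total % 8                     # zero bits needed to reach a byte boundary
    return (acc << pad).to_bytes((total + pad) // 8, "big")


packed = pack_bits([3, 2, 3, 14, 4], [1, 0, 5, 1000, 2])
print(packed)         # b'%\x0f\xa0\x80'
print(packed.hex())   # 250fa080
```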

Your main concern may well be speed: the library is well-optimised Python, but that is nowhere near as fast as a C library would be.
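If speed turns out to be the bottleneck, one hedged option is to let struct.iter_unpack do the byte-to-integer conversion in C and keep the Python work to shifts and masks. The sketch assumes the data is a run of fixed-size 4-byte records holding the myHeader fields MSB-first in a big-endian word (a simplification; real tables will be mixed). This is plain standard-library code, not part of bitstring, and `decode_headers` and `_HEADER_OPS` are hypothetical names. It will still be much slower than C, so measure on real data first.

```python
import struct
from typing import Iterator, Tuple

# (shift, mask) pairs for the 26-bit myHeader layout stored MSB-first in a
# big-endian 32-bit word, leaving 6 zero padding bits at the bottom.
_HEADER_OPS = ((29, 0x7), (27, 0x3), (24, 0x7), (10, 0x3FFF), (6, 0xF))


def decode_headers(buf) -> Iterator[Tuple[int, ...]]:
    """Yield (fieldA, ..., fieldE) for every 4-byte record in buf."""
    # ">I" = big-endian unsigned 32-bit; iter_unpack walks the buffer in C
    # and requires len(buf) to be a multiple of 4.
    for (word,) in struct.iter_unpack(">I", buf):
        yield tuple((word >> shift) & mask for shift, mask in _HEADER_OPS)


data = b"%\x0f\xa0\x80" * 3            # three identical sample records
for record in decode_headers(data):
    print(record)                       # (1, 0, 5, 1000, 2)
```

Because iter_unpack accepts any bytes-like object, a memoryview or an mmap of a large file can be passed instead of reading everything into one bytes object.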

柏拉图鍀咏恒 2024-12-08 17:50:50

I haven't rigorously tested this, but it seems to work with unsigned types (edit: it works with signed byte/short types, too).

Edit 2: This is really hit or miss. It depends on how the library's compiler packs the bits into the struct, which the C standard leaves implementation-defined. For example, with gcc 4.5.3 it works as long as I don't pack the struct with __attribute__ ((__packed__)), which shrinks it from 6 bytes to 4 (you can check this with __alignof__ and sizeof). I can make it almost work by adding _pack_ = True to the ctypes Structure definition, but it fails for fieldE. gcc notes: "Offset of packed bit-field ‘fieldE’ has changed in GCC 4.4".

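import ctypes

# Bit widths and base types must mirror the C declaration exactly.
class MyHeader(ctypes.Structure):
    _fields_ = [
        ('fieldA', ctypes.c_ubyte, 3),
        ('fieldB', ctypes.c_ubyte, 2),
        ('fieldC', ctypes.c_ubyte, 3),
        ('fieldD', ctypes.c_ushort, 14),
        ('fieldE', ctypes.c_ubyte, 4),
    ]

# Load the DLL built from the C code below (Windows path; a .so on Linux).
lib = ctypes.cdll.LoadLibrary('C/bitfield.dll')

hdr = MyHeader()
# Pass a pointer so the C function fills in the Python-side structure.
lib.set_header(ctypes.byref(hdr))

# Print every field by name.
for name, *_ in hdr._fields_:
    print("%s: %d" % (name, getattr(hdr, name)))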

Output:

fieldA: 3
fieldB: 1
fieldC: 5
fieldD: 12345
fieldE: 9

C:
typedef struct _MyHeader {
    unsigned char  fieldA  :  3;
    unsigned char  fieldB  :  2;
    unsigned char  fieldC  :  3;
    unsigned short fieldD  : 14;
    unsigned char  fieldE  :  4;
} MyHeader, *pMyHeader; 

int set_header(pMyHeader hdr) {

    hdr->fieldA = 3;
    hdr->fieldB = 1;
    hdr->fieldC = 5;
    hdr->fieldD = 12345;
    hdr->fieldE = 9;

    return(0);
}
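If the bytes come from a file rather than from a C library, the same ctypes Structure can be filled straight from a buffer with from_buffer_copy, and bytes(hdr) gives the raw bytes back. Because the bit layout follows the native compiler's rules (the Edit 2 caveat), the sketch also sets one field at a time to all ones and dumps the bytes, which shows how this particular platform places each field before you trust it against the file format. `probe_layout` is a hypothetical helper name.

```python
import ctypes


class MyHeader(ctypes.Structure):
    # Same declaration as above; layout follows the native C compiler's rules.
    _fields_ = [
        ('fieldA', ctypes.c_ubyte, 3),
        ('fieldB', ctypes.c_ubyte, 2),
        ('fieldC', ctypes.c_ubyte, 3),
        ('fieldD', ctypes.c_ushort, 14),
        ('fieldE', ctypes.c_ubyte, 4),
    ]


def probe_layout(cls):
    """Print the raw bytes produced when each bitfield alone is all ones."""
    for name, _ctype, width in cls._fields_:
        hdr = cls()                          # zero-initialised instance
        setattr(hdr, name, (1 << width) - 1)
        print(f"{name:7} {bytes(hdr).hex(' ')}")


# Round trip: Structure -> raw bytes (as read from a file) -> Structure.
hdr = MyHeader()
hdr.fieldA, hdr.fieldB, hdr.fieldC, hdr.fieldD, hdr.fieldE = 3, 1, 5, 12345, 9
raw = bytes(hdr)
parsed = MyHeader.from_buffer_copy(raw)
print(ctypes.sizeof(MyHeader), [getattr(parsed, n) for n, *_ in MyHeader._fields_])
probe_layout(MyHeader)
```

If the probe shows a layout that differs from the file format, ctypes.BigEndianStructure may be worth trying, but the shift-and-mask approach is the more portable fallback.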
