Implementing C enums and unions in Python

Posted 2024-09-24 23:11:05

I'm trying to figure out some C code so that I can port it to Python. The code reads a proprietary binary data file format. It has been straightforward so far: it's mainly been structs, and I have been using the struct library to ask for particular C types from the file. However, I just came across this bit of code and I'm at a loss for how to implement it in Python. In particular, I'm not sure how to deal with the enum or the union.

#define BYTE char 
#define UBYTE unsigned char 
#define WORD short 
#define UWORD unsigned short

typedef enum {
    TEEG_EVENT_TAB1=1, 
    TEEG_EVENT_TAB2=2
} TEEG_TYPE;

typedef struct
{
    TEEG_TYPE Teeg;
    long Size;
    union
    {
        void *Ptr;  // Memory pointer
        long Offset;
    };
} TEEG;

Secondly, in the struct definition below, I'm not sure what the colons after the variable names mean (e.g., KeyPad:4). Does it mean I'm supposed to read 4 bytes?

typedef struct
{
    UWORD StimType;
    UBYTE KeyBoard;
    UBYTE KeyPad:4;
    UBYTE Accept:4;
    long Offset;
} EVENT1;

In case it's useful, an abstract example of the way I've been accessing the file in Python is as follows:

from struct import unpack, calcsize

def get(ctype, size=1):
    """Reads and unpacks binary data into the desired ctype."""
    if size == 1:
        size = ''
    else:
        size = str(size)

    chunk = file.read(calcsize(size + ctype))
    return unpack(size + ctype, chunk)[0]

file = open("file.bin", "rb")
file.seek(1234)

var1 = get('i')
var2 = get('4l')
var3 = get('10s')

Comments (4)

翻了热茶 2024-10-01 23:11:05

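Enums: Python had no enum type when this was written (the standard-library enum module only arrived in Python 3.4). Various idioms have been proposed, but none was really widespread. The most straightforward (and in this case sufficient) solution is a pair of module-level constants: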

TEEG_EVENT_TAB1 = 1
TEEG_EVENT_TAB2 = 2

Unions: ctypes has unions.

The fieldname : n syntax is called a bitfield and, yeah, does mean "this is n bits big". Again, ctypes has them.
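
As a rough sketch of what that could look like with ctypes, assuming the file was written by a 32-bit little-endian program so that the enum, long and the pointer are each 4 bytes. The fixed-width types, the name PtrOrOffset and the sample bytes are my assumptions, not something taken from the original C code:

```python
import ctypes

class PtrOrOffset(ctypes.Union):
    # Both members share the same 4 bytes; on disk only Offset is likely meaningful.
    _fields_ = [("Ptr", ctypes.c_uint32),    # assumption: 32-bit pointer, kept as a raw integer
                ("Offset", ctypes.c_int32)]  # assumption: 32-bit long

class TEEG(ctypes.Structure):
    _anonymous_ = ("u",)                     # allows teeg.Offset, like the anonymous C union
    _fields_ = [("Teeg", ctypes.c_int32),    # assumption: the enum is stored as a 32-bit int
                ("Size", ctypes.c_int32),
                ("u", PtrOrOffset)]

class EVENT1(ctypes.Structure):
    _fields_ = [("StimType", ctypes.c_uint16),
                ("KeyBoard", ctypes.c_uint8),
                ("KeyPad", ctypes.c_uint8, 4),   # bit field: 4 bits
                ("Accept", ctypes.c_uint8, 4),   # bit field: 4 bits, same byte as KeyPad
                ("Offset", ctypes.c_int32)]

# Hypothetical bytes standing in for file.read(ctypes.sizeof(TEEG))
sample = bytes([1, 0, 0, 0, 16, 0, 0, 0, 0x34, 0x12, 0, 0])
teeg = TEEG.from_buffer_copy(sample)
print(teeg.Teeg, teeg.Size, hex(teeg.Offset))
```

ctypes.Structure uses the byte order and alignment rules of the machine running Python, so this only matches the file if those agree with the program that wrote it.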

请你别敷衍 2024-10-01 23:11:05

I don't know the answer to all of your questions, but for enums that you do not need a lookup-by-value on (that is, you are just using them to avoid magic numbers), I like to use a small class. A regular dict is another option that works fine. If you need lookup-by-value, you may want another structure though.

class TeegType(object):
    # Plain class used as a namespace for the enum constants
    TEEG_EVENT_TAB1 = 1
    TEEG_EVENT_TAB2 = 2

print(TeegType.TEEG_EVENT_TAB1)
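
If lookup-by-value is needed, one possible "other structure" on Python 3.4 or later is the standard-library enum.IntEnum. This is a minimal sketch, not part of the answer above; raw_value is a hypothetical integer standing in for a value unpacked from the file:

```python
from enum import IntEnum

class TeegType(IntEnum):
    TEEG_EVENT_TAB1 = 1
    TEEG_EVENT_TAB2 = 2

raw_value = 2                     # hypothetical integer unpacked from the file
teeg_type = TeegType(raw_value)   # lookup by value; raises ValueError for unknown values
print(teeg_type.name)
print(teeg_type == 2)             # IntEnum members compare equal to plain ints
```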

心房的律动 2024-10-01 23:11:05

What you really need to know is:

  1. What is the size of an enum? You will use this answer to generate your unpacking code.
  2. What is the size of a union? Summary: the size of the largest member.
  3. How do you deal with that pointer? You should take a look at the ctypes module. For what you are doing, it may be easier to work with than the struct module. In particular, it can work with pointers arriving via C.
  4. How do you coerce/cast the data read from the struct into the right type to work with in Python? This is why I recommended ctypes in the bullet above; this module has functions for performing the necessary casts.
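
For the first two points, a quick way to check sizes is to ask ctypes and struct directly. This is only a sketch: PtrOrLong is a hypothetical name for the anonymous union in TEEG, and the numbers describe the machine running Python, which may differ from the machine that wrote the file:

```python
import ctypes
import struct

class PtrOrLong(ctypes.Union):     # hypothetical name for the anonymous union in TEEG
    _fields_ = [("Ptr", ctypes.c_void_p), ("Offset", ctypes.c_long)]

# Sizes on the machine running Python.
enum_size = ctypes.sizeof(ctypes.c_int)   # a common choice for a small enum, not guaranteed
union_size = ctypes.sizeof(PtrOrLong)     # matches the size of the largest member
largest_member = max(ctypes.sizeof(ctypes.c_void_p), ctypes.sizeof(ctypes.c_long))
print(enum_size, union_size, largest_member)

# With "<" or ">" the struct module uses fixed standard sizes on every platform.
print(struct.calcsize("<i"), struct.calcsize("<l"), struct.calcsize("<q"))
```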

愛放△進行李 2024-10-01 23:11:05

The C enum declaration is a syntactic wrapper around some integer type. See the question "Is the sizeof(enum) == sizeof(int), always?". How big an int is will depend on the particular C compiler. I would probably start by trying 16 bits.

The union reserves a block of memory the size of the largest of the contained data types. Again, the exact size will depend on the C implementation, but I would expect 32 bits for a 32-bit architecture, or 64 bits if this is compiled as native 64-bit code. Generally speaking, you will be able to store the contents of the union in a Python integer, regardless of whether what has been saved in it is a pointer or an offset.

A more interesting question is why a pointer would ever be written to a disk file. You may find that the union field is only treated as a pointer when the TEEG struct is in memory, but when written to disk, it is always an integer offset.

As for the :4 notation, as several people have noted, these are "bit fields," meaning a sequence of bits, several of which can be packed into a single space. If I recall correctly, bit fields in C are packed into an integer storage unit chosen by the compiler (often an int, or a single byte when the fields are declared as unsigned char), so both of these 4-bit fields will be packed into a single integer. They can be unpacked with appropriate use of Python's "&" (bitwise and) and ">>" (right shift) operators. Again, exactly how the fields have been packed into the integer, and the size of the integer field itself, will depend on the particular C implementation.

Maybe the following code snippet will help you:

import struct

BYTE_ORDER = "<"          # Assumption: little-endian file; "<" also disables padding
                          # and fixes "h" = 2, "l" = 4 and "q" = 8 bytes

SIZEOF_TEEG_TYPE = 2      # First guess for enum is two bytes
FMT_TEEG_TYPE = "h"       # Could be "b", "B", "h", "H", "l", "L", "q" or "Q"

SIZEOF_LONG = 4           # Use 8 in 64-bit Unix architectures
FMT_LONG = "l"            # Use "q" in 64-bit Unix architectures
                          # Life gets more interesting if you are reading 64-bit
                          # using 32-bit Python

SIZEOF_PTR_LONG_UNION = 4 # Use 8 in any 64-bit architecture
FMT_PTR_LONG_UNION = "l"  # Use "q" in any 64-bit architecture

# If the C compiler padded the struct, add "x" pad bytes to the format and sizes
SIZEOF_TEEG_STRUCT = SIZEOF_TEEG_TYPE + SIZEOF_LONG + SIZEOF_PTR_LONG_UNION
FMT_TEEG_STRUCT = BYTE_ORDER + FMT_TEEG_TYPE + FMT_LONG + FMT_PTR_LONG_UNION

# Constants for TEEG_EVENTs
TEEG_EVENT_TAB1 = 1
TEEG_EVENT_TAB2 = 2

# ...

# Read a TEEG structure
file_handle = open("file.bin", "rb")  # file name taken from the question
teeg_raw = file_handle.read(SIZEOF_TEEG_STRUCT)
teeg_type, teeg_size, teeg_offset = struct.unpack(FMT_TEEG_STRUCT, teeg_raw)

# ...

# Use TEEG_TYPE information
if teeg_type == TEEG_EVENT_TAB1:
    pass  # Do something useful

elif teeg_type == TEEG_EVENT_TAB2:
    pass  # Do something else useful

else:
    raise ValueError("Encountered illegal TEEG_EVENT type %d" % teeg_type)
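
For the :4 bit fields discussed above, the "&" and ">>" unpacking might look like the sketch below. The record bytes are made up for illustration, and both the little-endian layout and the choice of which nibble holds KeyPad are assumptions that need to be checked against a real file:

```python
import struct

# Hypothetical 8-byte EVENT1 record: StimType, KeyBoard, packed nibbles, Offset
record = bytes([2, 0, 7, 0xA5, 9, 0, 0, 0])

# "<" = little-endian, standard sizes, no padding (assumption about the file)
stim_type, keyboard, packed, offset = struct.unpack("<HBBl", record)

low_nibble = packed & 0x0F           # bits 0-3
high_nibble = (packed >> 4) & 0x0F   # bits 4-7

# Assumption: the compiler placed the first-declared field (KeyPad) in the low bits.
keypad, accept = low_nibble, high_nibble
print(stim_type, keyboard, keypad, accept, offset)
```

If the values look wrong on real data, swapping the two nibbles is the first thing to try.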