C/C++: switch/case on char arrays

Posted on 2024-11-29 18:51:21

I have several data structures, each having a field of 4 bytes.

Since 4 bytes equal 1 int on my platform, I want to use them in case labels:

switch (* ((int*) &structure->id)) {
   case (* ((int*) "sqrt")): printf("its a sqrt!"); break;
   case (* ((int*) "log2")): printf("its a log2!"); break;
   case (((int) 'A')<<8 + (int) 'B'): printf("works somehow, but unreadable"); break;
   default: printf("unknown id");
}

This results in a compile error, telling me the case expression does not reduce to an int.

How can I use char arrays of limited size, and convert them into numerical types for use in switch/case?

Comments (7)

你的呼吸 2024-12-06 18:51:21

Follow the exact method employed in video encoding with FourCC codes:

Set a FourCC value in C++

#include <stdint.h>  /* for uint32_t */

#define FOURCC(a,b,c,d) ( ((uint32_t)(d)<<24) | ((uint32_t)(c)<<16) | ((uint32_t)(b)<<8) | (uint32_t)(a) )

It is probably a good idea to use an enumerated type or macros for the identifiers:

enum {
    ID_SQRT = FOURCC( 's', 'q', 'r', 't'),
    ID_LOG2 = FOURCC( 'l', 'o', 'g', '2')
};

int structure_id = FOURCC( structure->id[0], 
                           structure->id[1],
                           structure->id[2],
                           structure->id[3] );
switch (structure_id) {
case ID_SQRT: ...
case ID_LOG2: ...
}
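
For reference, the pieces above can be combined into one sketch that compiles on its own. The names tag_kind and classify_tag are made up for illustration and are not part of the answer. Each byte is cast through unsigned char so that a signed char cannot sign-extend, and the layout (first character in the low byte) follows the macro above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs four characters into one 32-bit value, first character in the low byte. */
#define FOURCC(a, b, c, d)                      \
    (((uint32_t)(unsigned char)(d) << 24) |     \
     ((uint32_t)(unsigned char)(c) << 16) |     \
     ((uint32_t)(unsigned char)(b) << 8)  |     \
      (uint32_t)(unsigned char)(a))

/* Hypothetical result codes, introduced only for this example. */
enum tag_kind { TAG_UNKNOWN, TAG_SQRT, TAG_LOG2 };

/* Classifies a 4-byte id; the id does not need to be null-terminated. */
static enum tag_kind classify_tag(const char *id)
{
    assert(id != NULL);
    switch (FOURCC(id[0], id[1], id[2], id[3])) {
    case FOURCC('s', 'q', 'r', 't'): return TAG_SQRT;
    case FOURCC('l', 'o', 'g', '2'): return TAG_LOG2;
    default:                         return TAG_UNKNOWN;
    }
}
```

The case labels are legal here because the macro only applies casts, shifts and bitwise OR to character constants, so each label is an integer constant expression. The value is built from the individual bytes, so it does not depend on the platform's byte order or on the alignment of id.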
孤凫 2024-12-06 18:51:21

Disclaimer: Don't use this except for fun or learning purposes. For serious code, use common idioms, never rely on compiler specific behaviour in the general case; if done anyway, incompatible platforms should trigger a compile time error or use the good, general code.


It seems the standard allows multi-character character constants as per the grammar. Haven't checked yet whether the following is really legal though.

~/$ cat main.cc

#include <iostream>

#ifdef I_AM_CERTAIN_THAT_MY_PLATFORM_SUPPORTS_THIS_CRAP
int main () {
    const char *foo = "fooo";
    switch ((foo[0]<<24) | (foo[1]<<16) | (foo[2]<<8) | (foo[3]<<0)) {
    case 'fooo': std::cout << "fooo!\n";  break;
    default:     std::cout << "bwaah!\n"; break;
    };
}
#else
#error oh oh oh
#endif

~/$ g++ -Wall -Wextra main.cc  &&  ./a.out
main.cc:5:10: warning: multi-character character constant
fooo!

edit: Oh look, directly below the grammar excerpt there is 2.13.2 Character Literals, Bullet 1:

[...] An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.

But in the second bullet:

[...] The value of a wide-character literal containing multiple c-chars is implementation-defined.

So be careful.

心安伴我暖 2024-12-06 18:51:21

I believe that the issue here is that in C, each case label in a switch statement must be an integer constant expression. From the C ISO spec, §6.8.4.2/3:

The expression of each case label shall be an integer constant expression [...]

(my emphasis)

The C spec then defines an "integer constant expression" as a constant expression where (§6.6/6):

An integer constant expression shall have integer type and shall only have operands
that are integer constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants, and floating constants that are the
immediate operands of casts. Cast operators in an integer constant expression shall only
convert arithmetic types to integer types, except as part of an operand to the sizeof
operator.

(my emphasis again). This suggests that you cannot cast a string literal (which decays to a pointer) and dereference it in a case label, since a cast to a pointer type isn't allowed in an integer constant expression.

Intuitively, the reason for this might be that on some implementations the actual location of the strings in the generated executable isn't necessarily fixed until linking. Consequently, the compiler might not be able to emit very good code for the switch statement if the labels depended on a constant expression that depended indirectly on the address of those strings, since it might miss opportunities to compile jump tables, for example. This is just an example, but the more rigorous language of the spec explicitly forbids you from doing what you've described above.

Hope this helps!

£烟消云散 2024-12-06 18:51:21

The issue is that the case labels of a switch must be constants that are known at compile time. The address of a string literal isn't known at compile time: the linker assigns an address, but even that may not be the final one. I think the final, relocated address is only available at runtime.

You can simplify your problem to

void f() {
    int x[*(int*)"x"];
}

This yields the same error, since the address of the "x" literal is not known at compile time. This is different from e.g.

void f() {
    int x[sizeof("x")];
}

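This compiles because sizeof applied to a string literal yields the size of the array that holds it (2 bytes for "x", including the terminating null character), and the compiler knows that size without needing any address.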

Now, how to fix your problem? Two things come to my mind:

  1. Don't make the id field a string but an integer and then use a list of constants in your case statements.

  2. I suspect that you will need to do a switch like this in multiple places, so my other suggestion is: don't use a switch in the first place to execute code depending on the type of the structure. Instead, the structure could offer a function pointer which can be called to do the right printf call. At the time the struct is created, the function pointer is set to the correct function.

Here's a code sketch illustrating the second idea:

#include <stdio.h>
#include <string.h>

struct MyStructure {
   const char *id;
   void (*printType)(struct MyStructure *);
   void (*doThat)(struct MyStructure *, int arg1, int arg2);
   /* ... */
};

static void printSqrtType( struct MyStructure *s ) {
   (void)s;  /* unused */
   printf( "its a sqrt\n" );
}

static void printLog2Type( struct MyStructure *s ) {
   (void)s;  /* unused */
   printf( "its a log2\n" );
}

static void printUnreadableType( struct MyStructure *s ) {
   (void)s;  /* unused */
   printf( "works somehow, but unreadable\n" );
}

/* Initializes the function pointers in the structure depending on the id. */
void setupVTable( struct MyStructure *s ) {
  if ( !strcmp( s->id, "sqrt" ) ) {
    s->printType = printSqrtType;
  } else if ( !strcmp( s->id, "log2" ) ) {
    s->printType = printLog2Type;
  } else {
    s->printType = printUnreadableType;
  }
}

With this in place, your original code can just do:

void f( struct MyStructure *s ) {
    s->printType( s );
}

That way, you centralize the type check in a single place instead of cluttering your code with a lot of switch statements.

陌上青苔 2024-12-06 18:51:21

This is especially dangerous because of alignment: on many architectures, int is 4-byte aligned, but character arrays are not. On SPARC, for example, even if this code could compile (which it can't, because the string addresses aren't known until link time), it could raise SIGBUS as soon as it reads a misaligned id.
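
One way to sidestep the alignment problem is to copy the bytes into a properly aligned integer with memcpy instead of dereferencing a cast pointer. The following is only a sketch; load_tag is a hypothetical helper name, and the numeric value it returns still depends on the platform's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies four bytes from a possibly unaligned buffer into an aligned integer.
   The resulting numeric value depends on the platform's byte order. */
static uint32_t load_tag(const char *bytes)
{
    uint32_t value;
    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

Compilers commonly turn a fixed-size memcpy like this into a single load where the hardware allows it, so the safe version is usually not slower than the cast.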

硪扪都還晓 2024-12-06 18:51:21

I ended up using this macro, similar to case #3 in the question or phresnel's answer.

#define CHAR4_TO_INT32(a, b, c, d) ((((int32_t)(a))<<24) + (((int32_t)(b))<<16) + (((int32_t)(c))<<8) + (((int32_t)(d))<<0))

switch (* ((int*) &structure->id)) {
   case (CHAR4_TO_INT32('S','Q','R','T')): printf("its a sqrt!"); break;
}
一曲琵琶半遮面シ 2024-12-06 18:51:21

This is more C than C++.

union int_char4 { int32_t x; char y[4]; };

A union defines its members to start at the same address, essentially providing different types for the same set of bytes.

union int_char4 ic4;
ic4.x is an int32_t and ic4.y is the char array occupying the same bytes (it decays to a pointer to the first byte).

Since you want to learn, the implementation is up to you.
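
A minimal sketch of how such a union might be used, assuming C rather than C++ (reading a union member other than the one last written is permitted in C, but is undefined behaviour in C++). The helper name tag_value is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two views of the same four bytes. */
union int_char4 {
    int32_t x;
    char y[4];
};

/* Hypothetical helper: reads a 4-byte tag as a native-order integer. */
static int32_t tag_value(const char *tag)
{
    union int_char4 ic4;
    assert(tag != NULL);
    memcpy(ic4.y, tag, sizeof ic4.y);  /* write through the char view */
    return ic4.x;                      /* read through the integer view */
}
```

The result depends on the platform's byte order and is not a compile-time constant, so it suits runtime comparisons such as tag_value(id) == tag_value("sqrt") rather than case labels.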
