How to issue a warning when casting `int_least8_t` to `char`?

Posted 2025-01-28 14:39:13

I am building a string library to support both ASCII and UTF-8.
I created two typedefs, t_ascii and t_utf8. ASCII is safe to read as UTF-8, but UTF-8 is not safe to read as ASCII.
Is there any way to get a warning when implicitly casting from t_utf8 to t_ascii, but not when implicitly casting from t_ascii to t_utf8?

Ideally, I would want these warnings (and only these warnings) to be issued:

#include <stdint.h>

typedef char           t_ascii;
typedef uint_least8_t  t_utf8;

int main()
{
    t_ascii const* asciistr = "Hello world"; // Ok
    t_utf8 const*   utf8str = "你好世界";    // Ok

    asciistr = utf8str; // Warning: utf8 to ascii is not safe
    utf8str = asciistr; // Ok: ascii to utf8 is safe

    t_ascii asciichar = 'A';
    t_utf8   utf8char = 'B';

    asciichar = utf8char; // Warning: utf8 to ascii is not safe
    utf8char = asciichar; // Ok: ascii to utf8 is safe
}

Currently, when building with -Wall (and even with -funsigned-char), I get these warnings:

gcc main.c -Wall -Wextra                          
main.c: In function ‘main’:
main.c:10:35: warning: pointer targets in initialization of ‘const t_utf8 *’ {aka ‘const unsigned char *’} from ‘char *’ differ in signedness [-Wpointer-sign]
   10 |         t_utf8 const*   utf8str = "你好世界";    // Ok
      |                                   ^~~~~~~~~~
main.c:12:18: warning: pointer targets in assignment from ‘const t_utf8 *’ {aka ‘const unsigned char *’} to ‘const t_ascii *’ {aka ‘const char *’} differ in signedness [-Wpointer-sign]
   12 |         asciistr = utf8str; // Warning: utf8 to ascii is not safe
      |                  ^
main.c:16:17: warning: pointer targets in assignment from ‘const t_ascii *’ {aka ‘const char *’} to ‘const t_utf8 *’ {aka ‘const unsigned char *’} differ in signedness [-Wpointer-sign]
   16 |         utf8str = asciistr; // Ok: ascii to utf8 is safe
      |                 ^

Comments (2)

失与倦 2025-02-04 14:39:13

Compile with -Wall. Always compile with -Wall.

<user>@squall:~/src/p1$ gcc -Wall -c test2.c
test2.c: In function ‘main’:
test2.c:9:31: warning: pointer targets in initialization of ‘const t_utf8 *’ {aka ‘const signed char *’} from ‘char *’ differ in signedness [-Wpointer-sign]
    9 |     t_utf8  const*  utf8str = "你好世界";
      |                               ^~~~~~~~~~~~~~
test2.c:11:13: warning: pointer targets in assignment from ‘const t_ascii *’ {aka ‘const char *’} to ‘const t_utf8 *’ {aka ‘const signed char *’} differ in signedness [-Wpointer-sign]
   11 |     utf8str = asciistr; // Ok: ascii to utf8 is safe
      |             ^
test2.c:12:14: warning: pointer targets in assignment from ‘const t_utf8 *’ {aka ‘const signed char *’} to ‘const t_ascii *’ {aka ‘const char *’} differ in signedness [-Wpointer-sign]
   12 |     asciistr = utf8str; // Should issue warning: utf8 to ascii is not safe
      |              ^

You want it to be safe to cast from t_ascii to t_utf8, but it simply is not. The signedness differs.

The warning is not about the fact that valid utf8 is sometimes not valid ASCII - the compiler knows nothing about that. The warning is about the sign.

If you want an unsigned char, compile with -funsigned-char. But then neither warning will be issued.

(By the way, if you think the type int_least8_t will be able to hold a multibyte character or a complete UTF-8 code point encoding, it will not. Every int_least8_t, and consequently every t_utf8, in a single compilation unit has exactly the same size.)
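
The point about size can be illustrated with a small sketch. It uses the uint_least8_t typedef from the question rather than int_least8_t, and the helper names utf8_units and utf8_code_points as well as the array ni_hao are hypothetical, introduced only for this example. Each array element holds one UTF-8 code unit, so a single CJK character occupies three elements:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint_least8_t t_utf8;  /* typedef taken from the question */

/* Hypothetical helper: number of code units (array elements) before the 0 terminator. */
size_t utf8_units(const t_utf8 *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: number of code points, assuming well-formed UTF-8.
   Every unit that is not a continuation byte (bit pattern 10xxxxxx) starts one. */
size_t utf8_code_points(const t_utf8 *s)
{
    size_t n = 0;
    for (; *s != 0; s++)
        if ((*s & 0xC0u) != 0x80u)
            n++;
    return n;
}

/* "你好" written byte by byte so the example does not depend on the
   source file encoding: two code points, three code units each. */
const t_utf8 ni_hao[] = { 0xE4, 0xBD, 0xA0, 0xE5, 0xA5, 0xBD, 0x00 };
```

With these definitions, utf8_units(ni_hao) counts six elements while utf8_code_points(ni_hao) counts two characters, so a variable of type t_utf8 can hold only a fragment of a non-ASCII character.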

笑看君怀她人 2025-02-04 14:39:13

Simply compile it with a standard C compiler. See "What compiler options are recommended for beginners learning C?"

Result:

<source>: In function 'main':
<source>:9:31: error: pointer targets in initialization of 'const t_utf8 *' {aka 'const unsigned char *'} from 'char *' differ in signedness [-Wpointer-sign]
    9 |     t_utf8 const*   utf8str = "你好世界";    // Ok
      |                               ^~~~~~~~~~
<source>:11:14: error: pointer targets in assignment from 'const t_utf8 *' {aka 'const unsigned char *'} to 'const t_ascii *' {aka 'const char *'} differ in signedness [-Wpointer-sign]
   11 |     asciistr = utf8str; // Warning: utf8 to ascii is not safe
      |              ^
<source>:12:13: error: pointer targets in assignment from 'const t_ascii *' {aka 'const char *'} to 'const t_utf8 *' {aka 'const unsigned char *'} differ in signedness [-Wpointer-sign]
   12 |     utf8str = asciistr; // Ok: ascii to utf8 is safe
      |             ^

"but not when implicitly casting t_ascii to t_utf8?"

No you can't have that in standard C, since it's an invalid pointer conversion. You can silence the compiler with an explicit cast, but you are invoking undefined behavior if you do.


Apart from that, you could use C11 _Generic to find out which type uint_least8_t boils down to:

#include <stdint.h>
#include <stdio.h>

/* Print the underlying character type of obj; obj itself is not evaluated. */
#define what_type(obj) printf("%s is same as %s\n", #obj, \
  _Generic ((obj),                                        \
            char: "char",                                 \
            unsigned char: "unsigned char",               \
            signed char: "signed char") )
  

int main (void)
{
    typedef char           t_ascii;
    typedef uint_least8_t  t_utf8;

    t_ascii ascii;
    t_utf8  utf8;

    what_type(ascii);
    what_type(utf8);
}

Output on gcc x86 Linux:

ascii is same as char
utf8 is same as unsigned char
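
Since typedefs are only aliases, they cannot make a conversion legal in one direction and diagnosed in the other. One possible workaround, which is not part of the answers above and is only a sketch under that assumption, is to wrap the pointers in distinct struct types. The compiler then rejects any direct assignment between them, and the allowed direction goes through an explicit function. All names here (ascii_view, utf8_view, utf8_from_ascii, ascii_from_utf8) are hypothetical:

```c
/* Hypothetical wrapper types: two distinct structs are not assignable
   to each other, so mixing them up is a compile error, not a warning. */
typedef struct { const char *bytes; } ascii_view;
typedef struct { const char *bytes; } utf8_view;

/* ASCII to UTF-8: always valid, so no check is needed. */
utf8_view utf8_from_ascii(ascii_view a)
{
    utf8_view u = { a.bytes };
    return u;
}

/* UTF-8 to ASCII: must be requested explicitly and can fail.
   Returns 1 and fills *out when every byte is below 0x80, otherwise returns 0. */
int ascii_from_utf8(utf8_view u, ascii_view *out)
{
    for (const char *p = u.bytes; *p != '\0'; p++)
        if ((unsigned char)*p >= 0x80u)
            return 0;
    out->bytes = u.bytes;
    return 1;
}
```

The trade-off is that neither direction is implicit any more: the safe direction costs a call to utf8_from_ascii, and the unsafe direction becomes a checked operation instead of a diagnostic.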