How to issue a warning when casting `int_least8_t` to `char`?
I am building a string library that supports both ASCII and UTF-8.
I created two typedefs, `t_ascii` and `t_utf8`. ASCII is safe to read as UTF-8, but UTF-8 is not safe to read as ASCII.
Is there any way to get a warning on an implicit conversion from `t_utf8` to `t_ascii`, but not on an implicit conversion from `t_ascii` to `t_utf8`?
Ideally, I would want these warnings (and only these warnings) to be issued:
    #include <stdint.h>
    typedef char t_ascii;
    typedef uint_least8_t t_utf8;
    int main()
    {
        t_ascii const* asciistr = "Hello world"; // Ok
        t_utf8 const* utf8str = "你好世界"; // Ok
        asciistr = utf8str; // Warning: utf8 to ascii is not safe
        utf8str = asciistr; // Ok: ascii to utf8 is safe
        t_ascii asciichar = 'A';
        t_utf8 utf8char = 'B';
        asciichar = utf8char; // Warning: utf8 to ascii is not safe
        utf8char = asciichar; // Ok: ascii to utf8 is safe
    }
Currently, when building with `-Wall` (and even with `-funsigned-char`), I get these warnings:
    gcc main.c -Wall -Wextra
    main.c: In function ‘main’:
    main.c:10:35: warning: pointer targets in initialization of ‘const t_utf8 *’ {aka ‘const unsigned char *’} from ‘char *’ differ in signedness [-Wpointer-sign]
    10 | t_utf8 const* utf8str = "你好世界"; // Ok
       | ^~~~~~~~~~
    main.c:12:18: warning: pointer targets in assignment from ‘const t_utf8 *’ {aka ‘const unsigned char *’} to ‘const t_ascii *’ {aka ‘const char *’} differ in signedness [-Wpointer-sign]
    12 | asciistr = utf8str; // Warning: utf8 to ascii is not safe
       | ^
    main.c:16:17: warning: pointer targets in assignment from ‘const t_ascii *’ {aka ‘const char *’} to ‘const t_utf8 *’ {aka ‘const unsigned char *’} differ in signedness [-Wpointer-sign]
    16 | utf8str = asciistr; // Ok: ascii to utf8 is safe
       | ^
Compile with `-Wall`. Always compile with `-Wall`.
You want it to be safe to convert from `t_ascii` to `t_utf8`, but it simply is not: the signedness differs. The warning is not about the fact that valid UTF-8 is sometimes not valid ASCII; the compiler knows nothing about that. The warning is about the sign.
If you want an unsigned `char`, compile with `-funsigned-char`. But then neither warning will be issued.
(By the way, if you think that the type `int_least8_t` will be able to hold a multibyte character, that is, a complete UTF-8 code point encoding, it will not. Every `int_least8_t`, and consequently every `t_utf8`, in a single compilation unit has exactly the same size.)
Simply compile it with a standard C compiler; see "What compiler options are recommended for beginners learning C?".
No, you can't have that in standard C, since it's an invalid pointer conversion. You can silence the compiler with an explicit cast, but you are invoking undefined behavior if you do.
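
If the goal is only to permit the ascii-to-utf8 direction, one possible arrangement is to funnel that conversion through a single helper and leave `-Wpointer-sign` enabled for everything else. The sketch below is an assumption of mine, not something from the answers: the name `ascii_as_utf8` is hypothetical. It goes through `const void *`, which C converts implicitly in both directions, so no explicit pointer cast appears. It relies on the rule that any object may be read through a pointer to a character type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef char          t_ascii;
typedef uint_least8_t t_utf8;

/* Hypothetical helper: the one sanctioned ascii -> utf8 pointer conversion.
   Both steps below are implicit conversions to and from `const void *`,
   so the compiler has no signedness mismatch to report here. */
static const t_utf8 *ascii_as_utf8(const t_ascii *s)
{
    assert(s != NULL);
    const void *p = s;
    return p;
}
```

A call site would then read `utf8str = ascii_as_utf8(asciistr);`. There is deliberately no helper for the opposite direction, so a plain `asciistr = utf8str;` is still flagged by `-Wpointer-sign`.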
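Apart from that, you could use C11 `_Generic` to find out which type `uint_least8_t` boils down to. On gcc x86 Linux it is `unsigned char`, which matches the {aka ‘const unsigned char *’} notes in the diagnostics above.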
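
A minimal sketch of such a `_Generic` check follows; it is not the answer's original code. The macro `CHAR_TYPE_NAME` and the two wrapper functions are hypothetical names. Because `char`, `signed char` and `unsigned char` are three distinct types, all three can be listed as separate associations. As far as I know this stays true under `-funsigned-char`, which would be consistent with the question's observation that the flag does not silence the diagnostics.

```c
#include <assert.h>
#include <stdint.h>

typedef char          t_ascii;
typedef uint_least8_t t_utf8;

/* Maps an expression to the name of its character type. */
#define CHAR_TYPE_NAME(x) _Generic((x), \
    char:          "char",              \
    signed char:   "signed char",       \
    unsigned char: "unsigned char",     \
    default:       "other")

/* Compile-time check: t_utf8 is not plain char, so the compiler sees
   t_ascii and t_utf8 as different types. */
static_assert(_Generic((t_utf8)0, char: 0, default: 1),
              "t_utf8 must not be plain char");

static const char *ascii_type_name(void) { return CHAR_TYPE_NAME((t_ascii)0); }
static const char *utf8_type_name(void)  { return CHAR_TYPE_NAME((t_utf8)0); }
```

Printing `utf8_type_name()` with `puts` shows which type the implementation picked for `uint_least8_t`; `ascii_type_name()` always reports `char`.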