Printing UTF-8 in GLib

Posted 2024-09-06 10:10:44

Why can't UTF-8 characters be printed with GLib functions?

Source code:

#include "glib.h"
#include <stdio.h>

int main() {
    g_print("марко\n");
    fprintf(stdout, "марко\n");
}

Build it like this:

gcc main.c -o main $(pkg-config glib-2.0 --cflags --libs)

As you can see, g_print() prints question marks, while fprintf() prints the UTF-8 text correctly:

[marko@marko-work utf8test]$ ./main 
?????
марко

Answers (4)

温柔戏命师 2024-09-13 10:10:44

printf-family functions such as fprintf() assume that every string you print with them is already encoded to match the current encoding of your terminal, and they write the bytes out unchanged. g_print() does not assume that: it treats its input as UTF-8 and converts it to the character set of the current locale when it thinks that is necessary. Of course, if the encoding was actually correct before, this conversion will most likely destroy the text. What is the locale setting of your terminal?

You can set the correct locale either through environment variables (on most systems) or programmatically with the setlocale() function. Locale names are system-dependent (POSIX does not standardize them), but on most systems the following works:

#include <locale.h>

/* ... then, early in main(): */
setlocale(LC_ALL, "en_US.utf8");

Instead of LC_ALL, you can set the locale for individual categories only. For example, setting LC_ALL to "en_US.utf8" also switches number and date formatting to US English conventions, which you may not want. To quote the setlocale man page:

LC_ALL: Set the entire locale generically.

LC_COLLATE: Set a locale for string collation routines. This controls alphabetic ordering in strcoll() and strxfrm().

LC_CTYPE: Set a locale for the ctype(3) and multibyte(3) functions. This controls recognition of upper and lower case, alphabetic or non-alphabetic characters, and so on.

LC_MESSAGES: Set a locale for message catalogs, see catopen(3) function.

LC_MONETARY: Set a locale for formatting monetary values; this affects the localeconv() function.

LC_NUMERIC: Set a locale for formatting numbers. This controls the formatting of decimal points in input and output of floating point numbers in functions such as printf() and scanf(), as well as values returned by localeconv().

LC_TIME: Set a locale for formatting dates and times using the strftime() function.

The only three locale values that are guaranteed to be available on every system are "C", "POSIX", and "" (the empty string).

Only three locales are defined by default: the empty string "" (which denotes the native environment) and the "C" and "POSIX" locales (which denote the C-language environment). A locale argument of NULL causes setlocale() to return the current locale. By default, C programs start in the "C" locale. The only function in the library that sets the locale is setlocale(); the locale is never changed as a side effect of some other routine.

春风十里 2024-09-13 10:10:44

You need to initialize the locale's encoding by calling setlocale at your program's start.

setlocale(LC_CTYPE, "");

This is normally done for you if you use an initialization function such as gtk_init() or similar.

任性一次 2024-09-13 10:10:44

The string that g_print() passes on to the C library is not necessarily UTF-8, because g_print() converts it to the character set specified by the current locale.

爺獨霸怡葒院 2024-09-13 10:10:44

Usually it is not recommended to use anything other than ASCII in source files. You should use a tool such as gettext to translate text into different languages. If that is out of the question, then you should store your strings in UTF-8 in your code.

Try printing this (it's the UTF-8 encoding of your string, written in hexadecimal):

static const char hex_marco[] = "\xD0\xBC\xD0\xB0\xD1\x80\xD0\xBA\xD0\xBE";

This works for me in printf (cannot test here with glib):

#include <stdio.h>

/* "марко" as explicit UTF-8 bytes, independent of the source file's encoding */
static const char hex_marco[] = "\xD0\xBC\xD0\xB0\xD1\x80\xD0\xBA\xD0\xBE";

int main(void)
{
    printf("%s\n", hex_marco);
    return 0;
}

Redirect the output to a file and open it as UTF-8.

Hope it helps.
