Printing UTF-8 in glib
Why can't UTF-8 characters be printed via glib functions?
Source code:
    #include <glib.h>
    #include <stdio.h>

    int main(void) {
        g_print("марко\n");
        fprintf(stdout, "марко\n");
        return 0;
    }
Build it like this:
gcc main.c -o main $(pkg-config glib-2.0 --cflags --libs)
As you can see, g_print cannot print the UTF-8 string, while fprintf can:
[marko@marko-work utf8test]$ ./main
?????
марко
Comments (4)
The fprintf functions assume that every string you print with them is already encoded to match the current encoding of your terminal. g_print() does not assume that and will convert the encoding if it thinks that is necessary. Of course this is a bad idea if the encoding was actually correct before, since the conversion will most likely destroy it. What is the locale setting of your terminal?
You can either set the correct locale through environment variables on most systems, or you can do it programmatically using the setlocale function. Locale names are system dependent (not part of the POSIX standard), but on most systems the following will work:
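A minimal sketch of the programmatic route, assuming a UTF-8 locale named "en_US.UTF-8" is installed (the name is system dependent, and the helper select_locale is a hypothetical name used only for illustration). It falls back to the locale requested by the environment and finally to "C":

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: pick a locale for the whole program.
 * "en_US.UTF-8" is an assumed, system-dependent name. */
const char *select_locale(void) {
    /* First choice: an explicit UTF-8 locale, if the system has it. */
    const char *name = setlocale(LC_ALL, "en_US.UTF-8");
    if (name == NULL) {
        /* Second choice: whatever LANG / LC_* in the environment request. */
        name = setlocale(LC_ALL, "");
    }
    if (name == NULL) {
        /* Last resort: the "C" locale, which is always available. */
        name = setlocale(LC_ALL, "C");
    }
    return name; /* name of the locale now in effect */
}
```

Calling select_locale() as the first statement in main(), before the first g_print(), gives g_print() a charset to convert to.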
Instead of LC_ALL you can also set the locale only for certain operations (e.g. "en_US" will cause English number and date formatting, but maybe you don't want numbers/dates to be formatted that way). To quote from the setlocale man page:
The only locale values that are always available on all systems are "C", "POSIX" and "".
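To illustrate setting a single category rather than LC_ALL, here is a sketch that sets only LC_CTYPE (the category that governs character encoding) from the environment and reports the resulting charset. The helper name ctype_codeset is hypothetical, and nl_langinfo(CODESET) is a POSIX call, so this assumes a POSIX system:

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: set only the character-encoding category from the
 * environment and return the charset name the locale now uses.
 * Other categories such as LC_NUMERIC and LC_TIME are left untouched. */
const char *ctype_codeset(void) {
    /* "" means: take the value from LANG / LC_CTYPE / LC_ALL. If that
     * fails, the program simply stays in the locale it already had. */
    setlocale(LC_CTYPE, "");
    /* Yields "UTF-8" in a UTF-8 locale; the spelling of other charset
     * names is system dependent. */
    return nl_langinfo(CODESET);
}
```

If this reports something other than UTF-8, for example an ASCII charset in the "C" locale, g_print() has to convert the Cyrillic text into a charset that cannot represent it.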
You need to initialize the locale's encoding by calling setlocale at your program's start.
This is normally done for you if you use an initialization function such as gtk_init(..) or similar.
The string passed from g_print() to glibc is not necessarily in UTF-8 encoding since g_print() does character set conversion to the charset specified by the locale.
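To make the effect of such a conversion concrete, here is a small sketch of a lossy UTF-8 to ASCII conversion. It is not glib's implementation; the function name utf8_to_ascii_lossy is hypothetical and the code assumes well-formed UTF-8 input. It only shows why a five-letter Cyrillic word can end up as five question marks when the target charset cannot represent it:

```c
#include <stddef.h>

/* Illustration only, not glib's actual code: copy a UTF-8 string into out,
 * replacing every non-ASCII character with a single '?'.
 * Writes at most out_size - 1 bytes plus a terminating NUL and returns
 * the number of bytes written (excluding the NUL). */
size_t utf8_to_ascii_lossy(const char *in, char *out, size_t out_size) {
    size_t n = 0;
    if (out_size == 0) {
        return 0;
    }
    for (; *in != '\0' && n + 1 < out_size; ++in) {
        unsigned char byte = (unsigned char)*in;
        if (byte < 0x80) {
            out[n++] = (char)byte;  /* plain ASCII: copy unchanged */
        } else if ((byte & 0xC0) != 0x80) {
            out[n++] = '?';         /* lead byte of a multi-byte character */
        }
        /* continuation bytes (10xxxxxx) are skipped */
    }
    out[n] = '\0';
    return n;
}
```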
It is usually not recommended to use anything other than ASCII inside source files. You should use tools like gettext to translate strings into different languages. If that is out of the question, then you should store your string as UTF-8 in your code.
Try printing this one (it's the hexadecimal representation of your string):
This works for me in printf (cannot test here with glib):
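A minimal sketch of this idea, assuming the intended string is "марко" encoded as UTF-8. The bytes are written as hex escapes so the source file itself stays pure ASCII; the names MARKO_UTF8 and print_marko are hypothetical:

```c
#include <stdio.h>

/* UTF-8 bytes of "марко" (U+043C U+0430 U+0440 U+043A U+043E),
 * written as hex escapes so this file contains only ASCII. */
static const char MARKO_UTF8[] = "\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xbe";

/* Print the string followed by a newline. Returns printf's result,
 * i.e. the number of bytes written (11 on success). */
int print_marko(void) {
    return printf("%s\n", MARKO_UTF8);
}
```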
Redirect the output to a file and view it as UTF-8.
Hope it helps.