Printing to the terminal in an encoding-neutral way

Posted on 2024-12-18 16:53:51

I would like to print a string to the screen regardless of its encoding (UTF-8, UTF-16, or UTF-32). The string is stored in a char array, so I need to print past null bytes and keep writing to stdout; that rules out printf and friends.

char text[] = { 0x00, 0x55, 0x00, 0x6E, 0x00, 0x69, 0x00, 0x63, 0x00, 0x6F, 0x00, 0x64, 0x00, 0x65 };

fwrite( text, sizeof(char), sizeof(text), stdout );

To this end I've chosen the solution above, which lets me print any UTF encoding form. I understand that some terminals will not display the characters correctly, but that is not my concern, as it's a configurable option outside of the application.
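
A minimal sketch of why the string functions are ruled out here while fwrite is not. The helper names write_raw and visible_to_string_functions are hypothetical, introduced only for illustration:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write exactly len bytes, embedded 0x00 bytes
   included. Returns the number of bytes fwrite reports as written. */
size_t write_raw(FILE *out, const void *buf, size_t len)
{
    return fwrite(buf, 1, len, out);
}

/* Hypothetical helper: how many bytes a C string function (printf("%s"),
   strlen, fputs) would see, since they stop at the first 0x00 byte. */
size_t visible_to_string_functions(const char *buf, size_t len)
{
    const char *nul = memchr(buf, 0, len);
    return nul ? (size_t)(nul - buf) : len;
}
```

For the array in the question, the first byte is 0x00, so the string functions would see zero bytes, while write_raw passes all 14 bytes through. Whether those 14 bytes are displayed as text is a separate matter that depends on the encoding the terminal expects.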

My application has a setting for which message catalogue to load (en_EN.UTF-8, etc.); however, I want to avoid doing string conversion in the code based on the currently selected locale.

Could I please get a review on this approach before I let it go live?

Comments (2)

无风消散 2024-12-25 16:53:51

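You can't do that. When you deal with text, encoding matters a great deal: the terminal interprets the bytes it receives in one encoding, so bytes written in any other encoding will not display as the intended text. So you must do the conversion.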
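
As a rough illustration of what such a conversion involves, here is a minimal sketch that turns big-endian UTF-16 bytes (the layout used in the question) into UTF-8. The function names utf8_encode and utf16be_to_utf8 are hypothetical, and the sketch assumes the input really is UTF-16BE without a byte order mark; in practice a library facility such as POSIX iconv would be the more usual choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
   Returns the number of bytes written (1-4), or 0 if cp is invalid. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* a lone surrogate is not a valid code point */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: convert UTF-16BE bytes in src to UTF-8 in dst.
   Returns the number of bytes written, or (size_t)-1 on malformed
   input or when dst is too small. */
size_t utf16be_to_utf8(const unsigned char *src, size_t src_len,
                       unsigned char *dst, size_t dst_cap)
{
    size_t i = 0, o = 0;

    if (src_len % 2 != 0)
        return (size_t)-1; /* UTF-16 comes in 2-byte code units */

    while (i < src_len) {
        uint32_t cp = ((uint32_t)src[i] << 8) | src[i + 1];
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) { /* high surrogate: needs a pair */
            uint32_t lo;
            if (i + 2 > src_len)
                return (size_t)-1;
            lo = ((uint32_t)src[i] << 8) | src[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        }

        unsigned char buf[4];
        size_t n = utf8_encode(cp, buf);
        if (n == 0 || o + n > dst_cap)
            return (size_t)-1;
        for (size_t k = 0; k < n; k++)
            dst[o++] = buf[k];
    }
    return o;
}
```

Given the 14 bytes from the question, this produces the 7 bytes of "Unicode", which could then be passed to fwrite for a terminal that is configured for UTF-8.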

It is also a bad idea to keep this data in a char array; you should use a byte array, because:

  • If byte is not already defined in some header, define (typedef) it as unsigned char. Plain char can be signed or unsigned depending on the implementation, so bytes above 0x7F can produce surprises.
  • It is more readable, because it makes the intent clear. When I see byte, I read a bunch of bytes; when I see char, I read plain text (which is obviously not what you have here).

妳是的陽光 2024-12-25 16:53:51

What if you define the char array in big-endian order and the terminal expects little-endian, or vice versa?
I also think you can't get by without conversion when going from char to UTF (if only because of endianness). It's also reasonable to define some types:

#include <stdint.h>

typedef uint8_t  utf8char;   /* one UTF-8 code unit */
typedef uint16_t utf16char;  /* one UTF-16 code unit */
typedef uint32_t utf32char;  /* one UTF-32 code unit */

And

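/* Prefixed names avoid clashing with the BIG_ENDIAN / LITTLE_ENDIAN
   macros that some system headers define. */
typedef enum {
   CHAR_BIG_ENDIAN,
   CHAR_LITTLE_ENDIAN
} CHAR_ENDIANNESS;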

That way the conversion to UTF becomes more transparent, debugging gets easier, and the code is easier to maintain.
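
To make the endianness concern concrete, here is a minimal sketch that reads a 16-bit code unit from two bytes in either order and checks for a UTF-16 byte order mark. The names byte_order, read_u16, and detect_utf16_bom are hypothetical, and the sketch assumes the data is UTF-16 (the bytes FF FE can also begin a UTF-32 little-endian byte order mark, which is not handled here):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum naming the two byte orders. */
typedef enum { ORDER_BIG, ORDER_LITTLE } byte_order;

/* Read one 16-bit code unit from two bytes in the given byte order. */
uint16_t read_u16(const unsigned char *p, byte_order order)
{
    if (order == ORDER_BIG)
        return (uint16_t)((p[0] << 8) | p[1]);
    return (uint16_t)((p[1] << 8) | p[0]);
}

/* Look for a UTF-16 byte order mark (U+FEFF) at the start of the data.
   Returns 1 and stores the detected order, or 0 if no mark is present. */
int detect_utf16_bom(const unsigned char *p, size_t len, byte_order *order)
{
    if (len < 2)
        return 0;
    if (p[0] == 0xFE && p[1] == 0xFF) {
        *order = ORDER_BIG;
        return 1;
    }
    if (p[0] == 0xFF && p[1] == 0xFE) {
        *order = ORDER_LITTLE;
        return 1;
    }
    return 0;
}
```

The first two bytes of the array in the question, 0x00 0x55, read as 0x0055 ('U') in big-endian order but as 0x5500, a CJK ideograph, in little-endian order. The array carries no byte order mark, so a consumer has no way to tell which reading was intended.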
