“???” symbols when saving a Unicode file in C#

Posted 2024-12-07 11:14:20

I'm having an issue when saving configuration: Unicode text is saved as "???". The problem appears only on Windows 2003 with .NET Framework 2.0. When I test my code on Windows XP with .NET 4 it works fine, even though the project targets .NET Framework 2.0 in its settings.
I tried various conversions, such as:

Encoding.ASCII.GetString(
      Encoding.Convert(Encoding.ASCII, 
                       Encoding.Unicode,
                       Encoding.Unicode.GetBytes(backupPathTextBox.Text)));

But it always returns "???" or some unreadable symbols. I googled this and found that all C# strings are represented in UTF-16, but that there is no built-in UTF-16 decoder in C#.
Could anyone point me in the right direction?

Comments (1)

人心善变 2024-12-14 11:14:20
Encoding.ASCII.GetString(
  Encoding.Convert(Encoding.ASCII, 
                   Encoding.Unicode,
                   Encoding.Unicode.GetBytes(backupPathTextBox.Text)));

Encoding.Unicode is actually the UTF-16LE encoding, where each code unit is stored using two bytes (and so each ASCII character ends up followed by a zero byte). Microsoft calls this “Unicode” because it's what they expected to be the most common encoding of Unicode back in the very early days, but it didn't work out like that and now the name is completely misleading.
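
To make the byte layout concrete, here is a minimal sketch that prints the bytes produced by Encoding.Unicode and by UTF-8 for the same text. The sample string and the class name are made up for illustration and are not from the question.

```csharp
using System;
using System.Text;

public class Utf16BytesDemo
{
    public static void Main()
    {
        // Hypothetical sample text: one ASCII letter and one non-ASCII letter ("Aé").
        string text = "A\u00E9";

        // Encoding.Unicode is UTF-16LE: two bytes per code unit, low byte first.
        byte[] utf16 = Encoding.Unicode.GetBytes(text);
        Console.WriteLine(BitConverter.ToString(utf16)); // 41-00-E9-00

        // UTF-8 keeps ASCII as single bytes and spends extra bytes only on non-ASCII.
        byte[] utf8 = new UTF8Encoding(false).GetBytes(text);
        Console.WriteLine(BitConverter.ToString(utf8)); // 41-C3-A9
    }
}
```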

What your code does is:

  • converts your text string to UTF-16LE bytes;

  • then converts them from ASCII bytes (which they're not) to UTF-16LE bytes, which means an extra zero byte is added between each byte;

  • then converts those bytes back to a string as if they were ASCII, which means each ASCII character comes back as itself followed by three NUL characters, and each non-ASCII character is split into its two bytes, each of which turns into an unrelated ASCII character or into "?" (for any byte above 0x7F).
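
The three steps above can be traced with a small sketch. The Mangle helper and the sample string are hypothetical; the chain of Encoding calls is the one from the question.

```csharp
using System;
using System.Text;

public class RoundTripDemo
{
    // Same chain of calls as in the question, applied to an arbitrary string.
    public static string Mangle(string text)
    {
        // Step 1: string -> UTF-16LE bytes.
        byte[] utf16 = Encoding.Unicode.GetBytes(text);
        // Step 2: treat those bytes as ASCII and re-encode them as UTF-16LE.
        byte[] widened = Encoding.Convert(Encoding.ASCII, Encoding.Unicode, utf16);
        // Step 3: decode the result as if it were ASCII.
        return Encoding.ASCII.GetString(widened);
    }

    public static void Main()
    {
        // Hypothetical input: 'A', U+00E9 and U+042F.
        string result = Mangle("A\u00E9\u042F");

        // Print every resulting character as a hex code.
        foreach (char c in result)
            Console.Write("{0:X2} ", (int)c);
        Console.WriteLine();
        // 41 00 00 00 3F 00 00 00 2F 00 04 00
    }
}
```

The 3F values are the "?" characters from the question: the ASCII decoder substitutes "?" for any byte above 0x7F.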

What exactly were you trying to do with this? If you want to put a Unicode string in an ASCII-compatible text file, the encoding you want is generally UTF-8, not UTF-16. Converting a string to UTF-8 bytes is as simple as:

new UTF8Encoding(false).GetBytes(backupPathTextBox.Text)

or just use a UTF-8 TextWriter to write the string directly.
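
As a rough sketch of the TextWriter route, assuming the setting is written to a plain text file: the file name and the sample value below are invented for illustration, and StreamWriter stands in for whatever writer the real configuration code uses.

```csharp
using System;
using System.IO;
using System.Text;

public class Utf8WriterDemo
{
    public static void Main()
    {
        // Hypothetical file name and setting value, used only for illustration.
        string path = Path.Combine(Path.GetTempPath(), "backup-path-demo.txt");
        string value = "C:\\\u0420\u0435\u0437\u0435\u0440\u0432";

        // StreamWriter is a TextWriter; UTF8Encoding(false) writes UTF-8 without a byte order mark.
        using (StreamWriter writer = new StreamWriter(path, false, new UTF8Encoding(false)))
        {
            writer.Write(value);
        }

        // Read the file back as UTF-8 to confirm nothing was lost.
        string roundTripped = File.ReadAllText(path, Encoding.UTF8);
        Console.WriteLine(roundTripped == value); // True
    }
}
```

Whatever reads the file back has to decode it as UTF-8 as well; reading it with an ASCII or ANSI decoder would mangle the non-ASCII characters again.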
