"Unable to translate Unicode character" error when saving to a txt file
Additional information: Unable to translate Unicode character \uDFFF at index 195 to specified code page.
I made an algorithm whose results are binary values (of different lengths). I transformed them into uint, then into chars, and appended them to a StringBuilder, as you can see below:
uint n = Convert.ToUInt16(tmp_chars, 2);  // parse the string of binary digits as a number
_koded_text.Append(Convert.ToChar(n));    // append it as a single UTF-16 code unit
My problem is that when I try to save those values into a .txt file, I get the previously mentioned error.
StreamWriter file = new StreamWriter(filename);  // UTF-8 by default
file.WriteLine(_koded_text);                     // throws: the text contains an unpaired surrogate
file.Close();
What I am saving is this: "忿췾᷿]볯褟ﶞ痢ﳻ��伞ﳴ㿯ﹽ翼蛿㐻ﰻ筹��﷿₩マ랿鳿⏟麞펿"... which is a bunch of weird signs.
What I need is to convert those binary values into some kind of string of chars and save it to a txt file. I saw somewhere that converting to UTF8 should help, but I don't know how. Would changing the file's encoding help too?
2 Answers
You cannot transform binary data to a string directly. The Unicode characters in a string are encoded using UTF-16 in .NET. That encoding uses two bytes per character, providing 65536 distinct values. Unicode, however, has over one million codepoints. To make that work, the Unicode codepoints above \uffff (above the BMP, the Basic Multilingual Plane) are encoded with a surrogate pair. The first one has a value between 0xd800 and 0xdbff, the second between 0xdc00 and 0xdfff. That provides 2^(10+10), roughly one million additional codes.
You can perhaps see where this leads: in your case the code detects a low surrogate value (0xdfff) that isn't preceded by a high surrogate. That's illegal. There are lots more possible mishaps: several codepoints are unassigned, several are diacritics that get mangled when the string is normalized.
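A minimal sketch of how you could spot such an unpaired surrogate before writing (the string literal is a made-up stand-in, not the asker's data):

string s = "A\uDFFFB";  // \uDFFF is a low surrogate with no high surrogate before it
for (int i = 0; i < s.Length; i++)
{
    bool unpairedLow = char.IsLowSurrogate(s[i]) && (i == 0 || !char.IsHighSurrogate(s[i - 1]));
    bool unpairedHigh = char.IsHighSurrogate(s[i]) && (i == s.Length - 1 || !char.IsLowSurrogate(s[i + 1]));
    if (unpairedLow || unpairedHigh)
        Console.WriteLine("Unpaired surrogate at index " + i);  // prints: Unpaired surrogate at index 1
}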
You just can't make this work. Base64 encoding is the standard way to carry binary data across a text stream. It uses 6 bits per character, so 3 bytes require 4 characters. The character set is ASCII, so the odds of the receiving program decoding the characters back to binary incorrectly are minimal. Only a decades-old IBM mainframe that uses EBCDIC could get you into trouble. Or just plain avoid encoding to text and keep it binary.
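A minimal, self-contained sketch of the Base64 round trip (the sample bytes are arbitrary placeholders for your binary result):

using System;
using System.IO;
using System.Text;

class Base64Demo
{
    static void Main()
    {
        byte[] data = { 0xDF, 0xFF, 0x12, 0x34 };  // placeholder for your binary result

        // Base64 output is pure ASCII, so any text encoding can store it safely.
        string base64 = Convert.ToBase64String(data);
        File.WriteAllText("output.txt", base64, Encoding.ASCII);

        // Decoding restores the exact original bytes.
        byte[] restored = Convert.FromBase64String(File.ReadAllText("output.txt"));
        Console.WriteLine(BitConverter.ToString(restored));  // DF-FF-12-34
    }
}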
Since you're trying to encode binary data to a text stream, this SO question already contains an answer to the question "How do I encode something as base64?". From there, plain ASCII/ANSI text is fine for the output encoding. A sketch of how that could look end to end is below.
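A hedged sketch, assuming the algorithm yields strings of '0'/'1' digits like tmp_chars in the question ('codes' and 'filename' are hypothetical placeholders for the asker's variables):

// Concatenate all the variable-length binary strings into one bit stream.
var bits = new StringBuilder();
foreach (string tmp_chars in codes)  // 'codes' stands in for the algorithm's results
    bits.Append(tmp_chars);

// Pack the bits into bytes, zero-padding the final byte.
byte[] data = new byte[(bits.Length + 7) / 8];
for (int i = 0; i < bits.Length; i++)
    if (bits[i] == '1')
        data[i / 8] |= (byte)(0x80 >> (i % 8));

// Base64 is plain ASCII, so an ASCII StreamWriter handles it fine.
using (var file = new StreamWriter(filename, false, Encoding.ASCII))
    file.WriteLine(Convert.ToBase64String(data));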