Can converting binary data to Windows-1252 encoding cause data loss?
I understand that the best way to convert binary data to a textual format is to use base64 encoding. UTF-8 can result in lossiness. But as I was investigating this, I found that Windows-1252 encoding does not seem to result in data loss by way of its design.
I provide a lot more context in my blog post here.
At the end, I provide some reasons why I still wouldn't store binary data as a Windows-1252 string. But I'm curious if there's an actual data-loss scenario there I hadn't considered.
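As a point of comparison (a minimal sketch in Python rather than C#, purely for illustration), Base64 is the usual choice because it round-trips every possible byte value through printable ASCII by design:

```python
import base64

# Every possible byte value, 0x00 through 0xFF.
data = bytes(range(256))

# Encode to text: the result contains only printable ASCII characters.
text = base64.b64encode(data).decode("ascii")

# Decoding recovers the original bytes exactly -- the round trip is lossless.
assert base64.b64decode(text) == data
```

The trade-off is size: Base64 output is 4/3 the length of the input, but it never depends on how any particular text encoding handles unassigned or control code points.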
2 Answers
You should NOT put binary data in a string, because binary data can contain bytes with values below 32.
This has nothing to do with the encoding of the string.
And I'm not sure where you got the "UTF-8 is lossy, but CP1252 is not" from. But I'm not sure I want to know.
Really, the problem is better thought of if you consider that you aren't converting binary data to CP1252; in C# you are interpreting binary data as CP1252 and converting it to UTF-16, so the question is: would CP1252 -> UTF-16 -> CP1252 guarantee no mutations? The .NET text encoder does a best fit on UTF-16 -> CP1252, which sounds iffy at best. While it may test okay, there aren't many scenarios in which you could do anything with that UTF-16 string in the middle that would still guarantee no data loss, and it's much less efficient than a byte array.
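One concrete place the "lossless by design" claim gets shaky is that Windows-1252 leaves five byte values unassigned. Implementations disagree on how to handle them: .NET's decoder maps them to the corresponding C1 control characters, while Python's codec treats them as undefined. A quick probe (in Python, for illustration only) shows exactly where the holes are:

```python
# Probe which byte values fail a strict CP1252 decode. Python's cp1252
# codec leaves five code points undefined and raises on them; other
# implementations (e.g. .NET's) instead best-fit them to C1 controls.
undefined = []
for b in range(256):
    try:
        bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(b)

print([hex(b) for b in undefined])
# → ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

Whether a round trip through those five bytes survives therefore depends on the implementation's fallback behavior, not on the encoding table alone.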