Visual Studio 2008 project file fails to load because of an unexpected encoding change

Posted on 2024-08-26 18:03:51

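In our team we have a database project in Visual Studio 2008 that is under source control in Team Foundation Server. Every two weeks or so, after a colleague checks in, the project file can no longer be loaded on the other developers' machines. The error message is:

The project file could not be loaded. Data at the root level is invalid. Line 1, position 1.

When I look at the project file in Notepad++, it looks like this: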

�� ...

and so on, whereas a normal project file looks like this:

...

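So something is probably wrong with the file's encoding. This is a problem for us because it has turned out to be impossible to get the file's encoding right again. The "solution" is to throw away the project file and get the last working version from source control.

According to the file itself, the encoding should be UTF-16. According to Notepad++, the corrupted file is actually UTF-8.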

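My questions are:

1. Why does Visual Studio mess up the project file, apparently at random times and on random machines?
2. What should we do to prevent this from happening?
3. When it does happen, is there a way to restore the current file to the correct encoding instead of pulling an older version from source control?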

One final note: the problem is limited to a single project file; none of the other project files show it.

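Update: Thanks to Jon Skeet's suggestion, I have an answer to the third question. When I replace the first nine bytes EF BB BF EF BF BD EF BF BD with the two bytes FF FE, the project file loads again.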
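The byte swap described in the update can be sketched in Python. This is only an illustration of the manual fix, not something Visual Studio or TFS provides: the names `CORRUPT_PREFIX` and `restore_bom` are made up here, and the sketch assumes the rest of the file still holds the original UTF-16 bytes untouched.

```python
# Nine bytes seen at the start of the corrupted file:
# UTF-8 BOM followed by two UTF-8 encoded U+FFFD characters.
CORRUPT_PREFIX = bytes.fromhex("EFBBBF EFBFBD EFBFBD")
UTF16_LE_BOM = b"\xff\xfe"


def restore_bom(data: bytes) -> bytes:
    """Swap the nine-byte corrupt prefix for the UTF-16 LE BOM (hypothetical helper)."""
    if not data.startswith(CORRUPT_PREFIX):
        raise ValueError("file does not start with the expected nine-byte prefix")
    return UTF16_LE_BOM + data[len(CORRUPT_PREFIX):]


# In-memory demonstration; a real run would read and write the project file as bytes.
sample = CORRUPT_PREFIX + "<?xml".encode("utf-16-le")
repaired = restore_bom(sample)
print(repaired.decode("utf-16"))
```

The helper refuses to touch a file that does not start with the exact prefix, so it cannot silently damage a healthy project file.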

That still leaves the question of why Visual Studio corrupts the file.

泪是无色的血 2024-09-02 18:03:51

I think I can provide some insight into what's happening, if not why.

FF FE is a BOM; its presence at the beginning of the file indicates that the file's encoding is UTF-16, little-endian. And it sounds like the original file really is UTF-16, but something is ignoring the BOM and reading it as if it were UTF-8.
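To check which BOM a file actually starts with, something like the following can be used. It is a minimal sketch: `sniff_bom` is a name invented for this example, and it deliberately ignores UTF-32 (whose little-endian BOM also begins with FF FE), on the assumption that project files are never UTF-32.

```python
import codecs
from typing import Optional

# BOMs to look for, with the Python codec name each one implies.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),      # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the codec implied by the leading BOM, or None if there is no BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(b"\xff\xfe<\x00?\x00"))               # healthy UTF-16 LE start
print(sniff_bom(bytes.fromhex("EFBBBFEFBFBDEFBFBD")))  # the corrupted start
```

Run against the first few bytes of each checked-in version, this would show at which changeset the file switched from a UTF-16 BOM to a UTF-8 one.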

When that happens, each of the bytes FF and FE is treated as invalid and converted to U+FFFD, the Unicode replacement character. Then, when the text is written to a file again, each replacement character gets converted to its UTF-8 encoding (EF BF BD) and the UTF-8 BOM (EF BB BF) is added in front of them, resulting in the nine-byte sequence you reported:

EF BB BF  # UTF-8 BOM
EF BF BD  # U+FFFD in UTF-8
EF BF BD  # ditto
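That round trip is easy to reproduce in Python. The sketch below only models the suspected mechanism; `misread_utf16_as_utf8` is a hypothetical name, and nothing here claims to be what Visual Studio or TFS actually does internally.

```python
import codecs


def misread_utf16_as_utf8(original: bytes) -> bytes:
    """Decode bytes as UTF-8, replacing invalid bytes, then save as UTF-8 with a BOM."""
    text = original.decode("utf-8", errors="replace")
    return codecs.BOM_UTF8 + text.encode("utf-8")


# A tiny UTF-16 LE "file": the FF FE BOM followed by the text "<?xml".
original = codecs.BOM_UTF16_LE + "<?xml".encode("utf-16-le")
corrupted = misread_utf16_as_utf8(original)

print(" ".join(f"{b:02X}" for b in corrupted[:9]))
# prints: EF BB BF EF BF BD EF BF BD
```

Everything after the first nine bytes is identical to the original in this example, because ASCII text in UTF-16 LE consists only of bytes that are also valid UTF-8.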

If this is the case, simply replacing those nine bytes with FF FE is not safe. There's no guarantee those are the only bytes in the file that would be invalid when interpreted as UTF-8. As long as the file contains only ASCII characters you're okay, but anything else, like accented characters (é) or curly quotes (’), will be irretrievably mangled.
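One way to judge whether the byte swap is safe for a given file is to look for further EF BF BD sequences after the nine-byte prefix. The sketch below does that; `damaged_offsets` is a hypothetical helper, and a hit is a strong hint rather than proof, since the same three bytes could in rare cases occur in intact UTF-16 data.

```python
import codecs
from typing import List

REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")  # EF BF BD
PREFIX_LEN = 9  # UTF-8 BOM plus the two replaced BOM bytes


def damaged_offsets(data: bytes) -> List[int]:
    """Return offsets of EF BF BD sequences found after the nine-byte prefix."""
    offsets = []
    pos = data.find(REPLACEMENT_UTF8, PREFIX_LEN)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(REPLACEMENT_UTF8, pos + len(REPLACEMENT_UTF8))
    return offsets


def corrupt(text: str) -> bytes:
    """Model the suspected corruption: UTF-16 LE bytes misread as UTF-8 and re-saved."""
    raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    return codecs.BOM_UTF8 + raw.decode("utf-8", errors="replace").encode("utf-8")


print(damaged_offsets(corrupt("<a/>")))  # ASCII only: nothing lost beyond the BOM
print(damaged_offsets(corrupt("é")))     # the E9 byte of "é" was replaced
```

An empty result suggests the file held only ASCII and the prefix swap restores it; any offsets mean original bytes were lost there and an older version from source control is needed.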

Are the project files really supposed to be UTF-16? If not, maybe that one developer's system is generating UTF-16 when the version-control system is expecting UTF-8. I notice in my Visual C# Express install there's an option under Environment->Documents called "Save documents as Unicode when data cannot be saved in codepage". That sounds like something that could cause the encoding to change at apparently random times.
