Visual Studio 2008 project file fails to load because of an unexpected encoding change

Posted 2024-08-26 18:03:51


In our team we have a database project in Visual Studio 2008 that is under source control in Team Foundation Server. Every two weeks or so, after one co-worker checks in, the project file won't load on the other developers' machines. The error message is:

The project file could not be loaded. Data at the root level is invalid. Line 1, position 1.

When I look at the project file in Notepad++, the file looks like this:

��<NUL?NULxNULmNULlNUL NULvNULeNULrNULsNULiNULoNULnNUL ...

and so on (you can see <?xml version in this), whereas a normal project file looks like:

<?xml version="1.0" encoding="utf-16"?> ...

So probably something is wrong with the encoding of the file. This is a problem for us because it turns out to be impossible to get the file encoding correct again. The 'solution' is to throw away the project file and get the last known working version from source control.

According to the file, the encoding should be UTF-16. According to Notepad++, the corrupted file is actually UTF-8.

My questions are:

  • Why is Visual Studio messing up the encoding of the project file, apparently at random times and on random machines?
  • What should we do to prevent this?
  • When it has happened, is there a way to restore the current file in the correct encoding instead of pulling an older version from source control?

As a final note: the problem affects only one project file; none of the other project files exhibit it.

UPDATE: Thanks to Jon Skeet's suggestion, I have the answer to question number three.
When I replace the first nine bytes EF BB BF EF BF BD EF BF BD with the two bytes FF FE, the project file loads again.
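For reference, here is a minimal Python sketch of that byte swap. It is not a Visual Studio or TFS tool; `repair_project_bytes` and `repair_file` are hypothetical names, and the sketch assumes the corrupted file starts with exactly that nine-byte prefix. It keeps a `.bak` copy of the corrupted file and refuses to touch a file that does not start with the prefix.

```python
from pathlib import Path

# Prefix observed in the corrupted file: UTF-8 BOM + two U+FFFD characters in UTF-8.
CORRUPT_PREFIX = bytes.fromhex("EFBBBF EFBFBD EFBFBD")
# UTF-16 little-endian BOM that a healthy project file starts with.
UTF16_LE_BOM = b"\xff\xfe"


def repair_project_bytes(data: bytes) -> bytes:
    """Replace the nine-byte corrupted prefix with the UTF-16 LE BOM.

    Raises ValueError if the data does not start with the expected prefix,
    so a healthy file is never modified.
    """
    if not data.startswith(CORRUPT_PREFIX):
        raise ValueError("data does not start with the expected corrupted prefix")
    return UTF16_LE_BOM + data[len(CORRUPT_PREFIX):]


def repair_file(path: str) -> None:
    """Repair a project file in place, keeping the corrupted copy as <name>.bak."""
    p = Path(path)
    original = p.read_bytes()
    fixed = repair_project_bytes(original)
    # Raises UnicodeDecodeError if the result is not valid UTF-16 (e.g. odd length).
    fixed.decode("utf-16")
    p.with_name(p.name + ".bak").write_bytes(original)
    p.write_bytes(fixed)
```

Usage would be something like `repair_file("Database.dbproj")`, where the file name is only a placeholder. This restores the file faithfully only if the rest of its content survived the corruption unchanged, which holds for plain ASCII text.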

This still leaves the question of why Visual Studio corrupts the file.


Answer by 泪是无色的血, 2024-09-02 18:03:51


I think I can provide some insight into what's happening, if not why.

FF FE is a BOM; its presence at the beginning of the file indicates that the file's encoding is UTF-16, little-endian. It sounds like the original file really is UTF-16, but something is ignoring the BOM and reading it as if it were UTF-8.
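To check which state a given copy of the project file is in (for example before checking it in), a small Python helper can classify it by its leading bytes. This is a sketch of my own, not something Visual Studio or TFS provides; `sniff_bom` is a hypothetical name, and it only looks at BOMs (a UTF-32 LE BOM, which also begins with FF FE, is not distinguished).

```python
import codecs

# Signature of the corruption in the question: UTF-8 BOM followed by two U+FFFD
# replacement characters encoded in UTF-8 (EF BB BF EF BF BD EF BF BD).
CORRUPTED_SIGNATURE = codecs.BOM_UTF8 + "\ufffd\ufffd".encode("utf-8")


def sniff_bom(data: bytes) -> str:
    """Return a rough label for a file's encoding based on its leading bytes."""
    # Check the corrupted signature first: it also starts with the UTF-8 BOM.
    if data.startswith(CORRUPTED_SIGNATURE):
        return "corrupted: UTF-16 LE BOM re-saved as UTF-8"
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 BE"
    return "no BOM"
```

It would be called as `sniff_bom(Path("Database.dbproj").read_bytes())` (with `from pathlib import Path`; the file name is a placeholder). A healthy copy of the file in the question should report "UTF-16 LE".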

When that happens, each of the bytes FF and FE is treated as invalid and converted to U+FFFD, the official Unicode garbage character. Then, when the text is written to a file again, each of the garbage characters gets converted to its UTF-8 encoding (EF BF BD) and the UTF-8 BOM (EF BB BF) is added in front of them, resulting in the nine-byte sequence you reported:

EF BB BF  # UTF-8 BOM
EF BF BD  # U+FFFD in UTF-8
EF BF BD  # ditto
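That sequence of events can be reproduced in Python under the assumption above: decode the UTF-16 bytes as UTF-8, replacing invalid bytes with U+FFFD, then write the text back out as UTF-8 with a BOM. This is a simulation, not what Visual Studio literally does internally; `simulate_corruption` is a hypothetical name.

```python
import codecs


def simulate_corruption(utf16_bytes: bytes) -> bytes:
    """Read UTF-16 bytes as if they were UTF-8, then save as UTF-8 with a BOM."""
    # FF and FE are invalid in UTF-8 and become U+FFFD; ASCII bytes and the 00
    # high bytes of each UTF-16 code unit decode as ordinary characters (NUL).
    text = utf16_bytes.decode("utf-8", errors="replace")
    # The "utf-8-sig" codec writes the UTF-8 BOM (EF BB BF) before the text.
    return text.encode("utf-8-sig")


original = codecs.BOM_UTF16_LE + '<?xml version="1.0"'.encode("utf-16-le")
corrupted = simulate_corruption(original)
print(corrupted[:9].hex(" ").upper())  # EF BB BF EF BF BD EF BF BD
print(corrupted[9:] == original[2:])   # True
```

Everything after the nine-byte prefix is the original UTF-16 data byte for byte, which is why Notepad++ shows a NUL between every pair of letters.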

If this is the case, simply replacing those nine bytes with FF FE is not safe. There's no guarantee those are the only bytes in the file that would be invalid when interpreted as UTF-8. As long as the file contains only ASCII characters you're okay, but not every other character survives: an accented letter such as é is stored in UTF-16 LE as E9 00, and E9 is not valid UTF-8 in that position, so it will be irretrievably mangled.
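One way to see the limitation is to push some text through the same corrupt-then-repair round trip. The helper `round_trip` is hypothetical and assumes the corruption happens exactly as described above.

```python
import codecs

# Nine-byte prefix: UTF-8 BOM + two U+FFFD characters in UTF-8.
CORRUPT_PREFIX = codecs.BOM_UTF8 + "\ufffd\ufffd".encode("utf-8")


def round_trip(text: str) -> str:
    """Store text as UTF-16 LE, corrupt it, apply the nine-byte fix, decode again."""
    original = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    # Misread as UTF-8 (invalid bytes -> U+FFFD) and re-save as UTF-8 with a BOM.
    corrupted = original.decode("utf-8", errors="replace").encode("utf-8-sig")
    # The repair from the question: swap the nine-byte prefix for FF FE.
    repaired = codecs.BOM_UTF16_LE + corrupted[len(CORRUPT_PREFIX):]
    # errors="replace" so a broken result still decodes instead of raising.
    return repaired.decode("utf-16", errors="replace")


print(round_trip('<?xml version="1.0"?>') == '<?xml version="1.0"?>')  # True
print(round_trip("café") == "café")  # False
```

The ASCII text comes back intact, while the E9 byte of é is replaced by U+FFFD during the misread, so the original character cannot be recovered from the corrupted file.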

Are the project files really supposed to be UTF-16? If not, maybe that one developer's system is generating UTF-16 when the version-control system is expecting UTF-8. I notice in my Visual C# Express install there's an option under Environment->Documents called "Save documents as Unicode when data cannot be saved in codepage". That sounds like something that could cause the encoding to change at apparently random times.
