Does converting source modules from Unicode to ASCII (or vice versa) seriously mess up diffs?
In a test suite, I had tests that deal with Unicode scattered about in various modules. I have now consolidated them into a single test class.
The .cs source modules that no longer contain any Unicode remain Unicode-encoded, and as a result are 2x their required size. I'd like to convert them back to ASCII to save the space and improve load times for these files in editors and tools.
Q1. Will this break my diffs? I currently use Kdiff3 on my workstation, but I'm more interested in the historical diff record for the source modules as generated by TFS.
Q2. Is there anything else I need to be aware of w.r.t. source management when converting a module from Unicode to ASCII?
My particular situation is .NET and TFS, but I think the question might be applicable to just about any source-code-control system and programming language.
Comments (2)
Odd that it got converted to UTF-16. But it is easy enough to fix from Visual Studio 2008. Use File + Save As, keep the same name, click on the arrow on the Save button and choose Save with Encoding. Click on the "Encoding" combobox and select UTF8. That's the default encoding used by VS2008.
The resulting file has a BOM, just like your UTF-16 version had. That should be good enough for any reasonably modern diff tool, including KDiff3. They'll decode the text in the source code file back to Unicode. Test this on a couple of files to make sure.
Why not convert everything to UTF-8? It can handle everything UTF-16 can (which is obviously what you mean by "Unicode"), but ASCII characters will take up only one byte each, just as in ASCII. And you won't have to worry about some of your files being in different encodings than others. If your diff tool decodes the files to a common encoding first, it shouldn't break your old diffs.
Converting UTF-16 to ASCII is a very bad idea. You say there's nothing but ASCII in those files, but if you're wrong, the non-ASCII characters will be lost. That is, unless you use something like Java's native2ascii utility, which converts non-ASCII characters to Unicode escapes (for example, Ã -> \u00C3), but that would definitely break your diffs.
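The two outcomes described above can be sketched in a few lines of Python (native2ascii itself ships with the JDK; the helper here is a hypothetical re-implementation of its escaping, not the real tool). A strict ASCII encode rejects non-ASCII characters outright, while escaping keeps them but rewrites the text — which is exactly what breaks the diffs:

```python
# Sketch of strict ASCII conversion vs. native2ascii-style escaping.

def native2ascii(s):
    """Rewrite each non-ASCII character as a \\uXXXX escape (native2ascii-style)."""
    return ''.join(ch if ord(ch) < 128 else '\\u%04X' % ord(ch) for ch in s)

# Strict conversion: Python raises on a stray non-ASCII character;
# other tools may silently drop or substitute it instead.
try:
    'Ã'.encode('ascii')
except UnicodeEncodeError:
    pass  # Ã has no ASCII representation

# Escape route: the output is pure ASCII, but the bytes no longer
# match the original text, so every affected line shows up in a diff.
assert native2ascii('Ã') == r'\u00C3'
assert native2ascii('plain ASCII') == 'plain ASCII'
```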