Should source code be saved in UTF-8?

Posted 2024-08-20 01:23:19


How important is it to save your source code in UTF-8 format?

Eclipse on Windows uses the CP1252 character encoding by default. Because CP1252 is a single-byte encoding, files can end up containing bytes that are not valid UTF-8. I have seen this happen when a comment was copied and pasted from a Word document.

The reason I ask is that, out of habit, I set up Maven's encoding to be UTF-8, and recently it has caught a few unmappable-character errors.
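Here is a minimal Java sketch of why Maven complains. It assumes the usual setup: Maven's `project.build.sourceEncoding` property is set to UTF-8, while Eclipse saved the file as CP1252 (Java's canonical name for it is `windows-1252`). In CP1252, Word's curly quotes become the single bytes 0x93 and 0x94, and those bytes do not form a valid UTF-8 sequence. The class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class SmartQuoteCheck {
    // Strictly decodes the bytes as UTF-8 and reports whether that succeeded.
    // REPORT makes the decoder throw instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A comment wrapped in Word-style curly quotes (U+201C and U+201D),
        // written with escapes so this source file itself stays ASCII.
        String comment = "\u201Cfixed\u201D";
        byte[] cp1252 = comment.getBytes(Charset.forName("windows-1252"));
        byte[] utf8 = comment.getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(cp1252)); // false: 0x93 cannot start a UTF-8 sequence
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

The same text is fine once it is stored as UTF-8. Only the bytes Eclipse wrote under CP1252 trip the strict decoder.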

(update) Please add any reasons for doing so and why. Are there any common gotchas that should be known?

(update) What is your goal? To find the best practice, so that when someone asks why we should use UTF-8 I have a good answer. Right now I don't.


贪恋 2024-08-27 01:23:19


What is your goal? Balance your needs against the pros and cons of this choice.

UTF-8 Pros

  • allows use of all character literals without \uHHHH escaping
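The escaping mentioned here is Java's Unicode escape (`\uHHHH`). The compiler translates these escapes before anything else, so `"caf\u00e9"` and a literal `"café"` compile to the same string. Below is a minimal sketch of an ASCII-escaping step, similar in spirit to the JDK's old `native2ascii` tool. `escapeNonAscii` is a hypothetical helper, not a library API.

```java
class AsciiEscaper {
    // Rewrites every non-ASCII char as a Java backslash-u escape so the
    // result can be saved in any ASCII-compatible encoding. It works on
    // UTF-16 code units, so a supplementary character (such as an emoji)
    // becomes two escapes (a surrogate pair), which is what javac expects.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, written via an escape to keep this file ASCII.
        String word = "caf\u00e9";
        System.out.println(escapeNonAscii(word));
    }
}
```

The trade-off is the one listed below: escaped source survives any tool chain, but `\u00e9` is far less readable than the character itself.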

UTF-8 Cons

  • using non-ASCII character literals without \uHHHH escaping increases the risk of character corruption
    • font and keyboard issues can arise
    • need to document and enforce use of UTF-8 in all tools (editors, compilers, build scripts, diff tools)
  • beware the byte order mark
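Tools disagree about a byte order mark (BOM) at the start of a UTF-8 file. Some Windows editors write one. Java's own UTF-8 decoder does not strip it, so the reader sees an extra U+FEFF character at the front, which a compiler or parser may then reject as an illegal character. Here is a minimal sketch of detecting and dropping it. The helper names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BomCheck {
    // The UTF-8 encoding of U+FEFF, the byte order mark.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && Arrays.equals(Arrays.copyOf(bytes, 3), UTF8_BOM);
    }

    // Decodes UTF-8, dropping a leading BOM. Java's UTF-8 decoder keeps the
    // BOM as a U+FEFF char, so it has to be removed by hand.
    static String decodeUtf8StripBom(byte[] bytes) {
        int offset = hasUtf8Bom(bytes) ? 3 : 0;
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'c', 'l', 'a', 's', 's'};
        String naive = new String(withBom, StandardCharsets.UTF_8);
        System.out.println(naive.length());                       // 6: BOM char + "class"
        System.out.println(decodeUtf8StripBom(withBom).length()); // 5
    }
}
```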

ASCII Pros

  • character/byte mappings are shared by a wide range of encodings
    • makes source files very portable
    • often obviates the need to specify encoding metadata (since the files would be byte-for-byte identical if re-encoded as UTF-8, Windows-1252, ISO 8859-1, or most other encodings apart from UTF-16 and EBCDIC)
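The portability argument can be checked directly. ASCII text produces identical bytes under UTF-8, Windows-1252 and ISO 8859-1, while non-ASCII text or UTF-16 breaks that. Here is a small sketch. `sameBytes` is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiPortability {
    // True if text encodes to exactly the same bytes in every listed charset
    // (at least one charset must be given).
    static boolean sameBytes(String text, Charset... charsets) {
        byte[] first = text.getBytes(charsets[0]);
        for (Charset cs : charsets) {
            if (!Arrays.equals(first, text.getBytes(cs))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String ascii = "int x = 42;";
        String accented = "caf\u00e9";

        // ASCII text: identical bytes in all three ASCII-compatible charsets.
        System.out.println(sameBytes(ascii, StandardCharsets.UTF_8, cp1252, StandardCharsets.ISO_8859_1)); // true
        // UTF-8 needs two bytes for the e-acute; the other two use one.
        System.out.println(sameBytes(accented, StandardCharsets.UTF_8, cp1252, StandardCharsets.ISO_8859_1)); // false
        // Java's UTF_16 charset writes a BOM and two bytes per char.
        System.out.println(sameBytes(ascii, StandardCharsets.UTF_8, StandardCharsets.UTF_16)); // false
    }
}
```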

ASCII Cons

  • limited character set
  • this isn't the 1960s

Note: ASCII is 7-bit, not "extended", and is not to be confused with Windows-1252, ISO 8859-1, or anything else.
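A project that opts for plain ASCII could enforce that choice with a check like the one below, for example as a build step. This is a minimal sketch. `NonAsciiScanner` and its output format are made up for illustration. It inspects raw bytes, so it flags both a Windows-1252 curly quote (0x93) and a UTF-8 multi-byte sequence, since neither is 7-bit ASCII.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class NonAsciiScanner {
    // Returns "line:column byte=0xNN" for every byte above 0x7F. Working on
    // raw bytes means the check does not depend on any assumed encoding;
    // columns are therefore counted in bytes, not characters.
    static List<String> findNonAscii(byte[] bytes) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int column = 1;
        for (byte b : bytes) {
            int value = b & 0xFF;
            if (value > 0x7F) {
                hits.add(line + ":" + column + " byte=0x" + Integer.toHexString(value));
            }
            if (value == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NonAsciiScanner File1.java File2.java ...
        for (String arg : args) {
            for (String hit : findNonAscii(Files.readAllBytes(Paths.get(arg)))) {
                System.out.println(arg + ":" + hit);
            }
        }
    }
}
```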

萌︼了一个春 2024-08-27 01:23:19


The important thing, at the very least, is to be consistent about the encoding you use, so you are not chasing red herrings. Not X here, Y there and Z elsewhere. Save source code in encoding X. Have the code read its input in encoding X. Have it write its output in encoding X. Set character-based FTP transfers to encoding X. Etcetera.
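In Java terms, "be consistent" mostly means never relying on the platform default charset and passing the same `Charset` to every read and write. Here is a minimal sketch that assumes UTF-8 as encoding X. `save` and `load` are illustrative names. The second half shows what a mismatched reader does to the same bytes.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ConsistentEncoding {
    // The single encoding used everywhere ("encoding X").
    static final Charset ENCODING = StandardCharsets.UTF_8;

    static void save(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(ENCODING));
    }

    static String load(Path file) throws IOException {
        return new String(Files.readAllBytes(file), ENCODING);
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9";
        Path tmp = Files.createTempFile("enc", ".txt");
        save(tmp, text);
        // Same encoding on both ends: the round trip is lossless.
        System.out.println(load(tmp).equals(text)); // true
        // Mismatched reader: the two UTF-8 bytes of the e-acute are read as
        // two separate Latin-1 characters, i.e. mojibake.
        String wrong = new String(Files.readAllBytes(tmp), StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length()); // 5 instead of 4
        Files.delete(tmp);
    }
}
```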

Nowadays UTF-8 is a good choice, as it can encode every character in Unicode (which covers practically every writing system in use) and it is supported pretty much everywhere. So yes, I would set the workspace encoding to it as well. That is what I use too.
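The claim that UTF-8 covers every character can be made concrete. UTF-8 encodes any Unicode code point in one to four bytes, and ASCII characters keep their single-byte form. Here is a small Java sketch. `utf8Length` is an illustrative helper.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Number of bytes UTF-8 needs for a single Unicode code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // A, e-acute, euro sign, grinning-face emoji: 1, 2, 3 and 4 bytes.
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```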

寄与心 2024-08-27 01:23:19


Eclipse's default setting of using the platform default encoding is a poor decision, IMHO. I found it necessary to change the default to UTF-8 shortly after installing it, because some of my existing source files were already in UTF-8 (probably from snippets copied and pasted from web pages).

The Java Language and API specs require UTF-8 support, so you're definitely okay as far as the standard tools go, and it has been a long time since I've seen a decent editor that did not support UTF-8.
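The `java.nio.charset.Charset` documentation lists six charsets that every Java implementation must support, and UTF-8 is one of them. The sketch below checks all six and prints the platform default, which tools fall back to when no encoding is configured. JDK 18 made UTF-8 the default (JEP 400). Older JDKs typically used windows-1252 on Western-locale Windows machines. When compiling directly, `javac -encoding UTF-8` sets the source encoding explicitly. The class name is illustrative.

```java
import java.nio.charset.Charset;

class RequiredCharsets {
    // The charsets every Java platform implementation is required to support.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    public static void main(String[] args) {
        for (String name : REQUIRED) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
        // The default depends on OS, locale and JDK version, which is why
        // relying on it (as Eclipse's default setting does) is fragile.
        System.out.println("platform default: " + Charset.defaultCharset());
    }
}
```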

Even in projects that use JNI, your C sources will normally be in US-ASCII which is a subset of UTF-8 so having both open in the same IDE will not be a problem.

我的鱼塘能养鲲 2024-08-27 01:23:19


Yes, unless your compiler/interpreter is not able to work with UTF-8 files, it is definitely the way to go.

熟人话多 2024-08-27 01:23:19


I don't think there's really a straight yes or no answer to this question. I would say that the following guidelines should be used to pick an encoding format, in order of priority listed (highest to lowest):

1) Pick an encoding your tool chain supports. This is a lot easier than it used to be. Even in recent memory a lot of compilers and languages basically only supported ASCII, which more or less forced developers into coding in Western European languages. These days, many of the newer languages support other encodings, and almost all decent editors and IDEs support a tremendously long list of encodings. Still... there are just enough holdouts that you need to double check before you settle on an encoding.

2) Pick an encoding that supports as many of the alphabets you wish to use as possible. I place this as a secondary priority because, frankly, if your tools don't support an encoding, it doesn't really matter whether you like it better or not.

UTF-8 is an excellent choice in many circumstances of today's world. It's an ugly, inelegant format, but it solves a whole host of problems (namely dealing with legacy code) that break other encodings, and it seems to be becoming more and more the de facto standard for character encoding. It supports every major alphabet, darn near every editor on the planet supports it now, and a whole host of languages and compilers support it too. But as I mentioned above, there are just enough legacy holdouts that you need to double-check your tool chain from end to end before you settle on it definitively.
