Is it a good idea to use Unicode symbols as Java identifiers?

Posted on 2024-08-31 14:31:12

I have a snippet of code that looks like this:

double Δt = lastPollTime - pollTime;
double α = 1 - Math.exp(-Δt / τ);
average += α * (x - average);

Just how bad an idea is it to use unicode characters in Java identifiers? Or is this perfectly acceptable?
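For reference, the snippet above is legal Java: Δ, α and τ are all Unicode letters, so javac accepts them in identifiers. Below is a minimal runnable sketch, assuming a hypothetical Ema class that holds the running average, with τ as a time constant in the same unit as the timestamps and Δt computed as the current poll time minus the previous one so that it is positive. The file has to be saved as UTF-8; passing -encoding UTF-8 to javac explicitly avoids depending on the platform's default encoding.

```java
// Ema.java: save as UTF-8 and compile with  javac -encoding UTF-8 Ema.java
// Ema, update() and the constructor parameters are hypothetical names for this sketch.
public class Ema {
    private final double τ;        // time constant, same unit as the timestamps
    private double average;        // current smoothed value
    private double lastPollTime;   // timestamp of the previous sample

    public Ema(double τ, double initialValue, double startTime) {
        this.τ = τ;
        this.average = initialValue;
        this.lastPollTime = startTime;
    }

    /** Folds sample x, taken at time pollTime, into the running average. */
    public double update(double x, double pollTime) {
        double Δt = pollTime - lastPollTime;   // elapsed time since the last sample
        double α = 1 - Math.exp(-Δt / τ);      // smoothing factor, in (0, 1) for Δt > 0
        average += α * (x - average);
        lastPollTime = pollTime;
        return average;
    }

    public static void main(String[] args) {
        Ema ema = new Ema(1.0, 0.0, 0.0);
        System.out.println(ema.update(10.0, 1.0));
        System.out.println(ema.update(10.0, 2.0));
    }
}
```

Renaming τ, Δt and α to tau, deltaTime and alpha would not change what the code computes; the choice is purely about who has to read and type it.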

Comments (7)

小猫一只 2024-09-07 14:31:12

It's a bad idea, for various reasons.

  • Many people's keyboards do not support these characters. If I were to maintain that code on a qwerty keyboard (or any other without Greek letters), I'd have to copy and paste those characters all the time.

  • Some people's editors or terminals might not display these characters properly. For example, some editors (unfortunately) still default to some ISO-8859 (Latin) variant. The main reason why ASCII is still so prevalent is that it nearly always works.

  • Even if the characters can be rendered properly, they may cause confusion. Straight from Sun:

    > Identifiers that have the same external appearance may yet be different. For example, the identifiers consisting of the single letters LATIN CAPITAL LETTER A (A, \u0041), LATIN SMALL LETTER A (a, \u0061), GREEK CAPITAL LETTER ALPHA (A, \u0391), CYRILLIC SMALL LETTER A (a, \u0430) and MATHEMATICAL BOLD ITALIC SMALL A (a, \ud835\udc82) are all different.
    >
    > ...
    >
    > Unicode composite characters are different from the decomposed characters. For example, a LATIN CAPITAL LETTER A ACUTE (Á, \u00c1) could be considered to be the same as a LATIN CAPITAL LETTER A (A, \u0041) immediately followed by a NON-SPACING ACUTE (´, \u0301) when sorting, but these are different in identifiers.

    This is in no way an imaginary problem: α (U+03b1 GREEK SMALL LETTER ALPHA) and ⍺ (U+237a APL FUNCTIONAL SYMBOL ALPHA) are different characters!

  • There is no way to tell at a glance which characters are valid. The characters from your code work, but when I use the APL FUNCTIONAL SYMBOL ALPHA, my Java compiler complains about "illegal character: \9082" (9082 is U+237A in decimal), even though the functional symbol would be more appropriate in this code. The only solid rule is the one the Java Language Specification gives: the first character must pass Character.isJavaIdentifierStart() and every later character must pass Character.isJavaIdentifierPart().

  • Even though you may get it to compile, it seems doubtful that all Java virtual machine implementations have been rigorously tested with Unicode identifiers. If these characters are only used for local variables, their names should get compiled away (unless the class is compiled with full debug information, e.g. javac -g), but if they are class members, they will end up in the .class file as well, possibly breaking your program on buggy JVM implementations.
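The identity and validity points above can be checked directly. Below is a small sketch (the class and method names are hypothetical) that writes every non-ASCII character as a Unicode escape inside string literals, so the file itself stays ASCII. It shows that lookalike letters compare as different strings, that composed Á and A followed by a combining acute accent stay different unless you normalize them with java.text.Normalizer, and it applies the JLS §3.8 rule (isJavaIdentifierStart for the first code point, isJavaIdentifierPart for the rest) to GREEK SMALL LETTER ALPHA and APL FUNCTIONAL SYMBOL ALPHA. The helper is only a character-level check and does not reject keywords such as class; javax.lang.model.SourceVersion.isName also accounts for those.

```java
import java.text.Normalizer;

public class IdentifierPitfalls {
    /**
     * Character-level check mirroring JLS 3.8: the first code point must be a
     * "Java letter", every later one a "Java letter-or-digit".
     * Keywords such as "class" are NOT rejected here.
     */
    public static boolean isIdentifierShape(String s) {
        return !s.isEmpty()
            && Character.isJavaIdentifierStart(s.codePointAt(0))
            && s.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        // Lookalikes: Latin capital A vs Greek capital alpha, Latin small a vs Cyrillic small a.
        String latinA = "\u0041", greekAlpha = "\u0391";
        String latinSmallA = "\u0061", cyrillicSmallA = "\u0430";
        System.out.println(latinA.equals(greekAlpha));              // false
        System.out.println(latinSmallA.equals(cyrillicSmallA));     // false

        // Composed A-acute vs A followed by COMBINING ACUTE ACCENT.
        String composed = "\u00c1";
        String decomposed = "A\u0301";
        System.out.println(composed.equals(decomposed));            // false
        System.out.println(composed.equals(
            Normalizer.normalize(decomposed, Normalizer.Form.NFC))); // true
        // Both spellings are legal identifiers, so javac treats them as two different names.
        System.out.println(isIdentifierShape(composed) && isIdentifierShape(decomposed)); // true

        // Which alpha may appear in an identifier?
        System.out.println(isIdentifierShape("\u03b1"));  // true  (GREEK SMALL LETTER ALPHA)
        System.out.println(isIdentifierShape("\u237a"));  // false (APL FUNCTIONAL SYMBOL ALPHA)
    }
}
```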

风渺 2024-09-07 14:31:12

Looks good, as it uses the correct symbols, but how many of your team will know the keystrokes for those symbols?

I would use an English representation just to make it easier to type. And others might not have a character set that supports those symbols set up on their PC.

贱人配狗天长地久 2024-09-07 14:31:12

That code is fine to read, but horrible to maintain. I suggest using plain English identifiers like so:

double deltaTime = lastPollTime - pollTime;
double alpha = 1 - Math.exp(-delta....

和影子一齐双人舞 2024-09-07 14:31:12

It is perfectly acceptable if it is acceptable in your working group. A lot of the answers here operate on the arrogant assumption that everybody programs in English. Non-English programmers are by no means rare these days and they're getting less rare at an accelerating rate. Why should they restrict themselves to English versions when they have a perfectly good language at their disposal?

Anglophone arrogance aside, there are other legitimate reasons for using non-English identifiers. If you're writing mathematics packages, for example, using Greek is fine if your target is fellow mathematicians. Why should people type out "delta" in your workgroup when everybody can understand "Δ" and likely type it more quickly? Almost any problem domain will have its own jargon and sometimes that jargon is expressed in something other than the Latin alphabet. Why on Earth would you want to try and jam everything into ASCII?

蹲墙角沉默 2024-09-07 14:31:12

It's an excellent idea. Honest. It's just not easily practicable at the moment. Let's keep a reference to it for the future. I would love to see triangles, circles, squares, etc. as part of program code. But for now, please do try to rewrite it the way Crozin suggests.

夏日落 2024-09-07 14:31:12

Why not?
If the people working on that code can type those easily, it's acceptable.

But God help those who can't display Unicode or can't type it.

瘫痪情歌 2024-09-07 14:31:12

In a perfect world, this would be the recommended way.

Unfortunately, you run into character encodings when moving outside plain 7-bit ASCII (UTF-8 is different from ISO-Latin-1, which is different from UTF-16, etc.), which means you will eventually run into problems. This happened to me when moving from Windows to Linux: our national Scandinavian characters broke in the process, but fortunately only in strings. We then used the \u encoding for all of them.
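The \u encoding works because javac translates Unicode escapes before it tokenizes the source (JLS §3.3), so escapes are legal in identifiers as well as in string literals, and the file itself can stay pure 7-bit ASCII. Below is a sketch with hypothetical names that also reproduces the kind of breakage described above: the UTF-8 bytes of æ, ø and å, decoded as ISO-8859-1, turn into six unrelated characters.

```java
import java.nio.charset.StandardCharsets;

public class EscapeDemo {
    /** Encodes s as UTF-8, then wrongly decodes the bytes as ISO-8859-1. */
    public static String misreadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Declares a local variable named alpha while the source file stays ASCII. */
    public static double doubledAlpha() {
        double \u03b1 = 0.5;   // javac reads this as the Greek letter alpha
        return \u03b1 * 2;
    }

    public static void main(String[] args) {
        // Scandinavian letters ae, o-slash and a-ring, written as escapes.
        String nordic = "\u00e6\u00f8\u00e5";
        String broken = misreadAsLatin1(nordic);
        System.out.println(nordic.length() + " -> " + broken.length());             // 3 -> 6
        System.out.println(broken.equals("\u00c3\u00a6\u00c3\u00b8\u00c3\u00a5"));  // true
        System.out.println(doubledAlpha());                                          // 1.0
    }
}
```

Note that javac also translates escapes inside comments, so a backslash followed by u that is not a valid escape breaks compilation even there.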

If you can be absolutely certain that you will never, ever run into such a thing (for instance, if your files contain a proper BOM), then by all means do this. It will make your code more readable. If there is even the smallest amount of doubt, don't.

(Please note that "use non-English languages" is a different matter. I'm only thinking of using symbols instead of letters.)
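If you would rather enforce "if in doubt, don't" than rely on everyone's editor settings, a build step can flag source files that are not plain ASCII. Below is a minimal sketch with hypothetical names (SourceEncodingCheck, hasUtf8Bom, firstNonAscii) that reports whether a file starts with the UTF-8 byte order mark (EF BB BF) and the offset of the first byte outside 7-bit ASCII. It works on raw bytes, so it makes no assumption about which encoding the file was saved in.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SourceEncodingCheck {
    /** True if the bytes start with the UTF-8 byte order mark EF BB BF. */
    public static boolean hasUtf8Bom(byte[] b) {
        return b.length >= 3
            && (b[0] & 0xFF) == 0xEF
            && (b[1] & 0xFF) == 0xBB
            && (b[2] & 0xFF) == 0xBF;
    }

    /** Offset of the first byte outside 7-bit ASCII, or -1 if every byte is ASCII. */
    public static int firstNonAscii(byte[] b) {
        for (int i = 0; i < b.length; i++) {
            if ((b[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(arg));
            System.out.printf("%s: BOM=%b, firstNonAscii=%d%n",
                arg, hasUtf8Bom(bytes), firstNonAscii(bytes));
        }
    }
}
```

Run it as java SourceEncodingCheck followed by the source files to check; a file saved with a BOM reports its first non-ASCII byte at offset 0, because the BOM bytes themselves are outside ASCII.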
