Are Unicode identifiers (function names) advisable for non-localization purposes?
PHP allows Unicode identifiers for variables, functions, classes and constants anyhow. That was certainly intended for localized applications. Whether it's a good idea to code an API in anything but English is debatable, but it's undisputed that some development settings could demand it.
$Schüssel = new Müsli(T_FRÜCHTE);
But PHP allows more than just \p{L} for identifiers. You can use virtually any Unicode character, except those from the ASCII range (e.g. : is special, as is \ because that's already used as an internal hack to support namespaces).
Anyway, you could do so, and I would even consider that a workable use for fun projects:
throw new ಠ_ಠ("told you about the disk space before");
But other than localization and amusement and decorative effects, which uses of Unicode identifiers are advisable?
For example, I'm pondering this for embedding parameters into magic method names. In my case I only need to inject numeric parameters, so I could get away with just the underscore:
$what->substr_0_50->ascii("text");
// (Let's skip the evilness discussion this time. Not quite sure
// yet if I really want it, but the conciseness might make sense.)
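For illustration, here is a minimal sketch of how such a call could be wired up with PHP's `__get` and `__call` magic methods. The `FilterChain` class, the left-to-right evaluation order and the meaning of the `ascii` filter (drop everything outside printable ASCII) are assumptions made for the example, not the actual implementation behind the question.

```php
<?php
// Hypothetical sketch; class name, filter semantics and evaluation order are assumptions.
final class FilterChain
{
    /** Queued steps, each one [filter name, list of string parameters]. */
    private array $steps = [];

    // Property access such as ->substr_0_50 queues a step and keeps the chain going.
    public function __get(string $name): self
    {
        $this->steps[] = self::parse($name);
        return $this;
    }

    // The final call such as ->ascii("text") queues its own step, then runs every step in order.
    public function __call(string $name, array $args): string
    {
        $this->steps[] = self::parse($name);
        $value = (string) ($args[0] ?? '');
        foreach ($this->steps as [$filter, $params]) {
            $value = self::apply($filter, $params, $value);
        }
        $this->steps = []; // reset so the object can be reused
        return $value;
    }

    // "substr_0_50" becomes ["substr", ["0", "50"]]
    private static function parse(string $name): array
    {
        $parts = explode('_', $name);
        return [array_shift($parts), $parts];
    }

    private static function apply(string $filter, array $params, string $value): string
    {
        switch ($filter) {
            case 'substr':
                $start = (int) ($params[0] ?? 0);
                return isset($params[1])
                    ? substr($value, $start, (int) $params[1])
                    : substr($value, $start);
            case 'ascii':
                // Assumed meaning: strip every byte outside printable ASCII.
                return (string) preg_replace('/[^\x20-\x7e]/', '', $value);
            default:
                throw new BadMethodCallException("Unknown filter: $filter");
        }
    }
}

$what = new FilterChain();
echo $what->substr_0_5->ascii("Hello, w\u{f6}rld"), "\n"; // prints "Hello"
```

Because the parameters are split on the underscore, this only works as long as no parameter itself contains an underscore, which is exactly the limitation discussed next.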
But if I wanted to embed other textual parameters, I would require another Unicode character. Now that's harder to type, but what if there's one that would aid readability and convey the meaning?
->substr✉0✉50-> // doesn't look good
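To make the trade-off concrete, the following sketch splits a virtual method name on a multi-byte separator instead of the underscore, so that textual parameters may contain underscores themselves. The helper name `splitMagicName` and the `replace` example are hypothetical; U+2709 is simply the envelope character tried above.

```php
<?php
// Hypothetical helper: split "name<sep>param<sep>param" into [name, params].
function splitMagicName(string $name, string $separator): array
{
    // explode() works on bytes, so a multi-byte UTF-8 separator is matched as a byte sequence.
    $parts = explode($separator, $name);
    return [array_shift($parts), $parts];
}

$sep = "\u{2709}"; // the envelope character from the example above

[$method, $params] = splitMagicName("replace{$sep}foo_bar{$sep}baz", $sep);

echo $method, "\n";              // prints "replace"
echo implode(', ', $params), "\n"; // prints "foo_bar, baz"
```

With an underscore as the separator, the same name would have been split into three parameters instead of two.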
So, the question in this case: which symbol makes sense as a separator for mixed-in parameters in a virtual function name? Broader meta topic: which uses of Unicode identifiers do you know about, or would you consider okayish?
Comments (2)
Just to make it clear: PHP does not support Unicode, and it doesn't support Unicode labels. To be more precise, PHP defines a LABEL as [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*. As you can see, it allows only a small range of characters apart from the typical alphanumerics and the underscore. The fact that your Unicode labels are still accepted is only an artifact of PHP's lack of Unicode support: your special characters are several bytes long in UTF-8, PHP treats each of these bytes as a separate character, and, with the characters you tried, each of those bytes happened to fall into the \x7f-\xff range mentioned above.
Further reading on that topic: Exotic names for methods, constants, variables and fields - Bug or Feature?
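The byte-wise matching can be checked directly. The sketch below applies the LABEL pattern quoted above with `preg_match` (without the `/u` modifier, so the pattern sees raw bytes) to the identifiers from the question. The helper names `isValidLabel` and `hexBytes` are made up for this example, and the pattern is taken from this answer rather than from any particular PHP version.

```php
<?php
// The LABEL pattern quoted in the answer, anchored and applied byte-wise
// (no /u modifier, so PCRE sees raw bytes rather than code points).
const LABEL_PATTERN = '/\A[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*\z/';

// Hypothetical helper, not part of PHP itself.
function isValidLabel(string $name): bool
{
    return preg_match(LABEL_PATTERN, $name) === 1;
}

// Hypothetical helper: the bytes of a string as two-digit hex values.
function hexBytes(string $s): array
{
    return str_split(bin2hex($s), 2);
}

$face = "\u{0CA0}_\u{0CA0}"; // the class name from the question, written with escapes

echo implode(' ', hexBytes($face)), "\n";             // e0 b2 a0 5f e0 b2 a0
var_dump(isValidLabel($face));                        // bool(true)
var_dump(isValidLabel("substr\u{2709}0\u{2709}50"));  // bool(true)
var_dump(isValidLabel("substr:0:50"));                // bool(false)
```

Every byte of the multi-byte characters is 0x80 or higher, which is why the name passes even though the pattern knows nothing about Unicode.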
\u2639? The biggest hurdle after font support is going to be making the character one that can be typed. Outside of a macro or copy/paste, Unicode characters are not spectacularly easy to enter. Forcing this upon others is very likely going to violate the "assume the people that work with your code after you are murderous psychopaths that know where you live" rule.
We use unicode characters in only a few comments in our codebase, like
I think that falls into the "amusement and decorative" category. Or the "shoot self in head after slaughtering the php-internals team" category. Pick one.
Anyway, this is not a good idea because it's going to make your code hard to modify.