如何在 Python 2.6 中获得 Python isidentifer() 功能?
Python 3 有一个名为 str.isidentifier
的字符串方法
如何在 Python 2.6 中获得类似的功能,而不需要重写我自己的正则表达式等?
Python 3 has a string method called str.isidentifier
How can I get similar functionality in Python 2.6, short of rewriting my own regex, etc.?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
tokenize 模块定义了一个名为 Name 的正则表达式
the tokenize module defines a regexp called Name
无效标识符验证
该线程中的所有答案似乎都在验证中重复一个错误,该错误允许将不是有效标识符的字符串与字符串进行匹配。
其他答案中建议的正则表达式模式是从
tokenize.Name
构建的,它包含以下正则表达式模式[a-zA-Z_]\w*
(运行 python 2.7.15 ) 和 '$' 正则表达式锚点。请参考标识符和关键字的官方Python 3描述(其中也包含与 python 2 相关的段落)。
因此“foo\n”不应被视为有效标识符。
虽然有人可能会认为这段代码是有效的:
由于换行符确实是有效的 ASCII 字符,因此它不被视为字母。此外,以换行符结尾的标识符显然没有实际用途
str.isidentifier
函数还确认这是一个无效标识符:python3 解释器:
< code>$ 锚点与
\Z
锚点引用 官方 python2 正则表达式语法:
这会生成一个以换行符结尾的字符串进行匹配作为有效标识符:
正则表达式模式不应使用
$
锚点,而应使用\Z
锚点。再次引用一下:
现在正则表达式是有效的:
危险含义
请参阅卢克的回答是另一个例子,这种弱正则表达式匹配在其他情况下可能会产生更危险的影响。
进一步阅读
Python 3 添加了对非 ascii 标识符的支持,请参阅 PEP-3131 。
Invalid Identifier Validation
All of the answers in this thread seem to be repeating a mistake in the validation which allows strings that are not valid identifiers to be matched like ones.
The regex patterns suggested in the other answers are built from
tokenize.Name
which holds the following regex pattern[a-zA-Z_]\w*
(running python 2.7.15) and the '$' regex anchor.Please refer to the official python 3 description of the identifiers and keywords (which contains a paragraph that is relevant to python 2 as well).
thus 'foo\n' should not be considered as a valid identifier.
While one may argue that this code is functional:
As the newline character is indeed a valid ASCII character, it is not considered to be a letter. Further more, there is clearly no practical use of an identifer that ends with a newline character
The
str.isidentifier
function also confirms this is an invalid identifier:python3 interpreter:
The
$
anchor vs the\Z
anchorQuoting the official python2 Regular Expression syntax:
This results in a string that ends with a newline to match as a valid identifier:
The regex pattern should not use the
$
anchor but instead\Z
is the anchor that should be used.Quoting once again:
And now the regex is a valid one:
Dangerous Implications
See Luke's answer for another example how this kind of weak regex matching could potentially in other circumstances have more dangerous implications.
Further Reading
Python 3 added support for non-ascii identifiers see PEP-3131.
应该做得很好。据我所知,没有任何内置方法。
should do nicely. As far as I know there isn't any built-in method.
到目前为止很好的答案。我会这样写。
Good answers so far. I'd write it like this.
在Python中< 3.0 这很容易,因为标识符中不能包含 unicode 字符。那应该可以完成工作:
In Python < 3.0 this is quite easy, as you can't have unicode characters in identifiers. That should do the work:
我决定再次尝试一下,因为已经有一些很好的建议。我会尽力整合它们。以下内容可以保存为 Python 模块并直接从命令行运行。如果运行,它会测试该功能,因此可以证明是正确的(至少在文档演示该功能的范围内)。
I've decided to take another crack at this, since there have been several good suggestions. I'll try to consolidate them. The following can be saved as a Python module and run directly from the command-line. If run, it tests the function, so is provably correct (at least to the extent that the documentation demonstrates the capability).
我正在使用什么:
What I am using:
到目前为止提出的所有解决方案都不支持 Unicode,或者如果在 Python 3 上运行,则不允许第一个字符中包含数字。
编辑:建议的解决方案只能在 Python 2 上使用,在 Python3 上应该使用 isidentifier。这是一个应该在任何地方都适用的解决方案:
基本上,它测试某些内容是否由(至少 1)个字符(包括数字)组成,然后检查第一个字符是否不是数字。
All solutions proposed so far do not support Unicode or allow a number in the first char if run on Python 3.
Edit: the proposed solutions should only be used on Python 2, and on Python3
isidentifier
should be used. Here is a solution that should work anywhere:Basically, it tests whether something consists of (at least 1) characters (including numbers), and then it checks that the first char is not a number.