Tokenize module
Please help.
There are many tokens in the tokenize module, like STRING, BACKQUOTE, AMPEREQUAL, etc.
>>> import cStringIO
>>> import tokenize
>>> source = "{'test':'123','hehe':['hooray',0x10]}"
>>> src = cStringIO.StringIO(source).readline
>>> src = tokenize.generate_tokens(src)
>>> src
<generator object at 0x00BFBEE0>
>>> src.next()
(51, '{', (1, 0), (1, 1), "{'test':'123','hehe':['hooray',0x10]}")
>>> token = src.next()
>>> token
(3, "'test'", (1, 1), (1, 7), "{'test':'123','hehe':['hooray',0x10]}")
>>> token[0]
3
>>> tokenize.STRING
3
>>> tokenize.AMPER
19
>>> tokenize.AMPEREQUAL
42
>>> tokenize.AT
50
>>> tokenize.BACKQUOTE
25
This is what I experimented with, but I was not able to find out what these tokens mean. Where can I learn about this? I need a quick solution.
3 answers
The various AMPER, BACKQUOTE etc. values correspond to the token numbers of the corresponding Python symbols/operators, i.e. AMPER = "&" (ampersand), AMPEREQUAL = "&=".

However, you don't actually have to care about these. They're used by the internal C tokenizer, but the Python wrapper simplifies the output, translating all operator symbols to the OP token. You can translate the numeric token ids (the first value in each token tuple) to their symbolic names using the token module's tok_name dictionary.

As a quick debug statement that describes the tokens a bit better, you could also use tokenize.printtoken. This is undocumented, and it looks like it isn't present in Python 3, so don't rely on it for production code; but for a quick peek at what the tokens mean, you may find it useful.

The values in the tuple you get back for each token are, in order: the token type (the numeric id you can look up in tok_name), the token's string, the (row, column) where it starts, the (row, column) where it ends, and the full text of the physical line it was found on.
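The tok_name lookup described above can be sketched in a few lines. Note the question's session is Python 2 (cStringIO, src.next()); this sketch assumes Python 3, where io.StringIO and next(gen) are the equivalents:

```python
import io
import tokenize
from token import tok_name

source = "{'test':'123','hehe':['hooray',0x10]}"

# generate_tokens takes a readline callable, just as in the question.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    # tok_name maps the numeric token id to its symbolic name,
    # e.g. 'STRING', 'OP', 'NUMBER' (the numbers themselves vary
    # between Python versions, so always go through tok_name).
    print(tok_name[tok.type], repr(tok.string), tok.start, tok.end)
```

Note that in the simplified output every operator symbol ('{', ':', ',' and so on) shows up as OP, while "'test'" shows up as STRING and 0x10 as NUMBER.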
You will need to read the source of Python's tokenizer, tokenizer.c, to understand the details.
Just search for the keyword you want to know about; it shouldn't be hard.
Python's lexical analysis (including tokens) is documented at http://docs.python.org/reference/lexical_analysis.html. As http://docs.python.org/library/token.html#module-token says: "Refer to the file Grammar/Grammar in the Python distribution for the definitions of the names in the context of the language grammar."
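The token module referenced above also gives you the number-to-name mapping directly, so you can resolve the constants from the question without reading the grammar file. A minimal sketch (Python 3):

```python
import token

# tok_name maps each numeric token id back to its symbolic name,
# so the constants from the question resolve to readable strings.
print(token.tok_name[token.STRING])      # -> 'STRING'
print(token.tok_name[token.AMPER])       # -> 'AMPER'   (the '&' operator)
print(token.tok_name[token.AMPEREQUAL])  # -> 'AMPEREQUAL' (the '&=' operator)
```

This is why hard-coding the numbers (3, 19, 42, ...) is fragile: they differ across Python versions, while the names are stable.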