Error in Python tokenize?
Why does this simplest of code choke on a tokenize/untokenize cycle?

if 1 \
and 0:
    pass
import tokenize
import cStringIO

def tok_untok(src):
    # Tokenize src, then feed the full 5-tuples straight back to untokenize.
    f = cStringIO.StringIO(src)
    return tokenize.untokenize(tokenize.generate_tokens(f.readline))

src = '''if 1 \\
and 0:
    pass
'''
print tok_untok(src)
It throws:
AssertionError:
  File "/mnt/home/anushri/untitled-1.py", line 13, in <module>
    print tok_untok(src)
  File "/mnt/home/anushri/untitled-1.py", line 6, in tok_untok
    tokenize.untokenize(tokenize.generate_tokens(f.readline))
  File "/usr/lib/python2.6/tokenize.py", line 262, in untokenize
    return ut.untokenize(iterable)
  File "/usr/lib/python2.6/tokenize.py", line 198, in untokenize
    self.add_whitespace(start)
  File "/usr/lib/python2.6/tokenize.py", line 187, in add_whitespace
    assert row <= self.prev_row
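Dumping the token stream makes the cause visible. The sketch below reuses src and the imports from above; the explanation in the comments is my reading of the 2.6 untokenize code named in the traceback, not something stated in the docs:

f = cStringIO.StringIO(src)
for tok_type, tok_str, start, end, line in tokenize.generate_tokens(f.readline):
    print tokenize.tok_name[tok_type], repr(tok_str), start, end

# The escaped newline after `if 1 \` produces no NL token, so the stream
# jumps from row 1 straight to a token starting on row 2. untokenize only
# advances its prev_row counter when it sees a NEWLINE/NL token, so prev_row
# is still 1 when the row-2 token arrives and `assert row <= self.prev_row`
# fails.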
Is there a workaround that does not require modifying the src being tokenized? (The backslash continuation seems to be the culprit.)
Another example where it fails is when there is no newline at the end, e.g. src='if 1:pass' fails with the same error.
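For example, reusing the tok_untok helper defined above (a minimal reproduction of the failure just described):

try:
    print tok_untok('if 1:pass')
except AssertionError:
    # Same AssertionError as the backslash example: with no trailing
    # newline, untokenize's row bookkeeping again falls out of step.
    print 'AssertionError'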
Workaround:
But it seems that using untokenize a different way works:

def tok_untok(src):
    f = cStringIO.StringIO(src)
    # Keep only (token_type, token_string) pairs; drop the position info.
    tokens = [t[:2] for t in tokenize.generate_tokens(f.readline)]
    return tokenize.untokenize(tokens)

i.e. pass back not the whole token tuple but only t[:2],
even though the Python documentation says the additional elements are skipped:

Converts tokens back into Python source code. The iterable must return sequences with at least two elements, the token type and the token string. Any additional sequence elements are ignored.
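As a quick sanity check (a sketch using the workaround tok_untok above): the docs also promise that the untokenize output tokenizes back to the same (type, string) stream, even though the spacing between tokens may change in this two-element mode.

out = tok_untok(src)
print out  # spacing may differ from the original source

def toks(s):
    # (token_type, token_string) pairs, for comparison.
    return [t[:2] for t in tokenize.generate_tokens(cStringIO.StringIO(s).readline)]

# Round-trip is lossless at the token level.
assert toks(out) == toks(src)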
Answer:
Yes, it's a known bug and there is interest in a cleaner patch than the one attached to that issue. Perfect time to contribute to a better Python ;)