Bug in Python tokenize?

Posted 2024-10-12 18:12:09


Why would this

if 1 \
and 0:
    pass

simplest of code choke on the tokenize/untokenize cycle?

import tokenize
import cStringIO

def tok_untok(src):
    f = cStringIO.StringIO(src)
    return tokenize.untokenize(tokenize.generate_tokens(f.readline))

src='''if 1 \\
and 0:
    pass
'''
print tok_untok(src)

It throws:

AssertionError:
File "/mnt/home/anushri/untitled-1.py", line 13, in <module>
  print tok_untok(src)
File "/mnt/home/anushri/untitled-1.py", line 6, in tok_untok
  tokenize.untokenize(tokenize.generate_tokens(f.readline))
File "/usr/lib/python2.6/tokenize.py", line 262, in untokenize
  return ut.untokenize(iterable)
File "/usr/lib/python2.6/tokenize.py", line 198, in untokenize
  self.add_whitespace(start)
File "/usr/lib/python2.6/tokenize.py", line 187, in add_whitespace
  assert row <= self.prev_row

Is there a workaround without modifying the src being tokenized? (It seems the backslash is the culprit.)

Another example where it fails is when there is no newline at the end, e.g. src='if 1:pass' fails with the same error.

Workaround: it seems that using untokenize a different way works

def tok_untok(src):
    f = cStringIO.StringIO(src)
    tokens = [ t[:2] for t in tokenize.generate_tokens(f.readline)]
    return tokenize.untokenize(tokens)

i.e. do not pass back the whole token tuple but only t[:2], even though the Python docs say any extra elements are ignored:

Converts tokens back into Python source code. The iterable must return sequences with at least two elements, the token type and the token string. Any additional sequence elements are ignored.
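For reference, the same workaround carries over to Python 3 essentially unchanged apart from the imports; a minimal sketch, assuming io.StringIO in place of the removed cStringIO module:

```python
import io
import tokenize

def tok_untok(src):
    # Passing only (type, string) pairs puts untokenize into its
    # "compatibility mode", which rebuilds the source from the token
    # strings alone instead of trying to honor the recorded start/end
    # positions -- so the position bookkeeping can no longer assert.
    tokens = [t[:2] for t in tokenize.generate_tokens(io.StringIO(src).readline)]
    return tokenize.untokenize(tokens)

src = '''if 1 \\
and 0:
    pass
'''
out = tok_untok(src)
# The exact whitespace differs from src (the backslash continuation
# disappears), but the result is valid, semantically equivalent source.
compile(out, '<roundtrip>', 'exec')
```

The trade-off is that compatibility mode does not preserve the original layout, only the token stream.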


Comment by 尽揽少女心, 2024-10-19 18:12:09:


Yes, it's a known bug and there is interest in a cleaner patch than the one attached to that issue. Perfect time to contribute to a better Python ;)
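The bug was indeed fixed in later releases: modern Python 3's tokenize emits backslash continuations in add_whitespace and implicitly appends a NEWLINE token when the source lacks a trailing newline. A quick check of both failing inputs, again assuming io.StringIO in place of cStringIO:

```python
import io
import tokenize

def tok_untok(src):
    # The same full five-tuple round trip as in the question, on Python 3.
    return tokenize.untokenize(tokenize.generate_tokens(io.StringIO(src).readline))

# The backslash-continued source no longer trips an assertion:
out1 = tok_untok('if 1 \\\nand 0:\n    pass\n')
compile(out1, '<roundtrip>', 'exec')

# Neither does source without a trailing newline:
out2 = tok_untok('if 1:pass')
compile(out2, '<roundtrip>', 'exec')
```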
