How do I include unicode strings in Python doctests?

Posted 2024-08-11 07:21:23

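I am working on some code that has to manipulate unicode strings. I am trying to write doctests for it, but am having trouble. The following is a minimal example that illustrates the problem: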

# -*- coding: utf-8 -*-
def mylen(word):
  """
  >>> mylen(u"áéíóú")
  5
  """
  return len(word)

print mylen(u"áéíóú")

First, we run the code to see the expected output of print mylen(u"áéíóú").

$ python mylen.py
5

Next, we run doctest on it to see the problem.

$ python -m
5
**********************************************************************
File "mylen.py", line 4, in mylen.mylen
Failed example:
    mylen(u"áéíóú")
Expected:
    5
Got:
    10
**********************************************************************
1 items had failures:
   1 of   1 in mylen.mylen
***Test Failed*** 1 failures.

How, then, can I test that mylen(u"áéíóú") evaluates to 5?

伏妖词 2024-08-18 07:21:23

If you want unicode strings, you have to use unicode docstrings! Mind the u!

# -*- coding: utf-8 -*-
def mylen(word):
  u"""        <----- SEE 'u' HERE
  >>> mylen(u"áéíóú")
  5
  """
  return len(word)

print mylen(u"áéíóú")

This will work, as long as the tests pass. For Python 2.x you need yet another hack to make verbose doctest mode work, or to get correct tracebacks when tests fail:

if __name__ == "__main__":
    import sys
    reload(sys)
    sys.setdefaultencoding("UTF-8")
    import doctest
    doctest.testmod()

NB! Only ever use setdefaultencoding for debugging purposes. I'd accept it for doctest use, but not anywhere in production code.
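
As a rough illustration of where the 10 in the question probably comes from (this is my reading, not something the answer states): the five accented letters take ten bytes in UTF-8, and if a byte-string docstring is read one byte per character, each byte is counted as its own character. The sketch below imitates that misreading with a Latin-1 decode; the variable names are made up for the example.

```python
# -*- coding: utf-8 -*-
# Sketch: five code points become ten "characters" when their UTF-8 bytes
# are interpreted one byte at a time (imitated here with Latin-1).
word = u"áéíóú"

utf8_bytes = word.encode("utf-8")        # each accented letter takes 2 bytes
misread = utf8_bytes.decode("latin-1")   # one character per byte

print(len(word))        # number of code points
print(len(utf8_bytes))  # number of bytes
print(len(misread))     # number of characters after the misreading
```

This prints 5, 10 and 10, which matches the "Expected: 5, Got: 10" failure in the question.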

秋千易 2024-08-18 07:21:23

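Python 2.6.6 does not handle unicode output in doctests very well, but this can be fixed by combining:

the already described hack with sys.setdefaultencoding("UTF-8")
a unicode docstring (also mentioned above, thanks a lot)
AND the print statement.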

In my case, this docstring reports that the test is broken:

def beatiful_units(*units):
    u'''Returns nice string like 'erg/(cm² sec)'.

    >>> beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
    u'erg/(cm² sec)'
    '''

with the "error" message

Failed example:
    beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
Expected:
    u'erg/(cm² sec)'
Got:
    u'erg/(cm\xb2 sec)'

Using print fixes that, because doctest then compares the printed text instead of the escaped repr of the result:

def beatiful_units(*units):
    u'''Returns nice string like 'erg/(cm² sec)'.

    >>> print beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
    erg/(cm² sec)
    '''
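
The difference between the two docstrings is only in how the result is displayed. Here is a small sketch, not part of the original answer, that uses the unicode_escape codec purely to imitate how Python 2's repr escapes the superscript two; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
# Sketch: the same string shown "as printed" versus "as escaped".
text = u"erg/(cm² sec)"

# unicode_escape turns non-ASCII characters into backslash escapes,
# similar to what Python 2 shows in the repr of a unicode string.
escaped = text.encode("unicode_escape").decode("ascii")

print(text)     # the readable form, which is what a print-based doctest sees
print(escaped)  # the escaped form, containing a literal backslash sequence
```

The escaped form contains \xb2 where the readable form has ², which is the same mismatch shown in the Expected/Got output above.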

阳光下的泡沫是彩色的 2024-08-18 07:21:23

This appears to be a known and as yet unresolved issue in Python. See open issues here and here.

Not surprisingly, it can be modified to work OK in Python 3, since all strings are Unicode there:

def mylen(word):
  """
  >>> mylen("áéíóú")
  5
  """
  return len(word)

print(mylen("áéíóú"))
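
To check this under Python 3 without going through the command line, one option is to feed the docstring to doctest directly. This is only a sketch: the parser and runner wiring below is my own assumption about how one might run it, not something taken from the answer.

```python
import doctest


def mylen(word):
    """
    >>> mylen("áéíóú")
    5
    """
    return len(word)


# Build a DocTest from the docstring and run it with an explicit runner,
# so the outcome is available as a value instead of only printed output.
parser = doctest.DocTestParser()
test = parser.get_doctest(mylen.__doc__, {"mylen": mylen}, "mylen", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)
```

Under Python 3 this reports no failures out of one attempted example.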

眉目亦如画i 2024-08-18 07:21:23

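My solution was to escape the unicode characters, as in u'\xe1\xe9\xed\xf3\xfa'. It is not as easy to read, but my tests contained only a few non-ASCII characters, so in those cases I put a description next to them as a comment, such as "# n with tilde".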
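
A quick sketch of that approach, with variable names invented for the example: the escaped spelling is the same string as the accented one, so tests written with escapes do not depend on how the source file is decoded.

```python
# -*- coding: utf-8 -*-
# Sketch: escaped and literal spellings denote the same unicode string.
escaped = u"\xe1\xe9\xed\xf3\xfa"  # a, e, i, o, u with acute accents
literal = u"áéíóú"

n_tilde = u"\xf1"  # n with tilde

print(escaped == literal)
print(len(escaped))
```

This prints True and 5.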

淡水深流 2024-08-18 07:21:23

As already mentioned, you need to ensure your docstrings are Unicode.

If you can switch to Python 3, it works automatically there, because the source encoding is already utf-8 and the default string type is Unicode.

To achieve the same in Python 2, you need to keep the coding: utf-8 declaration. Alongside it, you can either prefix all docstrings with u, or simply add:

from __future__ import unicode_literals
