Best output type and encoding practices for __repr__() functions?

Posted 2024-09-17 10:01:26, 996 words, 13 views, 0 comments

Lately, I've had lots of trouble with __repr__(), format(), and encodings. Should the output of __repr__() be encoded or be a unicode string? Is there a best encoding for the result of __repr__() in Python? What I want to output does have non-ASCII characters.

I use Python 2.x, and want to write code that can easily be adapted to Python 3. The program thus uses

# -*- coding: utf-8 -*-
from __future__ import unicode_literals, print_function  # The 'Hello' literal represents a Unicode object

Here are some additional problems that have been bothering me, and I'm looking for a solution that solves them:

Printing to a UTF-8 terminal should work (I have sys.stdout.encoding set to UTF-8, but it would be best if other cases worked too).
Piping the output to a file (encoded in UTF-8) should work (in this case, sys.stdout.encoding is None).
My code for many __repr__() functions currently has many return ….encode('utf-8'), and that's heavy. Is there anything robust and lighter?
In some cases, I even have ugly beasts like return ('<{}>'.format(repr(x).decode('utf-8'))).encode('utf-8'), i.e., the representation of objects is decoded, put into a formatting string, and then re-encoded. I would like to avoid such convoluted transformations.

What would you recommend for writing simple __repr__() functions that behave nicely with respect to these encoding questions?


泡沫很甜 2024-09-24 10:01:26

In Python 2, __repr__ (and __str__) must return a str (byte string) object, not a
unicode object. In Python 3 the situation is reversed: __repr__ and __str__
must return str (Unicode) objects, not bytes (née string) objects:

class Foo(object):
    def __repr__(self):
        return u'\N{WHITE SMILING FACE}' 

class Bar(object):
    def __repr__(self):
        return u'\N{WHITE SMILING FACE}'.encode('utf8')

# Results on Python 2:
print(repr(Bar()))
# ☺  (on a UTF-8 terminal)
repr(Foo())
# UnicodeEncodeError: 'ascii' codec can't encode character u'\u263a' in position 0: ordinal not in range(128)

In Python 2, you don't really have a choice: you have to pick an encoding for the
return value of __repr__.
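
One encoding choice that avoids depending on the terminal is plain ASCII with backslash escapes. The following is only a sketch of that idea, not something proposed in this answer; the helper name ascii_safe_repr and the Smiley class are made up for illustration. It returns the native str type on both Python 2 and Python 3.

```python
def ascii_safe_repr(text):
    """Return `text` as the native str type, with non-ASCII characters escaped."""
    escaped = text.encode('ascii', 'backslashreplace')  # always a byte string
    if str is bytes:
        # Python 2: the native str type is the byte string itself.
        return escaped
    # Python 3: __repr__ must return text, so decode the ASCII bytes back.
    return escaped.decode('ascii')


class Smiley(object):
    def __repr__(self):
        return ascii_safe_repr(u'<Smiley \N{WHITE SMILING FACE}>')


print(repr(Smiley()))  # <Smiley \u263a>
```

The trade-off is readability: the output is safe on any terminal or pipe, but non-ASCII characters are shown as escapes rather than as themselves.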

By the way, have you read the PrintFails wiki? It may not directly answer
your other questions, but I did find it helpful in illuminating why certain
errors occur.

When using from __future__ import unicode_literals,

('<{}>'.format(repr(x).decode('utf-8'))).encode('utf-8')

can be more simply written as

str('<{}>').format(repr(x))

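assuming repr(x) already returns a UTF-8 encoded byte string. The format string itself is pure ASCII, so str() converts it to a byte string safely.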

Without from __future__ import unicode_literals, the expression can be written as:

'<{}>'.format(repr(x))
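
Another way to avoid the decode/encode round trip inside every format call is to do all formatting on Unicode text and decode a nested repr() only when it comes back as bytes. This is a minimal sketch under that assumption; text_repr and wrap are hypothetical helper names, and UTF-8 is assumed as the encoding used by the nested __repr__ on Python 2.

```python
def text_repr(obj, encoding='utf-8'):
    """Return repr(obj) as Unicode text on both Python 2 and Python 3."""
    r = repr(obj)
    if isinstance(r, bytes):
        # Python 2: repr() returns a byte string, so decode it once here.
        r = r.decode(encoding)
    return r


def wrap(obj):
    # All formatting happens on Unicode text; nothing is encoded here.
    return u'<{0}>'.format(text_repr(obj))


print(wrap([1, 2, 3]))  # <[1, 2, 3]>
```

On Python 2 the result of wrap() would still have to be encoded once before being returned from a __repr__; on Python 3 it can be returned as is.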


不弃不离 2024-09-24 10:01:26

I think a decorator can manage __repr__ incompatibilities in a sane way. Here's what I use:

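from __future__ import unicode_literals, print_function
import functools
import sys

def force_encoded_string_output(func):
    """Make a text-returning __repr__ return an encoded byte string on Python 2."""

    if sys.version_info.major < 3:

        @functools.wraps(func)
        def _func(*args, **kwargs):
            # Fall back to UTF-8 when stdout has no encoding (e.g. piped output).
            return func(*args, **kwargs).encode(sys.stdout.encoding or 'utf-8')

        return _func

    else:
        # Python 3: __repr__ must return text, so leave the function untouched.
        return func


class MyDummyClass(object):

    @force_encoded_string_output
    def __repr__(self):
        return 'My Dummy Class! \N{WHITE SMILING FACE}'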
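
The same idea can also be applied once per class instead of once per method. The sketch below is a variation, not the code from this answer: encoded_repr and Point are hypothetical names, and it assumes a fixed UTF-8 encoding rather than sys.stdout.encoding.

```python
import sys


def encoded_repr(cls, encoding='utf-8'):
    """Class decorator: on Python 2, make cls.__repr__ return encoded bytes."""
    if sys.version_info[0] >= 3:
        # Python 3: __repr__ must return text, so nothing to change.
        return cls

    text_repr = cls.__repr__

    def __repr__(self):
        # Encode exactly once, at the boundary where Python 2 requires bytes.
        return text_repr(self).encode(encoding)

    cls.__repr__ = __repr__
    return cls


@encoded_repr
class Point(object):
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return u'<Point {0}>'.format(self.label)


print(repr(Point(u'origin')))  # <Point origin>
```

A fixed encoding keeps repr() independent of where the output happens to go, at the cost of possible mojibake on a terminal that is not UTF-8.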


梦幻的味道 2024-09-24 10:01:26

I use a function like the following:

import sys

def stdout_encode(u, default='UTF8'):
    # sys.stdout.encoding is None when output is piped to a file on Python 2.
    if sys.stdout.encoding:
        return u.encode(sys.stdout.encoding)
    return u.encode(default)

Then my __repr__ functions look like this:

def __repr__(self):
    return stdout_encode(u'<MyClass {0} {1}>'.format(self.abcd, self.efgh))
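
To see how the fallback behaves without a real terminal or pipe, the encoding lookup can be separated from the encoding step. The following is a sketch only: stream_encoding, FakePipe and FakeTerminal are hypothetical names introduced for illustration, standing in for sys.stdout in the two situations described in the question.

```python
import sys


def stream_encoding(stream=None, default='utf-8'):
    """Return the stream's encoding, or `default` when it has none."""
    if stream is None:
        stream = sys.stdout
    return getattr(stream, 'encoding', None) or default


class FakePipe(object):
    """Stand-in for a piped stdout, which reports no encoding on Python 2."""
    encoding = None


class FakeTerminal(object):
    """Stand-in for a terminal that advertises a non-UTF-8 encoding."""
    encoding = 'latin-1'


text = u'caf\u00e9'
piped = text.encode(stream_encoding(FakePipe()))     # falls back to UTF-8
term = text.encode(stream_encoding(FakeTerminal()))  # uses the terminal's encoding
```

Here piped holds the UTF-8 bytes and term the Latin-1 bytes, which shows that the same repr can produce different byte strings depending on where stdout points.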
