Printing a UTF-8 encoded string
I'm using BeautifulSoup to extract some text from an HTML page, but I can't figure out how to print it properly to the screen (or to a file, for that matter).
Here's what my class containing the text looks like:
    class Thread(object):
        def __init__(self, title, author, date, content=u""):
            self.title = title
            self.author = author
            self.date = date
            self.content = content
            self.replies = []

        def __unicode__(self):
            s = u""
            for k, v in self.__dict__.items():
                s += u"%s = %s " % (k, v)
            return s

        def __repr__(self):
            return repr(unicode(self))

        __str__ = __repr__
When I try to print an instance of Thread, here's what I see on the console:
~/python-tests $ python test.py
u'date = 21:01 03/02/11 content = author = \u05d3"\u05e8 \u05d9\u05d5\u05e0\u05d9 \u05e1\u05d8\u05d0\u05e0\u05e6\'\u05e1\u05e7\u05d5 replies = [] title = \u05de\u05d1\u05e0\u05d4 \u05d4\u05de\u05d1\u05d7\u05df '
Whatever I try, I cannot get the output I'd like (the above text should be Hebrew). My end goal is to serialize Thread to a file (using json or pickle) and be able to read it back.
I'm running this with Python 2.6.6 on Ubuntu 10.10.
Comments (2)
To output a Unicode string to a file (or the console) you need to choose a text encoding. In Python 2 the default text encoding is ASCII, which cannot represent Hebrew characters, so you need to use a different encoding, such as UTF-8.
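The code that originally followed this answer is not preserved here, so what follows is only a minimal sketch of the idea, not the answerer's own snippet. It assumes the standard-library `io.open` (available from Python 2.6 and also on Python 3); the file name `thread.txt` and the temporary directory are hypothetical.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hebrew title taken from the console output in the question.
text = u"\u05de\u05d1\u05e0\u05d4 \u05d4\u05de\u05d1\u05d7\u05df"

# Explicitly choose an encoding: this turns the Unicode string into UTF-8 bytes.
encoded = text.encode("utf-8")

# Hypothetical output location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "thread.txt")

# io.open encodes on write and decodes on read using the given encoding.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

with io.open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()
```

On Python 2, printing `text.encode("utf-8")` should send those same bytes to the terminal, which then displays Hebrew if the terminal itself is configured for UTF-8.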
A nice alternative to @mark's answer is to set the environment variable PYTHONIOENCODING=UTF-8.
See also: Writing unicode strings via sys.stdout in Python.
(Make sure to set it before starting Python, not in the script.)
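As a rough illustration of why the variable has to be set before the interpreter starts, the sketch below launches a child Python process with PYTHONIOENCODING in its environment and captures what the child writes to stdout. It assumes `subprocess.check_output` (Python 2.7 or later, so not the 2.6.6 from the question); the one-line child script is hypothetical.

```python
import os
import subprocess
import sys

# Copy the current environment and add the variable for the child process only.
env = dict(os.environ, PYTHONIOENCODING="UTF-8")

# Hypothetical child script: writes two Hebrew letters to stdout without encoding them itself.
child_code = 'import sys; sys.stdout.write(u"\\u05d3\\u05e8")'

# The child reads PYTHONIOENCODING at startup and encodes its stdout accordingly.
out = subprocess.check_output([sys.executable, "-c", child_code], env=env)

decoded = out.decode("utf-8")
```

From a shell, the equivalent would be along the lines of `PYTHONIOENCODING=UTF-8 python test.py`.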