How do I display non-English characters in Python?

Posted 2024-12-18 09:59:17

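I have a Python dictionary that contains items with non-English characters. When I print the dictionary, the Python shell does not display the non-English characters properly. How can I fix this?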

烟酒忠诚 2024-12-25 09:59:17

When your application prints hei\xdfen instead of heißen, it means you are not printing the unicode string itself but the string representation (repr) of the unicode object.

Let us assume your string ("heißen") is stored into variable called text. Just to make sure where you are at, check out the type of this variable by calling:

>>> type(text)

If you get <type 'unicode'>, it means you are not dealing with a byte string (str) but with a unicode object.

If you do the intuitive thing and try to print the text by invoking print(text), you may not get the actual text ("heißen") but a string representation of the unicode object instead.

To fix this, you need to know which encoding your terminal has and print out your unicode object encoded according to the given encoding.

For instance, if your terminal uses UTF-8 encoding, you can print the string by invoking:

print text.encode('utf-8')
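
The answer's snippets are Python 2. As a rough sketch of the same idea in Python 3 syntax (where `str` plays the role of Python 2's `unicode` and `bytes` the role of the byte string), the step from text to terminal bytes might look like this. The names `terminal_encoding`, `escaped` and `encoded` are made up for illustration, and UTF-8 is only an assumption about the terminal.

```python
text = "heißen"  # a Unicode string (Python 3 `str`, Python 2 `unicode`)

# Assumption: the terminal expects UTF-8; substitute your terminal's encoding.
terminal_encoding = "utf-8"

# ascii() gives the escaped representation, like the one in the question.
escaped = ascii(text)

# encode() turns the Unicode string into the bytes the terminal should receive.
# 'ß' is code point U+00DF, which UTF-8 stores as the two bytes C3 9F.
encoded = text.encode(terminal_encoding)

print(escaped)
print(encoded)
```

Decoding `encoded` with the same encoding gives back the original text, which is a quick way to check that the encoding you picked can represent every character.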

That's for the basic concepts. Now let me give you a more detailed example. Let us assume we have a source code file storing your dictionary. Like:

mydict = {'heiße': 'heiße', 'äää': 'ööö'}

When you type print mydict you will get {'\xc3\xa4\xc3\xa4\xc3\xa4': '\xc3\xb6\xc3\xb6\xc3\xb6', 'hei\xc3\x9fe': 'hei\xc3\x9fe'}. Even print mydict['äää'] doesn't work: it results in something like ├Â├Â├Â. The nature of the problem is revealed by trying out print type(mydict['äää']), which will tell you that you are dealing with a byte string (str) object rather than a unicode object.

In order to fix the problem, you first need to decode the byte string from your source code file's charset into a unicode object and then encode it in the charset of your terminal. For individual dict items this can be achieved by:

print unicode(mydict['äää'], 'utf-8')

Note that if the default encoding doesn't apply to your terminal, you need to write:

print unicode(mydict['äää'], 'utf-8').encode('utf-8')

where the outer encode method specifies the encoding according to your terminal.
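
To apply the decode step to the whole dictionary rather than one item at a time, something along these lines could work. It is a sketch in Python 3 syntax, not the answerer's code: `raw_dict` simulates the byte strings Python 2 would have read from a UTF-8 source file, and `decode_dict` and `source_encoding` are hypothetical names.

```python
# Byte strings as Python 2 would have stored them from a UTF-8 source file.
raw_dict = {
    "heiße".encode("utf-8"): "heiße".encode("utf-8"),
    "äää".encode("utf-8"): "ööö".encode("utf-8"),
}


def decode_dict(byte_dict, source_encoding="utf-8"):
    """Return a copy with every key and value decoded to a Unicode string."""
    return {
        key.decode(source_encoding): value.decode(source_encoding)
        for key, value in byte_dict.items()
    }


decoded = decode_dict(raw_dict)

# Printing items one by one uses str(), not repr(), so the text stays readable.
for key, value in decoded.items():
    print(key, "->", value)
```

The sketch assumes every key and value is a byte string in the same encoding; mixed or nested data would need extra handling.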

I really really urge you to read through Joel's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Unless you understand how character sets work, you will stumble across problems similar to this again and again.

拥抱我好吗 2024-12-25 09:59:17

Actually, that's not really a Python-related issue.

Your environment variables (I'm assuming that you're on either Linux or Mac) should have the UTF-8 character encoding active.

You should be able to put these in your ~/.profile (or ~/.bashrc) file:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8

-edit-

Actually, Mac uses UTF-8 by default. This is a Windows/Linux issue.

-edit 2-

You should, of course, always use unicode strings, a unicode editor and a unicode doctype. But I'm assuming that you know that :-)
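
To see whether those environment settings reached Python, you can ask the interpreter which encodings it picked up. This is a small sketch using only the standard library; `stdout_encoding` and `preferred_encoding` are illustrative names.

```python
import locale
import sys

# Encoding Python uses when writing text to standard output
# (can be None when stdout has been replaced by an object without one).
stdout_encoding = sys.stdout.encoding

# Encoding derived from the locale settings such as LANG and LC_ALL.
preferred_encoding = locale.getpreferredencoding(False)

print("stdout encoding:   ", stdout_encoding)
print("preferred encoding:", preferred_encoding)
```

If both report a UTF-8 variant, the shell configuration is probably not what is causing the garbled output.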

疑心病 2024-12-25 09:59:17

Python 3.0 has unicode strings by default, while in Python 2.x you have to prefix the string with u:

u"汉字/漢字 chinese"
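
A short Python 3 sketch of what "unicode by default" means in practice, using the string from the answer. The variable names are made up for illustration.

```python
plain = "汉字/漢字 chinese"
prefixed = u"汉字/漢字 chinese"  # the u prefix is accepted again since Python 3.3

same_value = plain == prefixed             # both literals are ordinary str objects
char_count = len(plain)                    # counts code points, not bytes
byte_count = len(plain.encode("utf-8"))    # each CJK character here takes 3 bytes

print(type(plain).__name__, same_value, char_count, byte_count)
```

The gap between `char_count` and `byte_count` is the text/bytes distinction that Python 2 blurred and Python 3 makes explicit.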

安人多梦 2024-12-25 09:59:17

In the Python 2 interactive shell,

    >>> "heißen"
    is equivalent to
    >>> print repr("heißen")

The Python 2 documentation on repr
http://docs.python.org/2/library/functions.html#func-repr
is sparse.

As can be seen, both give you a 'byte-based' representation of the byte string "heißen", in which every byte greater than 127 is shown as a \x escape. That is where this comes from:

    'hei\xc3\x9fen'

The repr() of a unicode object is not much more helpful. It correctly shows 'ß' as a single Unicode character, '\xdf', but it is still unreadable.

The practical solution I found is to use Python 3.

http://docs.python.org/3/library/functions.html#repr

That page also says

    ascii(object)
    As repr(), return a string containing a printable representation of an
    object, but escape the non-ASCII characters in the string returned by
    repr() using \x, \u or \U escapes. This generates a string similar to
    that returned by repr() in Python 2.

which explains things a little bit.
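
The difference can be seen side by side in Python 3. This is a minimal sketch; `word` and the three result names are illustrative.

```python
word = "heißen"

readable = repr(word)                    # Python 3 repr keeps printable non-ASCII text
escaped = ascii(word)                    # escapes non-ASCII, like Python 2's repr of unicode
as_bytes = repr(word.encode("utf-8"))    # byte-level view, like Python 2's repr of str

print(readable)
print(escaped)
print(as_bytes)
```

Here `escaped` contains the `\xdf` form and `as_bytes` contains the `\xc3\x9f` form discussed above, while `readable` shows the character itself.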
