Python: how do I get Cyrillic characters in the output?

Posted 2024-10-02 00:17:57

How do I get Cyrillic characters in the output instead of u'...'?

The code looks like this:

import codecs

def openfile(filename):
    with codecs.open(filename, encoding="utf-8") as F:
        raw = F.read()
# do stuff...
print some_text

It prints:

>>>[u'.', u',', u':', u'\u0432', u'<', u'>', u'(', u')', u'\u0437', u'\u0456']

四叶草在未来唯美盛开 2024-10-09 00:17:57

It looks like some_text is a list of unicode objects. When you print such a list, it prints the reprs of the elements inside the list. So instead try:

print(u''.join(some_text))

The join method concatenates the elements of some_text, using the empty string u'' as the separator, so nothing is inserted between them. The result is a single unicode object.
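Here is a minimal Python 3 sketch of the same idea. The thread itself uses Python 2, where the repr shows u'\u0432'-style escapes. In Python 3, repr() keeps printable Cyrillic as-is but still adds quotes and brackets. The list below is a hypothetical stand-in for the asker's some_text:

```python
# Hypothetical stand-in for the asker's some_text (Python 3 str objects).
some_text = ['.', ',', ':', '\u0432', '<', '>', '(', ')', '\u0437', '\u0456']

# Converting the list to text (which is what print does) shows each
# element's repr, with quotes, commas and brackets.
as_list = str(some_text)

# Joining with the empty string '' concatenates the elements with no separator,
# giving one plain string.
joined = ''.join(some_text)
```

Passing joined to print then shows the characters themselves, provided the console can display them.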

蓝天白云 2024-10-09 00:17:57

It's not clear to me where some_text comes from (you cut out that bit of your code), so I have no idea why it prints as a list of characters rather than a string.

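But you should be aware that Python 2's print encodes unicode strings with the terminal encoding it detects (sys.stdout.encoding). It uses ASCII when that encoding is ASCII or unknown, for example under a C/POSIX locale or when output is piped. If you want the text encoded with some other encoding, you can do that explicitly. The result only displays correctly if the terminal uses that same encoding: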

>>> text = u'\u0410\u0430\u0411\u0431'
>>> print text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3:
  ordinal not in range(128)
>>> print text.encode('utf8')
АаБб
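Here is a minimal Python 3 sketch of the same explicit encoding. Encoding to ASCII raises the same kind of UnicodeEncodeError. Encoding to UTF-8 gives bytes, which can be written to the binary buffer behind sys.stdout. The helper write_utf8 is hypothetical, written for this illustration, and an io.BytesIO stands in for the console:

```python
import io
import sys

text = '\u0410\u0430\u0411\u0431'  # 'АаБб'

# Encoding to ASCII fails, like the implicit encoding in the Python 2 print.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = exc
else:
    ascii_error = None

# Encoding to UTF-8 always succeeds; each of these letters takes two bytes.
utf8_bytes = text.encode('utf-8')


def write_utf8(value, stream=None):
    """Hypothetical helper: write value as UTF-8 bytes plus a newline.

    Defaults to sys.stdout.buffer, the binary stream behind sys.stdout.
    """
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(value.encode('utf-8') + b'\n')


# Demonstration with an in-memory binary stream instead of the console.
buffer = io.BytesIO()
write_utf8(text, buffer)
```

As in the answer above, these bytes only show up as АаБб if the terminal decodes them as UTF-8.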

不必了 2024-10-09 00:17:57

A u'\uNNNN' escape is the ASCII-safe way of writing a character in a string literal. For example, u'\u0437' is the same string as u'з':

>>> print u'\u0437'
з
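Here is a minimal Python 3 sketch showing that the escape and the literal character are the same one-character string. It also shows that the built-in ascii() produces the escaped spelling, which matches what Python 2's repr() used for unicode strings (minus the u prefix):

```python
import unicodedata

# In Python 3 source, '\u0437' is an escape sequence, so both literals
# produce the same one-character string.
escaped = '\u0437'
literal = 'з'
assert escaped == literal and len(literal) == 1

# The character's official Unicode name.
name = unicodedata.name(literal)  # 'CYRILLIC SMALL LETTER ZE'

# Python 3's repr() keeps printable non-ASCII characters as they are;
# ascii() escapes them instead.
plain_form = repr(literal)     # the text 'з', quotes included
escaped_form = ascii(literal)  # the text '\u0437', backslash and quotes included
```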

However, this only displays correctly if your console supports the character you are trying to print. On the console of a Western European Windows install, the same command fails:

>>> print u'\u0437'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0437' in position 0: character maps to <undefined>
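Here is a minimal Python 3 sketch of why that console fails. The traceback shows the output being encoded with cp437, and that code page has no Cyrillic letters. The helper encodable is hypothetical, written for this illustration. cp866, the DOS Cyrillic code page, is shown for contrast. Since Python 3.6 (PEP 528), Python writes to an interactive Windows console through the Unicode console API, so this particular failure mainly affects Python 2 and older Python 3 versions:

```python
def encodable(text, encoding):
    """Hypothetical helper: True if every character of text fits in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


ze = '\u0437'  # CYRILLIC SMALL LETTER ZE

print(encodable(ze, 'cp437'))  # False: cp437 has no Cyrillic letters
print(encodable(ze, 'cp866'))  # True: cp866 is a Cyrillic code page
print(encodable(ze, 'utf-8'))  # True: UTF-8 can encode any character

# errors='replace' substitutes '?' for characters the code page lacks,
# instead of raising UnicodeEncodeError as print did above.
replaced = ze.encode('cp437', errors='replace')  # b'?'
```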

Because getting the Windows console to output Unicode is tricky, Python 2's repr function always opts for the ASCII-safe literal version.

Your print statement outputs the repr of each element rather than the characters themselves, because they are inside a list instead of a single string. If you printed each member of the list individually, you'd get the characters output directly rather than as u'...' string literals.
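Here is a minimal Python 3 sketch of printing each member individually. The name chars is a hypothetical list. Output goes to an in-memory io.StringIO so the example does not depend on the console's encoding; leave out file= to print to the console:

```python
import io

chars = ['.', '\u0432', '\u0437']  # hypothetical list of one-character strings

out = io.StringIO()
for ch in chars:
    # print() writes str(ch), which is the character itself:
    # no quotes, no brackets, no escapes.
    print(ch, file=out)

result = out.getvalue()  # '.', 'в' and 'з', one per line
```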
