UTF-8 and upper()

Posted on 2024-08-22 08:22:17

I want to transform UTF-8 strings using built-in functions such as upper() and capitalize().

For example:

>>> mystring = "işğüı"
>>> print mystring.upper()
Işğüı  # should be İŞĞÜI instead.

How can I fix this?
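The symptom comes from the string type. In Python 2 a plain "..." literal holds UTF-8 encoded bytes, and upper() on a byte string works one byte at a time and, by default, only changes the ASCII letters a-z. Every other character here is a multi-byte sequence containing no ASCII letters, so it passes through unchanged. A minimal sketch of the same behaviour, assuming Python 3, where the bytes type plays the role of the Python 2 str type:

```python
# Python 3 sketch: bytes.upper() only changes the ASCII letters a-z.
raw = "işğüı".encode("utf-8")   # UTF-8 encoded bytes, like a Python 2 str literal
shouted = raw.upper()           # only the ASCII byte for "i" (0x69) becomes "I" (0x49)
print(shouted.decode("utf-8"))  # Işğüı: the multi-byte characters are untouched
```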

海风掠过北极光 2024-08-29 08:22:17

Do not perform actions on encoded strings; decode to unicode first.

>>> mystring = "işğüı"
>>> print mystring.decode('utf-8').upper()
IŞĞÜI
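Note that the decoded result is IŞĞÜI, not the İŞĞÜI the question asks for. upper() applies the default, locale-independent Unicode case mapping, in which dotted i becomes plain I. Below is a minimal Python 3 sketch of both the fix and one possible workaround. turkish_upper is a hypothetical helper, not a library function, and it only handles upper-casing; full Turkish casing would normally go through a locale-aware library such as ICU.

```python
# Python 3 sketch: decode to text first, then use str.upper().
raw = "işğüı".encode("utf-8")  # stands in for the UTF-8 bytes from the question
text = raw.decode("utf-8")
print(text.upper())  # IŞĞÜI: dotless ı maps to I, but dotted i also maps to I

def turkish_upper(s: str) -> str:
    # Hypothetical helper: map dotted i to İ (U+0130) before calling the
    # locale-independent str.upper(); U+0130 is already uppercase and
    # dotless ı (U+0131) uppercases to plain I.
    return s.translate({ord("i"): "\u0130"}).upper()

print(turkish_upper(text))  # İŞĞÜI
```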

我ぃ本無心為│何有愛 2024-08-29 08:22:17

It's actually best, as a general strategy, to always keep your text as Unicode once it's in memory: decode it at the moment it's input, and encode it exactly at the moment you need to output it, if there are specific encoding requirements at input and/or output times.
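A minimal Python 3 sketch of that decode-at-input, encode-at-output pattern. The function name shout and the in-memory io.BytesIO objects standing in for real input and output streams are assumptions for illustration:

```python
import io

def shout(raw_in: io.BytesIO, raw_out: io.BytesIO, encoding: str = "utf-8") -> None:
    # Hypothetical helper. Decode once, at the input boundary.
    text = raw_in.read().decode(encoding)
    # In memory, work only on text (str), never on encoded bytes.
    result = text.upper()
    # Encode once, at the output boundary.
    raw_out.write(result.encode(encoding))

src = io.BytesIO("işğüı".encode("utf-8"))  # simulated UTF-8 input
dst = io.BytesIO()                         # simulated output sink
shout(src, dst)
print(dst.getvalue().decode("utf-8"))      # IŞĞÜI
```

Keeping the encoding as a single parameter means the byte-level details stay at the two edges, and everything in between can use the ordinary str methods.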

Even if you don't choose to adopt this general strategy (and you should!), the only sound way to perform the task you require is still to decode, process, encode again -- never to work on the encoded forms. I.e.:

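# -*- coding: utf-8 -*-
mystring = "işğüı"  # in Python 2 this is a UTF-8 encoded byte string
print mystring.decode('utf-8').upper().encode('utf-8')  # decode, uppercase, re-encode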

assuming you're constrained to encoded strings at assignment and for output purposes. (The output constraint is unfortunately realistic, the assignment constraint isn't -- just do mystring = u"işğüı", making it unicode from the start, and save yourself at least the .decode call!-)
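For comparison, here is a minimal sketch of both options in Python 3 (an assumption; the answer itself uses Python 2 syntax). In Python 3 a plain literal is already Unicode text, so the u prefix is optional and the .decode call disappears entirely:

```python
# Option 1: bytes in, bytes out; decode, process, encode.
encoded = "işğüı".encode("utf-8")  # stands in for bytes received from elsewhere
out = encoded.decode("utf-8").upper().encode("utf-8")
print(out)  # the UTF-8 bytes of IŞĞÜI

# Option 2: start from text; a Python 3 literal is already str (Unicode).
mystring = "işğüı"
assert isinstance(mystring, str)
print(mystring.upper())  # IŞĞÜI
```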
