String encoding in Python

Posted 2024-07-18 04:21:39

In Python, strings may be Unicode (both UTF-16 and UTF-8) or single-byte with different encodings (cp1251, cp1252, etc.). Is it possible to check which encoding a string uses? For example,

time.strftime( "%b" )

will return a string with the text name of a month. Under macOS the returned string will be UTF-16, under Windows with an English locale it will be single-byte with ASCII encoding, and under Windows with a non-English locale it will be encoded via the locale's code page, for example cp1251. How can I handle such strings?

遇到 2024-07-25 04:21:39

Strings don't store any encoding information; you just have to specify an encoding when you convert to or from Unicode, or when you print to an output device:

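import locale
# Python 2 syntax, as in the original answer.
# getdefaultlocale() can return (None, None), so fall back to UTF-8.
lang, encoding = locale.getdefaultlocale()
mystring = u"blabla"
# Encode the unicode object to bytes only when writing it out.
print mystring.encode(encoding or "utf-8")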

UTF-8 is not Unicode itself; it is one encoding of Unicode text into byte strings.
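
To illustrate the distinction, here is a small sketch. It is written for Python 3, where `str` holds Unicode text and `bytes` holds encoded data. The sample text is an arbitrary choice for illustration. The same Unicode string produces different byte sequences under different encodings, and the bytes carry no marker saying which encoding produced them.

```python
# One Unicode string, several byte encodings (Python 3 sketch).
# The escapes spell a Russian abbreviated month name; chosen only as an example.
text = u"\u0438\u044e\u043b"

utf8_bytes = text.encode("utf-8")        # two bytes per Cyrillic letter
cp1251_bytes = text.encode("cp1251")     # one byte per Cyrillic letter
utf16_bytes = text.encode("utf-16-le")   # two bytes per letter, no byte order mark

print(len(text), len(utf8_bytes), len(cp1251_bytes), len(utf16_bytes))

# Decoding with the codec that produced the bytes recovers the text.
assert utf8_bytes.decode("utf-8") == text
assert cp1251_bytes.decode("cp1251") == text

# Decoding with a different codec does not fail here; it yields other characters.
wrong = cp1251_bytes.decode("latin-1")
print(wrong == text)
```

Because nothing in `cp1251_bytes` identifies it as cp1251, the encoding has to be known from context (the locale, a file header, a protocol field) rather than read off the string.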

The best practice is to work with Unicode everywhere on the Python side, store your strings with a reversible Unicode encoding such as UTF-8, and convert to locale-specific encodings only for user output.
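
As a rough sketch of that workflow applied to the `time.strftime` case from the question: decode at the input boundary, keep Unicode internally, and encode only for storage or output. The helper name `to_text` and the choice of `errors="replace"` are assumptions made for illustration and are not part of the original answer. The code targets Python 3, where `time.strftime` already returns `str`; in Python 2 `bytes` is an alias of `str`, so the decoding branch would be taken there.

```python
import locale
import time


def to_text(value, encoding=None):
    """Return Unicode text, decoding only when `value` is a byte string.

    Hypothetical helper: if no encoding is given, the locale's preferred
    encoding is assumed, which matches what locale-dependent C functions emit.
    """
    if isinstance(value, bytes):
        encoding = encoding or locale.getpreferredencoding(False)
        return value.decode(encoding, errors="replace")
    return value


# Decode once, at the boundary where the data enters the program.
month = to_text(time.strftime("%b"))

# Store with a reversible encoding such as UTF-8.
stored = month.encode("utf-8")
restored = stored.decode("utf-8")

print(restored == month)
```

Passing the encoding explicitly, for example `to_text(raw, "cp1251")`, is the safer choice whenever the source of the bytes is known, since the locale default is only a guess.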
