Python: convert a Unicode string and save it into a list

Posted on 2024-12-10 12:17:40

I need to insert a series of names (like 'Alam\xc3\xa9') into a list, and then I have to save them into a SQLite database.

I know that I can render these names correctly by typing:

print eval(repr(NAME)).decode("utf-8")

But I have to insert them into a list, so I can't use print.

Is there another way to do this without print?

Comments (2)

小嗲 2024-12-17 12:17:40

Lots and lots of misconceptions here.

The string you quote is not Unicode. It is a byte string, encoded in UTF-8.

You can convert it to Unicode by decoding it:

unicode_name = name.decode('utf-8')

When you print the value of unicode_name to the console, you will see one of two things:

>>> unicode_name
u'Alam\xe9'
>>> print unicode_name
Alamé

Here, you can see that just typing the variable name and pressing Enter shows a representation of the Unicode code points. This is the same as typing print repr(unicode_name). However, print unicode_name prints the actual characters, i.e. behind the scenes it encodes the string to the correct encoding for your terminal and prints the result.
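
For readers on Python 3 (an assumption; the thread itself uses Python 2 syntax), the same decode step might look like the sketch below. Byte strings need a b prefix there, and ascii() is used to show the escaped form, because repr() in Python 3 shows the character itself.

```python
# Assumes Python 3: the UTF-8 byte string from the question, written as a bytes literal.
name = b'Alam\xc3\xa9'

# Decoding turns the two bytes 0xC3 0xA9 into the single code point U+00E9.
unicode_name = name.decode('utf-8')

# ascii() escapes non-ASCII code points, much like the Python 2 repr shown above.
escaped = ascii(unicode_name)
print(escaped)

# Six bytes before decoding, five characters after.
print(len(name), len(unicode_name))
```

The value held in unicode_name is the same text either way; only the way it is displayed differs.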

But this is all irrelevant, because a Unicode string is only an in-memory representation. As soon as you want to store it in a database, or a file, or anywhere else, you need to encode it. And the most likely encoding to choose is UTF-8, which is what it was in originally.
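
A small sketch of that encoding step, again assuming Python 3. The Latin-1 line is only there to show that a different encoding produces different bytes, which is why the encoding has to be chosen explicitly.

```python
# Assumes Python 3, where str is the Unicode type.
text = 'Alam\u00e9'

# Encoding to UTF-8 gives back the byte string from the question.
utf8_bytes = text.encode('utf-8')

# The same text in Latin-1 is a different, shorter byte sequence.
latin1_bytes = text.encode('latin-1')

# Decoding with the matching encoding restores the original text.
round_trip = utf8_bytes.decode('utf-8')

print(utf8_bytes, latin1_bytes, round_trip == text)
```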

>>> name
'Alam\xc3\xa9'
>>> print name
Alamé

As you can see, with the original non-decoded version of the name, the repr and print once again show the escape codes and the characters, respectively. So converting it to Unicode does not make it any more "really" the correct character.

So, what to do if you want to store it in a database? Nothing. Nothing at all. SQLite accepts UTF-8 input, and stores its data in UTF-8 format on the disk. So there is absolutely no conversion needed to store the original value of name in the database.
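
One caveat when going through Python's sqlite3 module rather than SQLite directly: in Python 3 a bytes value is stored as a BLOB and a str as TEXT, so there it is usually more convenient to decode first. The following is only a sketch under that Python 3 assumption; the people table, the in-memory database and the second name are made up for illustration.

```python
import sqlite3

# Hypothetical input: UTF-8 encoded byte strings like the one in the question.
raw_names = [b'Alam\xc3\xa9', b'Jos\xc3\xa9']

# Decode once, keep the text in a list, and hand that list to SQLite.
names = [raw.decode('utf-8') for raw in raw_names]

conn = sqlite3.connect(':memory:')  # in-memory database, for illustration only
conn.execute('CREATE TABLE people (name TEXT)')
conn.executemany('INSERT INTO people (name) VALUES (?)', [(n,) for n in names])
conn.commit()

# typeof() and length() confirm the values were stored as text, counted in characters.
rows = conn.execute(
    'SELECT name, typeof(name), length(name) FROM people ORDER BY rowid'
).fetchall()
conn.close()
```

The ? placeholders let the sqlite3 module handle the conversion to SQLite's storage format, so no manual encoding is needed on the way in.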

寄居者 2024-12-17 12:17:40

Are you looking for something like this?

[n.decode("utf-8") for n in ['Alam\xc3\xa9', 'Alam\xc3\xa9', 'Alam\xc3\xa9']]
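
If the list may contain a mix of byte strings and already-decoded text, a slightly more defensive variant could look like this. It assumes Python 3, and decode_names is a hypothetical helper name, not something from the thread.

```python
def decode_names(raw_names, encoding='utf-8'):
    """Return a list of str, decoding only the items that are still bytes."""
    return [
        item.decode(encoding) if isinstance(item, bytes) else item
        for item in raw_names
    ]


# One encoded name and one that is already text.
decoded = decode_names([b'Alam\xc3\xa9', 'Alam\u00e9'])
print([ascii(name) for name in decoded])
```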