Storing serialized objects in a MySQL database
I have a big PHP object that I want to serialize and store in a MySQL database. The table encoding is UTF-8, and the column that holds the serialized object is also UTF-8.
The problem is that the object holds a text string containing French characters.
For example:
Merci d'avoir passé commande avec Lovre. Voici le récapitulatif de votre commande
When I serialize the object and then immediately unserialize it, the string is preserved and correctly formatted.
However, when I store the serialized object in a MySQL database, retrieve it again and then unserialize it, the string becomes this:
Merci d'avoir passÃ© commande avec Lovre. Voici le rÃ©capitulatif de votre commande
Something goes wrong when I store the object in the database.
Notes:
- The object is stored using the Propel ORM.
- The column type is TEXT.
- The string is stored in and read from an HTML file.
Comments (4)
The strings created by serialize are binary strings: they don't have a specific charset encoding but are just an "array" of bytes (where one byte is 8 bits, an octet). If you now take such a string and tell your database that it is LATIN-1 encoded, and your database stores it in a text field with UTF-8 encoding, the database will transparently convert the encoding from LATIN-1 to UTF-8. UTF-8 is a charset encoding that uses more than one byte per character for some characters, for example those you give in your question, like é. The character é is then stored as Ã© inside the database: Ã© is what the two-byte UTF-8 sequence for é looks like when it is read as LATIN-1. If you now fetch the data from the database without specifying which encoding you need, the database will return it as UTF-8.
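A minimal Python sketch of that re-encoding, assuming the source text is UTF-8 (as the HTML file most likely is) and the database reads the incoming bytes as LATIN-1. Python's built-in codecs stand in for MySQL's conversion here; MySQL's own `latin1` is actually Windows-1252, which makes no difference for é.

```python
# Simulate a UTF-8 string being double-encoded when the database assumes
# the incoming bytes are LATIN-1 and stores them in a UTF-8 column.

original = "passé"

# Bytes the client actually sends: UTF-8, where é takes two bytes (C3 A9).
sent = original.encode("utf-8")

# The server reads those bytes as LATIN-1, so C3 A9 becomes two characters: "Ã©".
as_latin1_text = sent.decode("latin-1")

# It then stores that text in the UTF-8 column, encoding each character anew.
stored = as_latin1_text.encode("utf-8")

# Fetching the column as UTF-8 gives back the double-encoded text.
fetched = stored.decode("utf-8")

print(sent.hex(" "))    # 70 61 73 73 c3 a9
print(as_latin1_text)   # passÃ©
print(stored.hex(" "))  # 70 61 73 73 c3 83 c2 a9
print(fetched)          # passÃ©
```

Note that the stored value grew from 6 to 8 bytes, which matters for the next step.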
Now unserialize has a problem, because the binary string has been modified in a way that makes it invalid: serialize records each string's length in bytes (for example s:6:"passé"; in UTF-8), and the re-encoding changes that byte count. Instead, you need to either tell your database that it should not modify the encoding when it stores the serialized string, e.g. by choosing the right column type and encoding (a binary field such as BLOB, Binary Large Object; see the MySQL docs and also Binary Types in the Propel docs), or, when you fetch the data from the database, revert the charset encoding back to the original format. The first approach (binary field) is better because it is exactly what you're looking for.
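To see why the byte count matters, here is a hypothetical Python model of how PHP's serialize() writes a single string, `s:<byte length>:"<bytes>";`. The helper names `php_serialize_str` and `php_unserialize_str` are made up for this sketch and handle only that one case; the real unserialize() parses the full format.

```python
# Simplified model of how PHP's serialize() writes ONE string value:
#   s:<length in bytes>:"<raw bytes>";
# Helper names are hypothetical; real unserialize() parses the full format.

def php_serialize_str(value: bytes) -> bytes:
    """Mimic serialize() for a string: the length prefix counts bytes, not characters."""
    return b's:%d:"%b";' % (len(value), value)


def php_unserialize_str(data: bytes):
    """Return the payload bytes, or None if the data is malformed or the
    length prefix does not match (PHP's unserialize() returns false then)."""
    if not (data.startswith(b"s:") and data.endswith(b'";')):
        return None
    header, sep, rest = data[2:].partition(b':"')
    if not sep or not header.isdigit():
        return None
    payload = rest[:-2]  # drop the closing ";
    if len(payload) != int(header):
        return None
    return payload


serialized = php_serialize_str("passé".encode("utf-8"))  # prefix says 6 bytes

# Round trip without the database: fine.
print(php_unserialize_str(serialized) is not None)  # True

# LATIN-1 -> UTF-8 re-encoding: the payload grows to 8 bytes, prefix still says 6.
mangled = serialized.decode("latin-1").encode("utf-8")
print(php_unserialize_str(mangled))  # None

# Reverting the conversion on fetch restores the original bytes.
restored = mangled.decode("utf-8").encode("latin-1")
print(restored == serialized)  # True
```

A binary column avoids the conversion entirely, so neither the re-encoding nor the revert step is needed.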
For the data that has already been stored in the database in the wrong format, you need to correct the data. To do that, you first need to find out which re-encoding was applied, i.e. from which charset to which charset. I assume it's LATIN-1, but there is no guarantee. You need to review the encoding of your current application data and processes to find out.
After you've found out, encode the values back from UTF-8 to the original encoding.
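A sketch of that repair in Python, assuming the re-encoding really was LATIN-1 to UTF-8 (which, as said above, is not guaranteed). `repair_double_encoded` is a hypothetical helper. Because MySQL's `latin1` is really Windows-1252, text containing characters such as curly quotes may need the `"cp1252"` codec instead; and a correct string that happens to also decode as UTF-8 would be altered, so check a sample of your data before running it over a whole table.

```python
def repair_double_encoded(text: str) -> str:
    """Undo ONE LATIN-1 -> UTF-8 re-encoding (hypothetical helper).

    Encoding to LATIN-1 turns each character back into the byte it came from;
    decoding those bytes as UTF-8 recovers the original text. If either step
    fails, the text was not double-encoded this way, so it is returned as-is.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_double_encoded("Voici le rÃ©capitulatif"))  # Voici le récapitulatif
print(repair_double_encoded("Voici le récapitulatif"))   # Voici le récapitulatif
```

Already-correct text fails the UTF-8 decode step (a lone é byte is not valid UTF-8) and is left unchanged.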
Make sure to use UTF-8 everywhere; it sounds like you missed something.
In your case, I think you've forgotten to set the correct charset for your database connection (using a SET NAMES statement or mysql_set_charset()), but that's hard to say without seeing your code (and I don't know Propel). chazomaticus has given a perfect answer in "UTF-8 all the way through", listing all the points you have to take care of.
Note that you don't need to use UTF-8; the important part is to use the same charset everywhere, whatever that charset might be. But if you need to change things anyway, use UTF-8.
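A toy Python model of the "same charset everywhere" point, assuming the server decodes incoming bytes with the connection charset and encodes outgoing text with the connection charset. The functions `store` and `fetch` are hypothetical, not a MySQL client API, and the model ignores the column's own charset.

```python
# Toy model of a text column behind a connection with a given charset.
# Function names are hypothetical; this is not a real MySQL client API.

def store(raw: bytes, connection_charset: str) -> str:
    """What the server keeps: the bytes interpreted in the connection charset."""
    return raw.decode(connection_charset)


def fetch(column_text: str, connection_charset: str) -> bytes:
    """What the client receives: the column text encoded in the connection charset."""
    return column_text.encode(connection_charset)


data = "Merci d'avoir passé commande".encode("utf-8")

same_utf8 = fetch(store(data, "utf-8"), "utf-8")      # UTF-8 both ways
same_latin1 = fetch(store(data, "latin-1"), "latin-1")  # LATIN-1 both ways
mixed = fetch(store(data, "latin-1"), "utf-8")        # stored as LATIN-1, read as UTF-8

print(same_utf8 == data)      # True
print(same_latin1 == data)    # True
print(mixed == data)          # False
print(mixed.decode("utf-8"))  # Merci d'avoir passÃ© commande
```

In this model only the mismatched case corrupts the text, which is why the charset matters less than using one charset consistently.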
I always store serialized data using base64_encode(). Serialized data sometimes causes problems, but after base64-encoding it, only simple characters remain.
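A minimal Python sketch of why this helps, using Python's `base64` module in place of PHP's base64_encode() and a hand-written PHP-style serialized string as example data. Base64 output is plain ASCII, and ASCII bytes are identical in LATIN-1 and UTF-8, so the re-encoding described in the first answer leaves it untouched.

```python
import base64

# Example serialized value as raw bytes (PHP-style string, UTF-8 encoded).
serialized = 's:6:"passé";'.encode("utf-8")

# Base64 maps arbitrary bytes onto A-Z, a-z, 0-9, "+", "/" and "=".
encoded = base64.b64encode(serialized)

# Simulate the LATIN-1 -> UTF-8 conversion the database might apply.
after_db = encoded.decode("latin-1").encode("utf-8")

print(after_db == encoded)                        # True: nothing changed
print(base64.b64decode(after_db) == serialized)   # True: original bytes recovered
```

The trade-off is size: base64 produces 4 output characters for every 3 input bytes.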
I strongly recommend that you use json_encode instead of serialize. Some day you will find yourself trying to use that data from somewhere that is not PHP, and having it stored as JSON makes it readable everywhere; virtually every language supports decoding JSON, and it is a well-established standard.
The answer about using UTF-8 everywhere holds! :-D
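A short Python sketch of the JSON approach, with a dictionary standing in (hypothetically) for the PHP object's data. With the default `ensure_ascii=True`, Python's `json.dumps` escapes é as `\u00e9`, so the stored text is pure ASCII; PHP's json_encode escapes non-ASCII characters the same way unless JSON_UNESCAPED_UNICODE is passed.

```python
import json

# Hypothetical stand-in for the PHP object's data.
data = {"message": "Merci d'avoir passé commande avec Lovre."}

# ensure_ascii=True (the default) escapes é as \u00e9, so the output is pure ASCII.
encoded = json.dumps(data)
print(encoded)  # {"message": "Merci d'avoir pass\u00e9 commande avec Lovre."}

# Any JSON decoder, in any language, recovers the original text.
print(json.loads(encoded) == data)  # True
```

One design consequence worth knowing: unlike serialize, JSON does not record PHP class names, so json_decode gives back stdClass objects or arrays rather than instances of your original classes.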