Storing a serialized object in a MySQL database

Posted 2025-01-01 06:19:01

I have a large PHP object that I want to serialize and store in a MySQL database. The table encoding is UTF-8, and the column that holds the serialized object is also UTF-8.

The problem is that the object holds a text string containing French characters.

For example:

Merci d'avoir passé commande avec Lovre. Voici le récapitulatif de votre commande

When I serialize the object and then unserialize it directly, the string is preserved in the correct format.

However, when I store the serialized object in a MySQL database, retrieve it, and then unserialize it, the string becomes:

Merci d'avoir passÃ© commande avec Lovre. Voici le rÃ©capitulatif de votre commande

Something goes wrong when I store the object in the database.

Notes:

The object is stored using the Propel ORM.
The column type is TEXT.
The string is stored in and read from an HTML file.

我爱人 2025-01-08 06:19:01

Strings created by serialize are binary strings: they have no particular character encoding and are simply an "array" of bytes (where one byte is 8 bits, an octet).

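If you take such a string, tell the database that it is LATIN-1 encoded, and the database stores it in a text field that uses UTF-8, the database transparently converts the data from LATIN-1 to UTF-8. UTF-8 is a character encoding that uses more than one byte for some characters, such as the é in your question.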

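The character é then ends up in the database as Ã©: the two bytes of the UTF-8 sequence for é (0xC3 0xA9) were each treated as a separate LATIN-1 character and converted to UTF-8 again.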

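If you now fetch the data from the database without specifying the encoding you want, the database returns it as UTF-8.

unserialize now has a problem, because the binary string has been modified in a way that makes it invalid. serialize records the length of every string in bytes, so a conversion that changes the number of bytes breaks the format.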
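The failure can be reproduced without a database. The sketch below assumes, as this answer does, that the bytes were treated as LATIN-1 and converted to UTF-8. mb_convert_encoding() only stands in for the conversion the server would perform, and the sample string is illustrative.

```php
<?php
// "passé" as UTF-8 bytes; hex escapes keep the example independent of
// the encoding this file happens to be saved in.
$original = "pass\xC3\xA9";

// serialize() records the length of each string in bytes: s:6:"passé";
$serialized = serialize($original);

// Assumption: the data is declared as LATIN-1 while the column is UTF-8.
// mb_convert_encoding() imitates the conversion the server then applies.
$stored = mb_convert_encoding($serialized, 'UTF-8', 'ISO-8859-1');

echo strlen($serialized), " bytes before, ", strlen($stored), " bytes after\n";

// The recorded length (6) no longer matches the payload (now 8 bytes),
// so unserialize() fails and returns false.
$result = @unserialize($stored);
var_dump($result);
```

The untouched string still unserializes correctly; only the converted copy is rejected.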

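Instead, you need to tell the database not to change the encoding when it stores the serialized string, for example by choosing a suitable column type and encoding (a binary field such as BLOB, a binary large object; see the MySQL documentation and the binary types in the Propel documentation). Alternatively, convert the character encoding back to the original when you fetch the data from the database. The first approach (a binary field) is preferable, because it is exactly what you are looking for.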
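A minimal sketch of the binary-column approach using plain PDO rather than Propel. The table name object_store, its columns, and both helper functions are hypothetical. An in-memory SQLite database is used only so the sketch runs without a server (this assumes the pdo_sqlite extension is available); with MySQL the payload column would be declared as BLOB in the same way.

```php
<?php
// Hypothetical helper: stores any serializable value in a binary column.
function storeSerialized(PDO $pdo, string $key, $value): void
{
    $stmt = $pdo->prepare(
        'REPLACE INTO object_store (id, payload) VALUES (:id, :payload)'
    );
    $stmt->bindValue(':id', $key, PDO::PARAM_STR);
    // PARAM_LOB marks the value as binary data rather than text.
    $stmt->bindValue(':payload', serialize($value), PDO::PARAM_LOB);
    $stmt->execute();
}

// Hypothetical helper: returns the stored value, or null if the key is unknown.
function loadSerialized(PDO $pdo, string $key)
{
    $stmt = $pdo->prepare('SELECT payload FROM object_store WHERE id = :id');
    $stmt->execute([':id' => $key]);
    $payload = $stmt->fetchColumn();

    return $payload === false ? null : unserialize($payload);
}

// Stand-in database so the example is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE object_store (id VARCHAR(64) PRIMARY KEY, payload BLOB)');

$order = ['message' => "Merci d'avoir pass\xC3\xA9 commande"];
storeSerialized($pdo, 'order-1', $order);
$loaded = loadSerialized($pdo, 'order-1');

var_dump($loaded === $order);
```

Because the column is binary, the server has no character set to convert to, so the bytes written are the bytes read back.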

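Data that has already been written to the database in the wrong format needs to be corrected. To do that, you first have to find out which re-encoding was applied, that is, from which charset to which charset. I assume it was LATIN-1, but there is no guarantee. You need to check the encodings of your current application data and processes to find out.

Once you have found it, convert the values from UTF-8 back to the original encoding.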
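The repair step could look like the sketch below. It assumes the guess above (LATIN-1) is right and accepts the result only if it unserializes again. repairSerialized() is a hypothetical helper. Note that MySQL's latin1 character set corresponds to Windows-1252, so 'Windows-1252' may be the better value for $assumedSource if the data contains characters such as curly quotes or the euro sign.

```php
<?php
/**
 * Tries to undo one round of "treated as $assumedSource, converted to UTF-8".
 * Returns the repaired serialized string, or null if the result still does
 * not unserialize. Hypothetical helper, not part of PHP or Propel.
 */
function repairSerialized(string $stored, string $assumedSource = 'ISO-8859-1'): ?string
{
    $candidate = mb_convert_encoding($stored, $assumedSource, 'UTF-8');

    // unserialize() returns false on failure, but also for a serialized false.
    // allowed_classes => false avoids instantiating objects during the check.
    $ok = $candidate === 'b:0;'
        || @unserialize($candidate, ['allowed_classes' => false]) !== false;

    return $ok ? $candidate : null;
}

// Build a damaged value the same way the suspected conversion would have.
$good    = serialize(['message' => "r\xC3\xA9capitulatif"]);
$damaged = mb_convert_encoding($good, 'UTF-8', 'ISO-8859-1');

$repaired = repairSerialized($damaged);
var_dump($repaired === $good);
```

Running this against a copy of the table first, and keeping the rows for which the helper returns null untouched, avoids making already damaged data worse.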

旧夏天 2025-01-08 06:19:01

Make sure to use UTF-8 everywhere; it sounds like you missed something.

In your case, I think you have forgotten to set the correct charset for your database connection (using a SET NAMES statement or mysql_set_charset()), but that's hard to say without seeing your code (and I don't know Propel).
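As a sketch of setting the connection charset with PDO instead: the charset can be given in the DSN. The function names, host, and database name below are hypothetical, and this is plain PDO, not Propel's own configuration. mysql_set_charset() belongs to the old mysql extension, which was removed in PHP 7.0; with mysqli the equivalent call is set_charset(). The default of utf8mb4 is a choice made here, since MySQL's utf8 only covers characters of up to three bytes; pass 'utf8' to match the answer literally.

```php
<?php
// Hypothetical helper: builds a PDO DSN that fixes the connection charset.
function buildMysqlDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Hypothetical helper: opens the connection. It is not called here, so the
// example runs without a MySQL server.
function connectWithCharset(string $host, string $dbName, string $user, string $password): PDO
{
    return new PDO(buildMysqlDsn($host, $dbName), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// mysqli equivalent, for comparison:
//   $mysqli->set_charset('utf8mb4');

echo buildMysqlDsn('localhost', 'shop'), "\n";
// prints: mysql:host=localhost;dbname=shop;charset=utf8mb4
```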

The following is a quote from chazomaticus, who gave a perfect answer in "UTF-8 all the way through", listing all the points you have to take care of:

Storage:
Specify utf8_unicode_ci (or equivalent) collation on all tables and text columns in your database. This makes MySQL physically store and retrieve values natively in UTF-8.

Retrieval:
In PHP, in whatever DB wrapper you use, you'll need to set the connection charset to utf8. This way, MySQL does no conversion from its native UTF-8 when it hands data off to PHP.
Note that if you don't use a DB wrapper, you'll probably have to issue a query to tell MySQL to give you results in UTF-8: SET NAMES 'utf8' (as soon as you connect).

Delivery:
You've got to tell PHP to deliver the proper headers to the client, so text will be interpreted as UTF-8. In PHP, you can use the default_charset php.ini option, or manually issue the Content-Type header yourself, which is just more work but has the same effect.

Submission:
You want all data sent to you by browsers to be in UTF-8. Unfortunately, the only way to reliably do this is add the accept-charset attribute to all your <form> tags: <form ... accept-charset="UTF-8">.
Note that the W3C HTML spec says that clients "should" default to sending forms back to the server in whatever charset the server served, but this is apparently only a recommendation, hence the need for being explicit on every single <form> tag.
Although, on that front, you'll still want to verify every submitted string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously.

Processing:
This is, unfortunately, the hard part. You need to make sure that every time you process a UTF-8 string, you do so safely. Easiest way to do this is by making extensive use of PHP's mbstring extension.
PHP's string operations are NOT by default UTF-8 safe. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.

To know what you're doing (read: not mess it up), you really need to know UTF-8 and how it works on the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.

Also, I feel like this should be said somewhere, even though it may seem obvious: every PHP or HTML file you'll be serving should be encoded in valid UTF-8.
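The submission and processing points from the quote can be illustrated with a short sketch. It assumes the mbstring extension is loaded. isValidUtf8() is a hypothetical wrapper, and the sample strings are written with hex escapes so the result does not depend on how this file is saved.

```php
<?php
// Hypothetical wrapper around mb_check_encoding() for submitted strings.
function isValidUtf8(string $input): bool
{
    return mb_check_encoding($input, 'UTF-8');
}

$utf8   = "r\xC3\xA9capitulatif";  // "récapitulatif" encoded as UTF-8
$latin1 = "r\xE9capitulatif";      // same word as LATIN-1: not valid UTF-8

var_dump(isValidUtf8($utf8));    // bool(true)
var_dump(isValidUtf8($latin1));  // bool(false)

// Byte-oriented functions count and cut bytes, not characters.
var_dump(strlen($utf8));              // int(14)
var_dump(mb_strlen($utf8, 'UTF-8'));  // int(13)

$cut   = substr($utf8, 0, 2);             // splits the two-byte é in half
$mbCut = mb_substr($utf8, 0, 2, 'UTF-8'); // keeps "ré" intact

var_dump(isValidUtf8($cut));    // bool(false)
var_dump(isValidUtf8($mbCut));  // bool(true)
```

Passing the encoding explicitly to each mbstring call avoids depending on the configured internal encoding.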

Note that you don't need to use UTF-8: the important part is to use the same charset everywhere, whatever charset that might be. But if you need to change things anyway, use UTF-8.
