这是将 MySQL 表从 latin1 转换为 utf-8 的安全方法吗?
我需要将我的一个数据库中的所有表从 latin1 更改为 utf-8 (使用 utf8_bin 排序规则)。
我已经转储了数据库,从中创建了一个测试数据库,并运行以下命令,每个表都没有任何错误或警告:
ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATION utf8_bin
我在真实数据库上重复此操作是否安全?经检查,数据似乎不错...
I need to change all the tables in one of my databases from latin1 to utf-8 (with utf8_bin collation).
I have dumped the database, created a test database from it, and run the following without any errors or warnings for each table:
ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATION utf8_bin
Is it safe for me to repeat this on the real database? The data seems fine by inspection...
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
有 3 种不同的情况需要考虑:
这些值确实使用 Latin1 进行编码
这是一致的情况:声明的字符集和内容编码匹配。这是我在最初的回答中涵盖的唯一案例。
使用您建议的命令:
请注意,
CONVERT TO CHARACTER SET
命令仅出现在 MySQL 4.1.2 中,因此任何使用 2005 年之前安装的数据库的人都必须使用导出/导入技巧。这就是为什么互联网上有如此多的遗留脚本和文档以旧方式进行的原因。这些值已经使用 utf8 进行编码,
在这种情况下,您不希望 mysql 转换任何数据,您只需要更改列的元数据。
为此,您必须首先将每列的类型更改为 BLOB,然后更改为 TEXT utf8,这样就不会进行值转换:
这是推荐的方法,并且在 更改表语法文档。
在不同的编码中使用的值
在某些 Linux 发行版上,默认编码多年来一直是 Latin1。在这种情况下,您必须结合使用两种技术:
CONVERT TO
转换值。There are 3 different cases to consider:
The values are indeed encoded using Latin1
This is the consistent case: declared charset and content encoding match. This was the only case I covered in my initial answer.
Use the command you suggested:
Note that the
CONVERT TO CHARACTER SET
command only appeared in MySQL 4.1.2, so anyone using a database installed before 2005 had to use an export/import trick. This is why there are so many legacy scripts and document on Internet doing it the old way.The values are already encoded using utf8
In this case, you don't want mysql to convert any data, you only need to change the column's metadata.
For this, you have to change the type to BLOB first, then to TEXT utf8 for each column, so that there are no value conversions:
This is the recommended way, and it is explicitely documented in Alter Table Syntax Documentation.
The values use in a different encoding
The default encoding was Latin1 for several years on a some Linux distributions. In this case, you have to use a combination of the two techniques:
CONVERT TO
.直接转换可能会破坏任何包含非 utf7 字符的字符串。
如果您没有这些(即所有文本都是英文),通常会没问题。
但是,如果您有其中任何一个,则需要在初始运行中将所有 char/varchar/text 字段转换为 blob,并在后续运行中将它们转换为 utf8。
有关详细过程,请参阅本文:
http://codex.wordpress.org/Converting_Database_Character_Sets
A straightforward conversion will potentially break any strings with non-utf7 characters.
If you don't have any of those (i.e. all of your text is english), you'll usually be fine.
If you've any of those, however, you need to convert all char/varchar/text fields to blob in an initial run, and to convert them to utf8 in a subsequent run.
See this article for detailed procedures:
http://codex.wordpress.org/Converting_Database_Character_Sets
我过去在生产数据库上做过几次这样的操作(从旧的标准编码 swedish 转换为 latin1),当 MySQL 遇到无法转换为目标编码的字符时,它会中止转换并保持不变状态。因此,我认为 ALTER TABLE 语句有效。
I've done this a few times on production databases in the past (converting from the old standard encoding swedish to latin1), and when MySQL encounters a character that cannot be translated to the target encoding, it aborts the conversion and remains in the unchanged state. Therefor, I'd deem the ALTER TABLE statement working.