如果更改 MySQL 中列的排序规则,现有数据会发生什么情况?
我正在使用 MySQL 数据库服务器运行生产应用程序。我忘记将列的排序规则从 latin
设置为 utf8_unicode
,这会导致在保存到包含多语言数据的列时出现奇怪的数据。
我的问题是,如果我现在将排序规则更改为 utf8_unicode,现有数据会发生什么情况?它会破坏或损坏现有数据还是会保留数据,但新数据将按其应有的方式保存为 utf8
?
我将使用 phpMyAdmin Web 客户端进行更改。
I am running a production application with MySQL database server. I forget to set column's collation from latin
to utf8_unicode
, which results in strange data when saving to the column with multi-language data.
My question is, what will happen with my existing data if I change my collation to utf8_unicode now? Will it destroy or corrupt the existing data or will the data remain, but the new data will be saved as utf8
as it should?
I will change with phpMyAdmin web client.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
文章 http://mysqldump.azundris.com/archives/60-Handling -character-sets.html 详细讨论了这一点,并展示了将会发生的情况。
请注意,您将字符集(实际上是编码)与排序规则混淆了。
字符集定义了字符串在磁盘上以字节为单位的物理表示。您可以使用 HEX() 函数使其可见,例如 SELECT HEX(str) FROM t WHERE id = 1 来查看 MySQL 如何存储字符串的字节。 MySQL 向您提供的内容可能会有所不同,具体取决于使用
SET NAMES ....
定义的连接字符集。排序规则是一种排序顺序。它取决于字符集。例如,您的数据可能采用 latin1 字符集,但可能根据两个德语排序顺序 latin1_german1_ci 或 latin1_german2_ci 之一进行排序。根据您的选择,元音变音(例如 ö)将排序为 oe 或 o。
当您更改字符集时,需要重写表中的数据。 MySQL将读取表中的所有数据和所有索引,对临时占用磁盘空间的表进行隐藏副本,然后将旧表移动到隐藏位置,将隐藏表移动到位,然后删除旧数据,释放磁盘空间。在这之间的一段时间内,您将需要两倍的存储空间。
当您更改排序规则时,数据的排序顺序会更改,但数据本身不会更改。如果您要更改的列不是索引的一部分,则除了重写 frm 文件之外不需要执行任何操作,并且足够新的 MySQL 版本不应执行更多操作。
当您更改属于索引的列的排序规则时,需要重写索引,因为索引是表的排序摘录。这将再次触发上面概述的 ALTER TABLE 表复制逻辑。
MySQL 尝试这样做来保留数据:只要您拥有的数据可以用目标字符集表示,转换就不会丢失。如果发生数据截断,将会打印警告,并且无法用目标字符集表示的数据将被替换为 ?
The article http://mysqldump.azundris.com/archives/60-Handling-character-sets.html discusses this at length and also shows what will happen.
Please note that you are mixing up a CHARACTER SET (actually an encoding) with a COLLATION.
A character set defines the physical representation of a string in bytes on disk. You can make this visible, using the HEX() function, for example
SELECT HEX(str) FROM t WHERE id = 1
to see how MySQL stores the bytes of your string. What MySQL delivers to you may be different, depending on the character set of your connection, defined withSET NAMES ....
.A collation is a sort order. It is dependent on the character set. For example, your data may be in the latin1 character set, but it may be ordered according to either of the two german sort orders latin1_german1_ci or latin1_german2_ci. Depending on your choice, Umlauts such as ö will either sort as oe or as o.
When you are changing a character set, the data in your table needs to be rewritten. MySQL will read all data and all indexes in the table, make a hidden copy of the table which temporarily takes up disk space, then moves the old table into a hidden location, moves the hidden table into place and then drops the old data, freeing up disk space. For some time inbetween, you will need two times the storage for that.
When you are changing a collation, the sort order of the data changes but not the data itself. If the column you are changing is not part of an index, nothing needs to be done besides rewriting the frm file, and sufficiently recent versions of MySQL should not do more.
When you are changing a collation of a column that is part of an index, the index needs to be rewritten, as an index is a sorted excerpt of a table. This will again trigger the ALTER TABLE table copy logic outlined above.
MySQL tries to preserve data doing this: As long as the data you have can be represented in the target character set, the conversion will not be lossy. Warnings will be printed if there is data truncation going on, and data which cannot be represented in the target character set will be replaced by ?
在 MySQL 5.1 中运行快速测试,并将 VARCHAR 列设置为latin1_bin,我插入了一些非拉丁字符,
我选择了它们并得到了垃圾(如预期)。
然后
我将列的排序规则更改为 utf8_unicode 并重新运行 SELECT 并显示相同的结果
这就是我所期望的 - 它将保留数据并且数据将仍然是垃圾,因为当插入数据时,该列丢失了额外的字符信息,只插入了一个?对于每个非拉丁字符,没有办法 ????再次成为英国华侨。
您的数据将保留在原处,但不会被修复。
Running a quick test in MySQL 5.1 with a VARCHAR column set to
latin1_bin
I inserted some non-latin charsI select them and get rubbish (as expected).
gives
I then changed the collation of the column to
utf8_unicode
and re-ran the SELECT and it shows the same resultThis is what I would expect - It will keep the data and the data will remain rubbish, because when the data was inserted the column lost the extra character information and just inserted a ? for each non-latin character and there is no way for the ???? to again become 英國華僑.
Your data will stay in place but it won't be fixed.
有效数据将被正确转换:
http://dev.mysql.com/doc/refman/5.5 /en/alter-table.html
...更具体地说:
http://dev.mysql.com/doc/refman/5.1 /en/charset-conversion.html
所以你的问题是无效数据,例如,以不同字符集编码的数据。我已经尝试了文档建议的提示,它基本上毁了我的数据,但原因是我的数据已经丢失:运行
SELECT column, HEX(column) FROM table
显示多字节字符已丢失被插入为0x3F
(即 Latin1 中的?
符号)。我的 MySQL 堆栈足够聪明,可以检测到输入数据不是 Latin1 并将其转换为“兼容”的数据。一旦数据丢失,您就无法恢复。总结一下:
Valid data will be properly converted:
http://dev.mysql.com/doc/refman/5.5/en/alter-table.html
... and more specifically:
http://dev.mysql.com/doc/refman/5.1/en/charset-conversion.html
So your problem is invalid data, e.g., data encoded in a different character set. I've tried the tip suggested by the documentation and it basically ruined my data, but the reason is that my data was already lost: running
SELECT column, HEX(column) FROM table
showed that multibyte chars had been inserted as0x3F
(i.e., the?
symbol in Latin1). My MySQL stack had been smart enough to detect that input data was not Latin1 and convert it into something "compatible". And once data is gone, you can't get it back.To sum up:
答案:如果您更改为 utf8_unicode_ci,您现有的数据不会发生任何变化(这些数据已经损坏,并且在您修改之前保持损坏状态)。
答:改成utf8_unicode_ci后,现有数据不会被破坏。它将保持像以前一样(类似???)。但是,如果插入包含 Unicode 字符的新数据,它将被正确存储。
答案:当然,您可以通过转到“Operations”>“phpMyAdmin”来更改排序规则。表选项
Answer: If you change to utf8_unicode_ci, nonthing will happen to your existing data (which is already corrupt and remain corrupt till you modify it).
Answer: After you change to utf8_unicode_ci, existing data will not be destroyed. It will remain the same like before (something like ????). However, if you insert new data containing Unicode characters, it will be stored correctly.
Answer: Sure, you can change collation with phpMyAdmin by going to Operations > Table options
警告! 一些问题可以通过一些解决
通过两步过程解决
如果你做错了,你会遇到更糟糕的混乱!
SELECT HEX(col), col ...
来查看您真正拥有的内容。CAUTION! Some problems are solved via
Some are solved via a 2-step process
If you do the wrong one, you will have a worse mess!
SELECT HEX(col), col ...
to see what you really have.