如果更改 MySQL 中列的排序规则,现有数据会发生什么情况?

发布于 2024-10-31 01:38:35 字数 266 浏览 1 评论 0原文

我正在使用 MySQL 数据库服务器运行生产应用程序。我忘记将列的排序规则从 latin 设置为 utf8_unicode,这会导致在保存到包含多语言数据的列时出现奇怪的数据。

我的问题是,如果我现在将排序规则更改为 utf8_unicode,现有数据会发生什么情况?它会破坏或损坏现有数据还是会保留数据,但新数据将按其应有的方式保存为 utf8

我将使用 phpMyAdmin Web 客户端进行更改。

I am running a production application with MySQL database server. I forget to set column's collation from latin to utf8_unicode, which results in strange data when saving to the column with multi-language data.

My question is, what will happen with my existing data if I change my collation to utf8_unicode now? Will it destroy or corrupt the existing data or will the data remain, but the new data will be saved as utf8 as it should?

I will change with phpMyAdmin web client.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

埋情葬爱 2024-11-07 01:38:35

文章 http://mysqldump.azundris.com/archives/60-Handling -character-sets.html 详细讨论了这一点,并展示了将会发生的情况。

请注意,您将字符集(实际上是编码)与排序规则混淆了。

字符集定义了字符串在磁盘上以字节为单位的物理表示。您可以使用 HEX() 函数使其可见,例如 SELECT HEX(str) FROM t WHERE id = 1 来查看 MySQL 如何存储字符串的字节。 MySQL 向您提供的内容可能会有所不同,具体取决于使用 SET NAMES .... 定义的连接字符集。

排序规则是一种排序顺序。它取决于字符集。例如,您的数据可能采用 latin1 字符集,但可能根据两个德语排序顺序 latin1_german1_ci 或 latin1_german2_ci 之一进行排序。根据您的选择,元音变音(例如 ö)将排序为 oe 或 o。

当您更改字符集时,需要重写表中的数据。 MySQL将读取表中的所有数据和所有索引,对临时占用磁盘空间的表进行隐藏副本,然后将旧表移动到隐藏位置,将隐藏表移动到位,然后删除旧数据,释放磁盘空间。在这之间的一段时间内,您将需要两倍的存储空间。

当您更改排序规则时,数据的排序顺序会更改,但数据本身不会更改。如果您要更改的列不是索引的一部分,则除了重写 frm 文件之外不需要执行任何操作,并且足够新的 MySQL 版本不应执行更多操作。

当您更改属于索引的列的排序规则时,需要重写索引,因为索引是表的排序摘录。这将再次触发上面概述的 ALTER TABLE 表复制逻辑。

MySQL 尝试这样做来保留数据:只要您拥有的数据可以用目标字符集表示,转换就不会丢失。如果发生数据截断,将会打印警告,并且无法用目标字符集表示的数据将被替换为 ?

The article http://mysqldump.azundris.com/archives/60-Handling-character-sets.html discusses this at length and also shows what will happen.

Please note that you are mixing up a CHARACTER SET (actually an encoding) with a COLLATION.

A character set defines the physical representation of a string in bytes on disk. You can make this visible, using the HEX() function, for example SELECT HEX(str) FROM t WHERE id = 1 to see how MySQL stores the bytes of your string. What MySQL delivers to you may be different, depending on the character set of your connection, defined with SET NAMES .....

A collation is a sort order. It is dependent on the character set. For example, your data may be in the latin1 character set, but it may be ordered according to either of the two german sort orders latin1_german1_ci or latin1_german2_ci. Depending on your choice, Umlauts such as ö will either sort as oe or as o.

When you are changing a character set, the data in your table needs to be rewritten. MySQL will read all data and all indexes in the table, make a hidden copy of the table which temporarily takes up disk space, then moves the old table into a hidden location, moves the hidden table into place and then drops the old data, freeing up disk space. For some time inbetween, you will need two times the storage for that.

When you are changing a collation, the sort order of the data changes but not the data itself. If the column you are changing is not part of an index, nothing needs to be done besides rewriting the frm file, and sufficiently recent versions of MySQL should not do more.

When you are changing a collation of a column that is part of an index, the index needs to be rewritten, as an index is a sorted excerpt of a table. This will again trigger the ALTER TABLE table copy logic outlined above.

MySQL tries to preserve data doing this: As long as the data you have can be represented in the target character set, the conversion will not be lossy. Warnings will be printed if there is data truncation going on, and data which cannot be represented in the target character set will be replaced by ?

我ぃ本無心為│何有愛 2024-11-07 01:38:35

在 MySQL 5.1 中运行快速测试,并将 VARCHAR 列设置为latin1_bin,我插入了一些非拉丁字符,

INSERT INTO Test VALUES ('英國華僑');

我选择了它们并得到了垃圾(如预期)。

SELECT text from Test;

然后

text
????

我将列的排序规则更改为 utf8_unicode 并重新运行 SELECT 并显示相同的结果

text
????

这就是我所期望的 - 它将保留数据并且数据将仍然是垃圾,因为当插入数据时,该列丢失了额外的字符信息,只插入了一个?对于每个非拉丁字符,没有办法 ????再次成为英国华侨。

您的数据将保留在原处,但不会被修复。

Running a quick test in MySQL 5.1 with a VARCHAR column set to latin1_bin I inserted some non-latin chars

INSERT INTO Test VALUES ('英國華僑');

I select them and get rubbish (as expected).

SELECT text from Test;

gives

text
????

I then changed the collation of the column to utf8_unicode and re-ran the SELECT and it shows the same result

text
????

This is what I would expect - It will keep the data and the data will remain rubbish, because when the data was inserted the column lost the extra character information and just inserted a ? for each non-latin character and there is no way for the ???? to again become 英國華僑.

Your data will stay in place but it won't be fixed.

迷爱 2024-11-07 01:38:35

有效数据将被正确转换:

当您使用以下命令更改数据类型时
更改或修改,MySQL 尝试
将现有列值转换为
尽可能新的类型。警告:
这种转换可能会导致
数据更改。

http://dev.mysql.com/doc/refman/5.5 /en/alter-table.html

...更具体地说:

转换二进制或非二进制
使用特定的字符串列
字符集,使用 ALTER TABLE。为了
成功转换发生,其中之一
必须具备以下条件
apply:[...] 如果该列有
非二进制数据类型(CHAR、VARCHAR、
TEXT),其内容应该被编码
在列字符集中,不是一些
其他字符集。如果内容
以不同的字符编码
设置后,您可以转换列以使用
首先是二进制数据类型,然后是
具有所需的非二元列
字符集。

http://dev.mysql.com/doc/refman/5.1 /en/charset-conversion.html

所以你的问题是无效数据,例如,以不同字符集编码的数据。我已经尝试了文档建议的提示,它基本上毁了我的数据,但原因是我的数据已经丢失:运行 SELECT column, HEX(column) FROM table 显示多字节字符已丢失被插入为 0x3F(即 Latin1 中的 ? 符号)。我的 MySQL 堆栈足够聪明,可以检测到输入数据不是 Latin1 并将其转换为“兼容”的数据。一旦数据丢失,您就无法恢复。

总结一下:

  1. 使用 HEX() 来查明您是否仍然拥有数据。
  2. 在表格的副本中进行测试。

Valid data will be properly converted:

When you change a data type using
CHANGE or MODIFY, MySQL tries to
convert existing column values to the
new type as well as possible. Warning:
This conversion may result in
alteration of data.

http://dev.mysql.com/doc/refman/5.5/en/alter-table.html

... and more specifically:

To convert a binary or nonbinary
string column to use a particular
character set, use ALTER TABLE. For
successful conversion to occur, one of
the following conditions must
apply:[...] If the column has a
nonbinary data type (CHAR, VARCHAR,
TEXT), its contents should be encoded
in the column character set, not some
other character set. If the contents
are encoded in a different character
set, you can convert the column to use
a binary data type first, and then to
a nonbinary column with the desired
character set.

http://dev.mysql.com/doc/refman/5.1/en/charset-conversion.html

So your problem is invalid data, e.g., data encoded in a different character set. I've tried the tip suggested by the documentation and it basically ruined my data, but the reason is that my data was already lost: running SELECT column, HEX(column) FROM table showed that multibyte chars had been inserted as 0x3F (i.e., the ? symbol in Latin1). My MySQL stack had been smart enough to detect that input data was not Latin1 and convert it into something "compatible". And once data is gone, you can't get it back.

To sum up:

  1. Use HEX() to find out if you still have your data.
  2. Make your tests in a copy of your table.
请你别敷衍 2024-11-07 01:38:35

我的问题是,如果我更改我的现有数据,会发生什么情况
现在排序规则为 utf8_unicode 吗?

答案:如果您更改为 utf8_unicode_ci,您现有的数据不会发生任何变化(这些数据已经损坏,并且在您修改之前保持损坏状态)。

它会破坏或损坏现有数据还是会保留数据,
但新数据会按原样保存为 utf8 吗?

答:改成utf8_unicode_ci后,现有数据不会被破坏。它将保持像以前一样(类似???)。但是,如果插入包含 Unicode 字符的新数据,它将被正确存储。

我将使用 phpMyAdmin Web 客户端进行更改。

答案:当然,您可以通过转到“Operations”>“phpMyAdmin”来更改排序规则。表选项

My question is, what will happen with my existing data if I change my
collation to utf8_unicode now?

Answer: If you change to utf8_unicode_ci, nonthing will happen to your existing data (which is already corrupt and remain corrupt till you modify it).

Will it destroy or corrupt the existing data or will the data remain,
but the new data will be saved as utf8 as it should?

Answer: After you change to utf8_unicode_ci, existing data will not be destroyed. It will remain the same like before (something like ????). However, if you insert new data containing Unicode characters, it will be stored correctly.

I will change with phpMyAdmin web client.

Answer: Sure, you can change collation with phpMyAdmin by going to Operations > Table options

白衬杉格子梦 2024-11-07 01:38:35

警告! 一些问题可以通过一些解决

ALTER TABLE ... CONVERT TO ...

通过两步过程解决

ALTER TABLE ... MODIFY ... VARBINARY...
ALTER TABLE ... MODIFY ... VARCHAR...

如果你做错了,你会遇到更糟糕的混乱

  1. 执行SELECT HEX(col), col ...来查看您真正拥有的内容。
  2. 研究一下你的情况: utf8 字符有问题;我看到的不是我存储的
  3. 根据这些情况执行正确的修复:http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases

CAUTION! Some problems are solved via

ALTER TABLE ... CONVERT TO ...

Some are solved via a 2-step process

ALTER TABLE ... MODIFY ... VARBINARY...
ALTER TABLE ... MODIFY ... VARCHAR...

If you do the wrong one, you will have a worse mess!

  1. Do SELECT HEX(col), col ... to see what you really have.
  2. Study this to see what case you have: Trouble with utf8 characters; what I see is not what I stored
  3. Perform the correct fix, based on these cases: http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文