MySQL 将 latin1 数据转换为 UTF8

发布于 2024-08-05 06:05:30 字数 732 浏览 6 评论 0原文

我使用 LOAD DATA INFILE 将一些数据导入到 MySQL 数据库中。表本身和列都使用UTF8字符集,但是数据库的默认字符集是latin 1。因为数据库的默认字符类型是latin1,并且我使用LOAD DATA INFILE没有指定字符集,所以它解释为文件为 latin1,即使文件中的数据为 UTF8。现在我的 UTF8 列中有一堆编码错误的数据。我发现 这篇文章 看起来解决类似的问题,即“UTF8插入cp1251”,但我的问题是“Latin1插入UTF8”。我尝试编辑查询以将 latin1 数据转换为 UTF8,但无法使其工作。要么得到的数据与以前相同,要么甚至比以前更加混乱。举例来说,Québec(魁北克)一词显示为 Québec(魁北克)。

[附加信息]

当选择 HEX() 中包含的数据时,魁北克的值为 5175C383C2A9626563。

这个表的Create Table(简称)是。

CREATE TABLE MyDBName.`MyTableName`
(
`ID` INT NOT NULL AUTO_INCREMENT, 
.......
`City` CHAR(32) NULL, 
.......
`)) ENGINE InnoDB CHARACTER SET utf8;

I imported some data using LOAD DATA INFILE into a MySQL Database. The table itself and the columns are using the UTF8 character set, but the default character set of the database is latin 1. Because the default character type of the database is latin1, and I used LOAD DATA INFILE without specifying a character set, it interpreted the file as latin1, even though the data in the file was UTF8. Now I have a bunch of badly encoded data in my UTF8 colum. I found this article which seems to address a similar problem, which is "UTF8 inserted in cp1251", but my problem is "Latin1 inserted in UTF8". I've tried editing the queries there to convert the latin1 data to UTF8, but can't get it to work. Either the data comes out the same, or even more mangled than before. Just as an example, the word Québec is showing as Québec.

[ADDITIONAL INFO]

When Selecting the data wrapped in HEX(), Québec has the value 5175C383C2A9626563.

The Create Table (shortened) of this table is.

CREATE TABLE MyDBName.`MyTableName`
(
`ID` INT NOT NULL AUTO_INCREMENT, 
.......
`City` CHAR(32) NULL, 
.......
`)) ENGINE InnoDB CHARACTER SET utf8;

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

古镇旧梦 2024-08-12 06:05:30

我在旧的 WordPress 安装中遇到过这样的情况,问题是数据本身已经是 Latin1 数据库中的 UTF-8 格式(由于 WP 默认字符集)。这意味着除了 ddbb 和表格式之外,并不真正需要转换数据。
根据我的经验,在进行转储时事情会变得混乱,因为我知道 MySQL 将使用客户端的默认字符集,在许多情况下现在是 UTF-8。
因此,确保使用相同的数据编码导出非常重要。对于采用 UTF-8 编码的 Latin1 DDBB:

$ mysqldump –default-character-set=latin1 –databases wordpress > m.sql

然后在以 UTF-8 格式重新导入到新数据库之前替换导出的转储中的 Latin1 引用。有点:

$ replace "CHARSET=latin1" "CHARSET=utf8" \
    "SET NAMES latin1" "SET NAMES utf8" < m.sql > m2.sql

就我而言,此链接很有帮助。
此处用西班牙语评论

I've had cases like this in old wordpress installations with the problem being that the data itself was already in UTF-8 within a Latin1 database (due to WP default charset). This means there was no real need for conversion of the data but the ddbb and table formats.
In my experience things get messed up when doing the dump as I understand MySQL will use the client's default character set which in many cases is now UTF-8.
Therefore making sure that exporting with the same coding of the data is very important. In case of Latin1 DDBB with UTF-8 coding:

$ mysqldump –default-character-set=latin1 –databases wordpress > m.sql

Then replace the Latin1 references within the exported dump before reimporting to a new database in UTF-8. Sort of:

$ replace "CHARSET=latin1" "CHARSET=utf8" \
    "SET NAMES latin1" "SET NAMES utf8" < m.sql > m2.sql

In my case this link was of great help.
Commented here in spanish.

深巷少女 2024-08-12 06:05:30

虽然对于OP来说它几乎仍然不实际,但我碰巧在MySQL文档中找到了一个解决方案 更改表。我将其发布在这里仅供将来参考:

警告

CONVERT TO 操作在字符集之间转换列值。如果您有一列采用一种字符集(如 latin1),但存储的值实际上使用其他一些不兼容的字符集(如 utf8),那么这不是您想要的。在这种情况下,您必须对每个此类列执行以下操作:

ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;

此方法有效的原因是,当您与 BLOB 列进行转换或从 BLOB 列进行转换时,不会发生任何转换。

Though it is hardly still actual for the OP, I happen to have found a solution in MySQL documentation for ALTER TABLE. I post it here just for future reference:

Warning

The CONVERT TO operation converts column values between the character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8). In this case, you have to do the following for each such column:

ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;

The reason this works is that there is no conversion when you convert to or from BLOB columns.

乖不如嘢 2024-08-12 06:05:30

LOAD DATA INFILE 允许您设置编码文件应该位于:

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

LOAD DATA INFILE allows you to set an encoding file is supposed to be in:

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

玩物 2024-08-12 06:05:30

我为拉丁语写了 http://code.google.com/p/mysqlutf8convertor/数据库到 UTF-8 数据库。所有表和字段改为UTF-8。

I wrote that http://code.google.com/p/mysqlutf8convertor/ for Latin Database to UTF-8 Database. All tables and field to change UTF-8.

失退 2024-08-12 06:05:30

将 latin1 转换为 UTF8 不是您想要做的,您需要相反的操作。

如果真正发生的情况是这样的:

  1. UTF-8 字符串被解释为 Latin-1 并转码为 UTF-8,从而破坏它们。
  2. 您现在正在或可能正在读取 UTF-8 字符串,无需进一步解释

您现在必须做的是:

  1. 读取“UTF-8”,无需转码。
  2. 将其转换为 Latin-1。现在您实际上应该拥有原始的 UTF-8。
  3. 现在将其放入“UTF-8”列中,无需进一步转换。

Converting latin1 to UTF8 is not what you want to do, you kind of need the opposite.

If what really happened was this:

  1. UTF-8 strings were interpreted as Latin-1 and transcoded to UTF-8, mangling them.
  2. You are now, or could be, reading UTF-8 strings with no further interpretation

What you must do now is:

  1. Read the "UTF-8" with no transcode.
  2. Convert it to Latin-1. Now you should actually have the original UTF-8.
  3. Now put it in your "UTF-8" column with no further conversion.
乖乖 2024-08-12 06:05:30

我最近完成了一个 shell 脚本,可以自动执行转换过程。还可以配置为您想要替换或删除的任何文本编写自定义过滤器。例如:剥离 HTML 字符等。表白名单和黑名单也是可能的。你可以在sourceforge下载:https://sourceforge.net/projects/mysqltr/

I recently completed a shell script that automates the conversion process. It is also configurable to write custom filters for any text you wish to replace or remove. For example : stripping HTML characters etc. Table whitelists and blacklists are also possible. You can download it at sourceforge: https://sourceforge.net/projects/mysqltr/

怎言笑 2024-08-12 06:05:30

试试这个:

1)转储你的数据库

mysqldump --default-character-set=latin1 -u username -p databasename < dump.sql

2)在文本编辑器中打开 dump.sql 并用“SET NAMES utf8”替换所有出现的“SET NAMES latin1”

3)创建一个新数据库并恢复你的转储文件

cat dump.sql | mysql -u root -p newdbname

Try this:

1) Dump your DB

mysqldump --default-character-set=latin1 -u username -p databasename < dump.sql

2) Open dump.sql in text editor and replace all occurences of "SET NAMES latin1" by "SET NAMES utf8"

3) Create a new database and restore your dumpfile

cat dump.sql | mysql -u root -p newdbname
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文