可怕的 MySQL 导入编码问题 - 重新审视
我遇到了标准 MySQL 导入编码问题,但我似乎无法解决它。
我的客户的 WordPress 安装已经运行了一段时间。我已将数据库转储到文件中,并将其导入本地。结果页面上到处都是 � 字符。
我检查了双方的数据库属性: 生产:显示创建数据库wordpress;
CREATE DATABASE `wordpress` /*!40100 DEFAULT CHARACTER SET latin1 */
本地:显示创建数据库wordpress;
CREATE DATABASE `wordpress` /*!40100 DEFAULT CHARACTER SET latin1 */
生产:显示创建表wp_posts;
CREATE TABLE `wp_posts` (
`ID` bigint(20) unsigned NOT NULL auto_increment,
...
KEY `post_date_gmt` (`post_date_gmt`)
) ENGINE=MyISAM AUTO_INCREMENT=7932 DEFAULT CHARSET=utf8
本地:显示创建表wp_posts;
CREATE TABLE `wp_posts` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
...
KEY `post_date_gmt` (`post_date_gmt`)
) ENGINE=MyISAM AUTO_INCREMENT=7918 DEFAULT CHARSET=utf8
我花了几个小时阅读有关如何压缩 � 的论坛,但我无法让任何东西发挥作用。 99%的答案都说要匹配数据库之间的字符集。我认为如果满足以下条件应该可行:
mysqldump --opt --compress --default-character-set=latin1 -uusername -ppassword wordpress | ssh [email protected] mysql --default-character-set=latin1 -uusername -ppassword wordpress
我也使用 utf8 字符集完成了它。仍然带有 � 。
我尝试直接修改 SQL 转储,将 utf8 或 latin1 放入“SET 名称 UTF8”行中。仍然带有 � 。
奇怪的症状
我希望这些 � 字符出现在内容中特殊字符的位置,例如 ñ 或 ö,但我已经在通常只有一个空间的地方看到了它。我还见过它代替撇号(但不是所有撇号)、双引号和商标符号。
� 标记非常罕见。它们平均每页出现三到四次。
通过 Sequel Pro(本地或实时)查看数据库时,我没有看到任何 �。通过 Textmate 查看时,我在 SQL 中没有看到任何 �。
我缺少什么?
编辑
更多信息:
我试图确定实时数据库认为的编码是什么。我运行了 show table status
,看起来排序规则是 utf8_general_ci、
utf8_bin和
latin1_swedish_ci` 的混合。它们有什么不同?有关系吗?
我还运行: show variables like "character_set_database"
并得到 latin1
;
I'm having the standard MySQL import encoding issue, but I can't seem to solve it.
My client has had a WordPress installation running for some time. I've dumped the database to a file, and imported it locally. The resulting pages have a splattering of � characters throughout.
I've inspected the database properties on both sides:
production: show create database wordpress;
CREATE DATABASE `wordpress` /*!40100 DEFAULT CHARACTER SET latin1 */
local: show create database wordpress;
CREATE DATABASE `wordpress` /*!40100 DEFAULT CHARACTER SET latin1 */
production: show create table wp_posts;
CREATE TABLE `wp_posts` (
`ID` bigint(20) unsigned NOT NULL auto_increment,
...
KEY `post_date_gmt` (`post_date_gmt`)
) ENGINE=MyISAM AUTO_INCREMENT=7932 DEFAULT CHARSET=utf8
local: show create table wp_posts;
CREATE TABLE `wp_posts` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
...
KEY `post_date_gmt` (`post_date_gmt`)
) ENGINE=MyISAM AUTO_INCREMENT=7918 DEFAULT CHARSET=utf8
I've spent hours reading forums on how to squash the �, but I can't get anything to work. 99% of the answers say to match the character set between the databases. What I think should work if the following:
mysqldump --opt --compress --default-character-set=latin1 -uusername -ppassword wordpress | ssh [email protected] mysql --default-character-set=latin1 -uusername -ppassword wordpress
I've done it using the utf8 char-set as well. Still with the �'s.
I've tried modifying the SQL dump directly, putting with utf8 or latin1 in the "SET names UTF8" line. Still with the �'s.
Strange Symptoms
I'd expect these � characters to appear in place of special characters in the content, like ñ or ö, but I've seen it where there would normally be just a space. I've also seen it in place of apostrophes (but not all apostrophes), double quotes, and trademark symbols.
The � marks are pretty rare. They appear on average three to four times per page.
I don't see any �'s when viewing the database through Sequel Pro (locally or live). I don't see any �'s in the SQL when viewing through Textmate.
What am I missing?
EDIT
More info:
I've tried to determine what the live database thinks the encoding is. I ran show table status
, and it seems that the Collations are a mix of utf8_general_ci,
utf8_binand
latin1_swedish_ci`. What are they different? Does it matter?
I also ran: show variables like "character_set_database"
and got latin1
;
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
这就是我最终解决问题的方法:
首先
mysqldump -uusername -ppassword --default-character-set=latin1 database -r dump.sql
然后运行此脚本:
该脚本查找所有十六进制字符以上 127 并将它们编码到 HTML 实体中。
然后
mysql -uusername -ppassword 数据库
结果.sql
This is how I ended up solving my problem:
First
mysqldump -uusername -ppassword --default-character-set=latin1 database -r dump.sql
Then run this script:
The script finds all the hex characters above 127 and encoded them into their HTML entities.
Then
mysql -uusername -ppassword database < result.sql
较旧的 WordPress 数据库甚至较新的数据库的一个常见问题是数据库表设置为 latin-1,但内容实际上编码为 UTF-8。如果您尝试导出为 UTF-8,MySQL 将尝试将(假定的)Latin-1 数据转换为 UTF-8,从而产生双编码字符,因为数据已经是 UTF-8。
解决方案是将表导出为 latin-1。由于 MySQL 认为它们已经是 latin-1,因此它将直接导出。
将字符集从“latin1”更改为“utf8”。
由于转储的数据在导出过程中未进行转换,因此它实际上是 UTF-8 编码的数据。
将新表创建为 UTF-8 如果 CREATE TABLE 命令位于 SQL 转储文件中,请将字符集从“latin1”更改为“utf8”。
正常导入数据。由于转储文件中有 UTF-8 编码的数据,转储文件中声明的字符集现在是 UTF-8,并且您要导入的表也是 UTF-8,一切都会顺利进行
A common problem with older WordPress databases and even newer ones is that the database tables get set as latin-1 but the contents are actually encoded as UTF-8. If you try to export as UTF-8 MySQL will attempt to convert the (supposedly) Latin-1 data to UTF-8 resulting in double encoded characters since the data was already UTF-8.
The solution is to export the tables as latin-1. Since MySQL thinks they are already latin-1 it will do a straight export.
Change the character set from ‘latin1′ to ‘utf8′.
Since the dumped data was not converted during the export process, it’s actually UTF-8 encoded data.
Create your new table as UTF-8 If your CREATE TABLE command is in your SQL dump file, change the character set from ‘latin1′ to ‘utf8′.
Import your data normally. Since you’ve got UTF-8 encoded data in your dump file, the declared character set in the dump file is now UTF-8, and the table you’re importing into is UTF-8, everything will go smoothly
我能够通过修改 wp-config.php 解决此问题,如下所示:
I was able to resolve this issue by modifying my wp-config.php as follows:
我认为你可以通过以下方式解决这个问题:
I think you can fix this issue by this way: