可怕的 MySQL 导入编码问题 - 重新审视

发布于 2024-11-06 21:11:28 字数 1965 浏览 0 评论 0原文

我遇到了标准 MySQL 导入编码问题,但我似乎无法解决它。

我的客户的 WordPress 安装已经运行了一段时间。我已将数据库转储到文件中,并将其导入本地。结果页面上到处都是 � 字符。

我检查了双方的数据库属性: 生产:显示创建数据库wordpress;

CREATE DATABASE `wordpress` /*!40100 DEFAULT CHARACTER SET latin1 */

本地:显示创建数据库wordpress;

CREATE DATABASE `wordpress` /*!40100 DEFAULT CHARACTER SET latin1 */

生产:显示创建表wp_posts;

CREATE TABLE `wp_posts` (
  `ID` bigint(20) unsigned NOT NULL auto_increment,
  ...
  KEY `post_date_gmt` (`post_date_gmt`)
) ENGINE=MyISAM AUTO_INCREMENT=7932 DEFAULT CHARSET=utf8

本地:显示创建表wp_posts;

CREATE TABLE `wp_posts` (
  `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  ...
  KEY `post_date_gmt` (`post_date_gmt`)
) ENGINE=MyISAM AUTO_INCREMENT=7918 DEFAULT CHARSET=utf8

我花了几个小时阅读有关如何压缩 � 的论坛,但我无法让任何东西发挥作用。 99%的答案都说要匹配数据库之间的字符集。我认为如果满足以下条件应该可行:

mysqldump --opt --compress --default-character-set=latin1 -uusername -ppassword wordpress | ssh [email protected] mysql --default-character-set=latin1 -uusername -ppassword wordpress

我也使用 utf8 字符集完成了它。仍然带有 � 。

我尝试直接修改 SQL 转储,将 utf8latin1 放入“SET 名称 UTF8”行中。仍然带有 � 。

奇怪的症状

我希望这些 � 字符出现在内容中特殊字符的位置,例如 ñö,但我已经在通常只有一个空间的地方看到了它。我还见过它代替撇号(但不是所有撇号)、双引号和商标符号。

� 标记非常罕见。它们平均每页出现三到四次。

通过 Sequel Pro(本地或实时)查看数据库时,我没有看到任何 �。通过 Textmate 查看时,我在 SQL 中没有看到任何 �。

我缺少什么?

编辑

更多信息:

我试图确定实时数据库认为的编码是什么。我运行了 show table status,看起来排序规则是 utf8_general_ci、utf8_binlatin1_swedish_ci` 的混合。它们有什么不同?有关系吗?

我还运行: show variables like "character_set_database" 并得到 latin1;

I'm having the standard MySQL import encoding issue, but I can't seem to solve it.

My client has had a WordPress installation running for some time. I've dumped the database to a file, and imported it locally. The resulting pages have a splattering of � characters throughout.

I've inspected the database properties on both sides:
production: show create database wordpress;

CREATE DATABASE `wordpress` /*!40100 DEFAULT CHARACTER SET latin1 */

local: show create database wordpress;

CREATE DATABASE `wordpress` /*!40100 DEFAULT CHARACTER SET latin1 */

production: show create table wp_posts;

CREATE TABLE `wp_posts` (
  `ID` bigint(20) unsigned NOT NULL auto_increment,
  ...
  KEY `post_date_gmt` (`post_date_gmt`)
) ENGINE=MyISAM AUTO_INCREMENT=7932 DEFAULT CHARSET=utf8

local: show create table wp_posts;

CREATE TABLE `wp_posts` (
  `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  ...
  KEY `post_date_gmt` (`post_date_gmt`)
) ENGINE=MyISAM AUTO_INCREMENT=7918 DEFAULT CHARSET=utf8

I've spent hours reading forums on how to squash the �, but I can't get anything to work. 99% of the answers say to match the character set between the databases. What I think should work if the following:

mysqldump --opt --compress --default-character-set=latin1 -uusername -ppassword wordpress | ssh [email protected] mysql --default-character-set=latin1 -uusername -ppassword wordpress

I've done it using the utf8 char-set as well. Still with the �'s.

I've tried modifying the SQL dump directly, putting with utf8 or latin1 in the "SET names UTF8" line. Still with the �'s.

Strange Symptoms

I'd expect these � characters to appear in place of special characters in the content, like ñ or ö, but I've seen it where there would normally be just a space. I've also seen it in place of apostrophes (but not all apostrophes), double quotes, and trademark symbols.

The � marks are pretty rare. They appear on average three to four times per page.

I don't see any �'s when viewing the database through Sequel Pro (locally or live). I don't see any �'s in the SQL when viewing through Textmate.

What am I missing?

EDIT

More info:

I've tried to determine what the live database thinks the encoding is. I ran show table status, and it seems that the Collations are a mix of utf8_general_ci,utf8_binandlatin1_swedish_ci`. What are they different? Does it matter?

I also ran: show variables like "character_set_database" and got latin1;

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

微凉徒眸意 2024-11-13 21:11:28

这就是我最终解决问题的方法:

首先 mysqldump -uusername -ppassword --default-character-set=latin1 database -r dump.sql

然后运行此脚本:

$search = array('/latin1/');
$replace = array('utf8');
foreach (range(128, 255) as $dec) {
    $search[] = "/\x".dechex($dec)."/";
    $replace[] = "&#$dec;";
}

$input = fopen('dump.sql', 'r');
$output = fopen('result.sql', 'w');

while (!feof($input)) {
    $line = fgets($input);
    $line = preg_replace($search, $replace, $line);
    fwrite($output, $line);
}

fclose($input);
fclose($output);

该脚本查​​找所有十六进制字符以上 127 并将它们编码到 HTML 实体中。

然后mysql -uusername -ppassword 数据库 结果.sql

This is how I ended up solving my problem:

First mysqldump -uusername -ppassword --default-character-set=latin1 database -r dump.sql

Then run this script:

$search = array('/latin1/');
$replace = array('utf8');
foreach (range(128, 255) as $dec) {
    $search[] = "/\x".dechex($dec)."/";
    $replace[] = "&#$dec;";
}

$input = fopen('dump.sql', 'r');
$output = fopen('result.sql', 'w');

while (!feof($input)) {
    $line = fgets($input);
    $line = preg_replace($search, $replace, $line);
    fwrite($output, $line);
}

fclose($input);
fclose($output);

The script finds all the hex characters above 127 and encoded them into their HTML entities.

Then mysql -uusername -ppassword database < result.sql

凤舞天涯 2024-11-13 21:11:28

较旧的 WordPress 数据库甚至较新的数据库的一个常见问题是数据库表设置为 latin-1,但内容实际上编码为 UTF-8。如果您尝试导出为 UTF-8,MySQL 将尝试将(假定的)Latin-1 数据转换为 UTF-8,从而产生双编码字符,因为数据已经是 UTF-8。

解决方案是将表导出为 latin-1。由于 MySQL 认为它们已经是 latin-1,因此它将直接导出。

将字符集从“latin1”更改为“utf8”。
由于转储的数据在导出过程中未进行转换,因此它实际上是 UTF-8 编码的数据。

将新表创建为 UTF-8 如果 CREATE TABLE 命令位于 SQL 转储文件中,请将字符集从“latin1”更改为“utf8”。

正常导入数据。由于转储文件中有 UTF-8 编码的数据,转储文件中声明的字符集现在是 UTF-8,并且您要导入的表也是 UTF-8,一切都会顺利进行

A common problem with older WordPress databases and even newer ones is that the database tables get set as latin-1 but the contents are actually encoded as UTF-8. If you try to export as UTF-8 MySQL will attempt to convert the (supposedly) Latin-1 data to UTF-8 resulting in double encoded characters since the data was already UTF-8.

The solution is to export the tables as latin-1. Since MySQL thinks they are already latin-1 it will do a straight export.

Change the character set from ‘latin1′ to ‘utf8′.
Since the dumped data was not converted during the export process, it’s actually UTF-8 encoded data.

Create your new table as UTF-8 If your CREATE TABLE command is in your SQL dump file, change the character set from ‘latin1′ to ‘utf8′.

Import your data normally. Since you’ve got UTF-8 encoded data in your dump file, the declared character set in the dump file is now UTF-8, and the table you’re importing into is UTF-8, everything will go smoothly

乖乖兔^ω^ 2024-11-13 21:11:28

我能够通过修改 wp-config.php 解决此问题,如下所示:

/** Database Charset to use in creating database tables. */
define('DB_CHARSET', 'utf8');

/** The Database Collate type. Don't change this if in doubt. */
define( 'DB_COLLATE', 'utf8_general_ci' );

I was able to resolve this issue by modifying my wp-config.php as follows:

/** Database Charset to use in creating database tables. */
define('DB_CHARSET', 'utf8');

/** The Database Collate type. Don't change this if in doubt. */
define( 'DB_COLLATE', 'utf8_general_ci' );
酷遇一生 2024-11-13 21:11:28

我认为你可以通过以下方式解决这个问题:

$link = mysql_connect('localhost', 'mysql_user', 'mysql_password');
$db = mysql_select_db('mysql_db', $link);
mysql_query('set names utf8', $link);

I think you can fix this issue by this way:

$link = mysql_connect('localhost', 'mysql_user', 'mysql_password');
$db = mysql_select_db('mysql_db', $link);
mysql_query('set names utf8', $link);
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文