MySQL data does not appear to be in the character encoding it should be
First off, thanks to whoever is reading this.
I have a very strange problem with character encoding in a MySQL database that I access through PHP's PDO. The tables are all encoded as UTF8 and the webapp uses utf-8, but it seems that the data stored in the database is not actually utf-8 but latin-1.
Things had been working fine for quite some time, but this is now causing issues when importing utf-8 encoded data files or running fulltext searches that contain special characters such as "é" or "ë".
EDIT:
Some replies have suggested that this is an issue with my terminal. It is not:
foreach ($dbh->query("SELECT c FROM t") as $row) {
    echo $row['c'] . "\n";
    echo urlencode($row['c']) . "\n";
}
$dbh->exec("SET NAMES 'latin1'");
foreach ($dbh->query("SELECT c FROM t") as $row) {
    echo $row['c'] . "\n";
    echo urlencode($row['c']) . "\n";
}
$dbh->exec("SET NAMES 'utf8'");
foreach ($dbh->query("SELECT c FROM t") as $row) {
    echo $row['c'] . "\n";
    echo urlencode($row['c']) . "\n";
}
Outputs the following:
é
%C3%A9f
é
%C3%A9f
é
%C3%83%C2%A9f
Thanks to everyone so far.
END EDIT
So, first I check that the tables are working as they should:
USE information_schema;
mysql> SELECT table_collation FROM tables WHERE table_schema="mydb" and table_name="mytable";
+-----------------+
| table_collation |
+-----------------+
| utf8_general_ci |
+-----------------+
1 row in set (0.00 sec)
mysql> SELECT character_set_name,collation_name FROM information_schema.columns WHERE table_schema="mydb" and table_name="t" and column_name="c";
+--------------------+-----------------+
| character_set_name | collation_name |
+--------------------+-----------------+
| utf8 | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)
However, the data does not appear to be utf-8, but latin-1:
mysql> use mydb;
Database changed
mysql> SET NAMES 'latin1';
Query OK, 0 rows affected (0.00 sec)
mysql> select c from t;
+---+
| c |
+---+
| é |
+---+
1 row in set (0.00 sec)
mysql> SET NAMES 'utf8';
Query OK, 0 rows affected (0.00 sec)
mysql> select c from t;
+----+
| c  |
+----+
| Ã© |
+----+
1 row in set (0.00 sec)
So I have two questions:
1) Most importantly, what can I do about the data already in the DB?
2) Is there a way to set up the DB so that it actually uses utf-8 when I connect or do I have to execute the SET NAMES query every time?
Many thanks for your time and help,
Matt
Comments (4)
It's your terminal that appears to be in latin1, not the data :)
You have set it up already.
The only thing you need is to set the client encoding, which is done with SET NAMES 'utf8'.
In fact, by using SET NAMES, you can make your data appear in whatever encoding you set. That's the only purpose of the SET NAMES magic word.
In case you have some encoding issues not covered in this question:
Whatever you wish, as long as your DB returns no "?" marks.
To recover your data, you have to set names to the encoding that is set for the table. This will prevent MySQL from recoding the data. So you can fetch or dump it and then load it again with the proper settings.
EDIT
After some consideration, I am going to say that your data is in utf8 while the table encoding is somehow set to latin1.
%C3%A9 is a perfectly valid UTF-8 representation of the é character (I don't know where you got the trailing f, though), while %C3%83%C2%A9 is the UTF-8 encoded version of %C3%A9. So it seems your database thinks that your data is in latin1 and encodes it in utf8.
So when you set names to latin1, it doesn't mind and doesn't recode.
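The byte arithmetic behind that claim can be checked without a database. The following is a minimal sketch, assuming the mbstring extension is available and using ISO-8859-1 as a stand-in for MySQL's latin1 (the two agree on the bytes involved here); the variable names are illustrative only.

```php
<?php
// "é" in UTF-8 is the two bytes C3 A9.
$utf8 = "\xC3\xA9";

// Misread those two bytes as two latin1 characters ("Ã" and "©")
// and encode each of them as UTF-8 again: this is the double encoding.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo urlencode($utf8), "\n";          // %C3%A9
echo urlencode($doubleEncoded), "\n"; // %C3%83%C2%A9
```

The second line of output matches what the question sees after SET NAMES 'utf8', which is consistent with the stored value being a re-encoded copy of bytes that were already UTF-8.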
Conclusion:
Double-check your table (and field) encoding. It should be latin1.
Yes, to save your data you have to do something like
***God I HATE this autoformatting issue which prevents me from posting the code right after list item
then check this dump and change every appearance of latin1 to utf8.
Then load it back.
Do not forget to back up your data first!
If you received data from an HTML page and saved it to your DB, don't forget to set the correct text encoding in the HTML's head section. If you don't set the text encoding, browsers may return text in a different encoding.
Regarding 1): Have a look at PHP's function described here. Retrieve the records from your DB, transcode them to the desired encoding and write them back.
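The retrieve, transcode and write-back idea could look roughly like the sketch below. It is not necessarily the function this answer linked to: it uses mb_convert_encoding from the mbstring extension, and both repair_double_encoded and repair_column are hypothetical helper names. It assumes the table t and column c from the question, that exactly one round of latin1-to-UTF-8 double encoding happened, and that ISO-8859-1 is a close enough stand-in for MySQL's latin1. Back up the data before trying anything like this.

```php
<?php
/**
 * Hypothetical helper: undo one round of latin1 -> UTF-8 double encoding.
 * Returns the input unchanged when it does not look double-encoded.
 */
function repair_double_encoded(string $value): string
{
    // Turn each UTF-8 character back into the single latin1 byte it came from.
    $candidate = mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');

    // Accept the result only if something changed, the bytes form valid
    // UTF-8, and re-encoding them reproduces the input exactly (this
    // rejects lossy conversions such as "€" becoming "?").
    $roundTrip = mb_convert_encoding($candidate, 'UTF-8', 'ISO-8859-1');
    if ($candidate !== $value
        && mb_check_encoding($candidate, 'UTF-8')
        && $roundTrip === $value) {
        return $candidate;
    }
    return $value;
}

/**
 * Hypothetical helper: apply the repair to column c of table t.
 * Returns the number of distinct values that were rewritten.
 */
function repair_column(PDO $dbh): int
{
    // Read and write with a utf8 connection so the stored bytes arrive as-is.
    $dbh->exec("SET NAMES 'utf8'");
    $update = $dbh->prepare('UPDATE t SET c = :fixed WHERE BINARY c = :broken');
    $values = $dbh->query('SELECT DISTINCT c FROM t')->fetchAll(PDO::FETCH_COLUMN);

    $changed = 0;
    foreach ($values as $broken) {
        if ($broken === null) {
            continue; // nothing to repair in NULL values
        }
        $fixed = repair_double_encoded($broken);
        if ($fixed !== $broken) {
            $update->execute([':fixed' => $fixed, ':broken' => $broken]);
            $changed++;
        }
    }
    return $changed;
}
```

The round-trip check is a deliberate design choice: text that is already correct UTF-8, or that contains characters outside latin1, is left alone rather than damaged. A string that legitimately contains the characters "Ã©" cannot be told apart from a double-encoded "é", so the result is worth reviewing on a copy of the table first.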
use
when connecting with your client. in PHP you will achieve this by using function:
before you do any actual data insert / update
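Since the question uses PDO, one way to get a UTF-8 connection without issuing SET NAMES by hand is the charset parameter of the MySQL DSN, which PDO honours from PHP 5.3.6 onward. The sketch below is an assumption about how this could be wired up, not the call this answer originally showed; utf8_pdo_settings is a hypothetical helper, and the host, database name and credentials are placeholders.

```php
<?php
/**
 * Hypothetical helper: build PDO settings for a utf8 MySQL connection.
 */
function utf8_pdo_settings(string $host, string $dbname): array
{
    return [
        // charset=utf8 makes the driver negotiate the connection character
        // set, which has the same effect as running SET NAMES 'utf8'.
        'dsn' => "mysql:host={$host};dbname={$dbname};charset=utf8",
        // Throw exceptions on errors instead of failing silently.
        'options' => [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION],
    ];
}

// Usage (placeholder credentials, needs a reachable MySQL server):
// $cfg = utf8_pdo_settings('localhost', 'mydb');
// $dbh = new PDO($cfg['dsn'], 'user', 'password', $cfg['options']);
```

On PHP versions older than 5.3.6 the charset parameter is ignored, so there the PDO::MYSQL_ATTR_INIT_COMMAND driver option set to "SET NAMES 'utf8'" is the usual fallback.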
Connect to your Database
Set connection to UTF-8
SET NAMES 'utf8';
In your HTML files: