为什么MySQL要处理é和 e 一样吗?
我使用 Django Web 应用程序将 Unicode 字符串存储在 MySQL 数据库中。我可以很好地存储 Unicode 数据,但是在查询时,我发现 é
和 e
被视为同一字符:
In [1]: User.objects.filter(last_name='Jildén')
Out[1]: [<User: Anders Jildén>]
In [2]: User.objects.filter(last_name='Jilden')
Out[2]: [<User: Anders Jildén>]
使用 MySQL 时也是如此直接 shell:
mysql> select last_name from auth_user where last_name = 'Jildén';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.00 sec)
mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.01 sec)
这是数据库字符集设置:
mysql> SHOW variables LIKE '%character_set%';
+--------------------------+------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/Cellar/mysql/5.1.54/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+
这是表架构:
CREATE TABLE `auth_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(30) CHARACTER SET utf8 NOT NULL,
`first_name` varchar(30) CHARACTER SET utf8 NOT NULL,
`last_name` varchar(30) CHARACTER SET utf8 NOT NULL,
`email` varchar(200) CHARACTER SET utf8 NOT NULL,
`password` varchar(128) CHARACTER SET utf8 NOT NULL,
`is_staff` tinyint(1) NOT NULL,
`is_active` tinyint(1) NOT NULL,
`is_superuser` tinyint(1) NOT NULL,
`last_login` datetime NOT NULL,
`date_joined` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=7952 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
这是我通过 Django 的 DATABASES 设置传递的选项:
DATABASES = {
'default': {
# ...
'OPTIONS': {
'charset': 'utf8',
'init_command': 'SET storage_engine=INNODB;',
},
},
}
请注意,我已尝试将表排序规则设置为 utf8_bin ,没有效果:
mysql> alter table auth_user collate utf8_bin;
mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.00 sec)
如何让 MySQL 将它们视为不同的字符?
I'm storing Unicode strings in a MySQL database with a Django web application. I can store Unicode data fine, but when querying, I find that é
and e
are treated as if they were the same character:
In [1]: User.objects.filter(last_name='Jildén')
Out[1]: [<User: Anders Jildén>]
In [2]: User.objects.filter(last_name='Jilden')
Out[2]: [<User: Anders Jildén>]
This is also the case when using the MySQL shell directly:
mysql> select last_name from auth_user where last_name = 'Jildén';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.00 sec)
mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.01 sec)
Here are the database charset settings:
mysql> SHOW variables LIKE '%character_set%';
+--------------------------+------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/Cellar/mysql/5.1.54/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+
here's the table schema:
CREATE TABLE `auth_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(30) CHARACTER SET utf8 NOT NULL,
`first_name` varchar(30) CHARACTER SET utf8 NOT NULL,
`last_name` varchar(30) CHARACTER SET utf8 NOT NULL,
`email` varchar(200) CHARACTER SET utf8 NOT NULL,
`password` varchar(128) CHARACTER SET utf8 NOT NULL,
`is_staff` tinyint(1) NOT NULL,
`is_active` tinyint(1) NOT NULL,
`is_superuser` tinyint(1) NOT NULL,
`last_login` datetime NOT NULL,
`date_joined` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=7952 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
and here are the options I'm passing via Django's DATABASES
setting:
DATABASES = {
'default': {
# ...
'OPTIONS': {
'charset': 'utf8',
'init_command': 'SET storage_engine=INNODB;',
},
},
}
Note that I have tried setting the table collation to utf8_bin
, with no effect:
mysql> alter table auth_user collate utf8_bin;
mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.00 sec)
How can I get MySQL to treat these as different characters?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
当您更改表排序规则时,您已经差不多完成了,但还没有完全完成。在 MySQL 中,表中的每一列都有自己的字符集和排序规则。表有自己的字符集和排序规则,但这不会覆盖列排序规则;它仅确定您未指定排序规则的新添加列的排序规则。因此,您尚未更改您感兴趣的列的排序规则。
You were nearly there when you changed the table collation, but not quite. In MySQL, each column in a table has its own character set and collation. The table has its own character set and collation, but this does not override the column collations; it only determines what the collation will be for new columns that are added for which you don't specify the collation. So you haven't changed the collation of the column that you're interested in.
您需要设置一个将变音符号视为重要的排序规则。尝试使用
utf8_bin
You need to set a collation that treats diacritics as significant. Try using
utf8_bin
我认为了解表的字符集和您正在查询的字段很重要。
您的问题可以在这里找到答案
http://dev.mysql.com/doc/refman /5.0/en/charset-unicode-sets.html
也许您正在查询的字段具有 utf8_general_ci 字符集。
要获得您想要的内容,您应该将该字段的字符集设置为 utf8_unicode_ci
请记住,正如手册所说,对 utf8_unicode_ci 字符集字段的查询比对 字符集字段的查询慢utf8_general_ci 字段
I think it would be important to know the charset of the table and the field you are querying.
The answer to your question could be found here
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
Maybe the field you are querying has the utf8_general_ci charset.
To obtain what you want you should set the charset of that field as utf8_unicode_ci
Remember that, as the manual says, queries on utf8_unicode_ci charset fields are slower than queries on utf8_general_ci fields