Why does MySQL treat é the same as e?

Posted on 2024-11-27 14:32:28

I'm storing Unicode strings in a MySQL database with a Django web application. I can store Unicode data fine, but when querying, I find that é and e are treated as if they were the same character:

In [1]: User.objects.filter(last_name='Jildén')
Out[1]: [<User: Anders Jildén>]

In [2]: User.objects.filter(last_name='Jilden')
Out[2]: [<User: Anders Jildén>]

This is also the case when using the MySQL shell directly:

mysql> select last_name from auth_user where last_name = 'Jildén';
+-----------+
| last_name |
+-----------+
| Jildén   |
+-----------+
1 row in set (0.00 sec)

mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén   |
+-----------+
1 row in set (0.01 sec)

Here are the database charset settings:

mysql> SHOW variables LIKE '%character_set%';
+--------------------------+------------------------------------------------------+
| Variable_name            | Value                                                |
+--------------------------+------------------------------------------------------+
| character_set_client     | latin1                                               |
| character_set_connection | latin1                                               |
| character_set_database   | utf8                                                 |
| character_set_filesystem | binary                                               |
| character_set_results    | latin1                                               |
| character_set_server     | latin1                                               |
| character_set_system     | utf8                                                 |
| character_sets_dir       | /usr/local/Cellar/mysql/5.1.54/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+

Here's the table schema:

CREATE TABLE `auth_user` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `username` varchar(30) CHARACTER SET utf8 NOT NULL,
    `first_name` varchar(30) CHARACTER SET utf8 NOT NULL,
    `last_name` varchar(30) CHARACTER SET utf8 NOT NULL,
    `email` varchar(200) CHARACTER SET utf8 NOT NULL,
    `password` varchar(128) CHARACTER SET utf8 NOT NULL,
    `is_staff` tinyint(1) NOT NULL,
    `is_active` tinyint(1) NOT NULL,
    `is_superuser` tinyint(1) NOT NULL,
    `last_login` datetime NOT NULL,
    `date_joined` datetime NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=7952 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

And here are the options I'm passing via Django's DATABASES setting:

DATABASES = {
    'default': {
        # ...
        'OPTIONS': {
            'charset': 'utf8',
            'init_command': 'SET storage_engine=INNODB;',
        },
    },
}

Note that I have tried setting the table collation to utf8_bin, with no effect:

mysql> alter table auth_user collate utf8_bin;

mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén   |
+-----------+
1 row in set (0.00 sec)

How can I get MySQL to treat these as different characters?

Comments (3)

半窗疏影 2024-12-04 14:32:28

You were nearly there when you changed the table collation, but not quite. In MySQL, each column in a table has its own character set and collation. The table has its own character set and collation, but this does not override the column collations; it only determines what the collation will be for new columns that are added for which you don't specify the collation. So you haven't changed the collation of the column that you're interested in.

ALTER TABLE tablename MODIFY columnname
    varchar(???) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
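
As a rough sketch of the fix described above, the following Python helper builds the per-column statement and also holds a query for checking which collation each column currently uses. The names `build_modify_column_sql` and `COLUMN_COLLATION_QUERY` are hypothetical, chosen only for illustration; the length of 30 for `last_name` is taken from the schema in the question. It only produces SQL text and does not connect to a database.

```python
import re

# Only plain identifiers are accepted, because they are interpolated into SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")

# Parameterized query (DB-API "%s" style) listing the character set and
# collation that each column of a table actually uses.
COLUMN_COLLATION_QUERY = (
    "SELECT column_name, character_set_name, collation_name "
    "FROM information_schema.columns "
    "WHERE table_schema = %s AND table_name = %s"
)


def build_modify_column_sql(table, column, length,
                            collation="utf8_bin", charset="utf8"):
    """Return an ALTER TABLE statement that sets the collation of one column.

    Hypothetical helper: it assumes a NOT NULL varchar column, as in the
    auth_user schema from the question. MODIFY replaces the whole column
    definition, so any other attributes would have to be restated as well.
    """
    for name in (table, column, collation, charset):
        if not _IDENTIFIER.match(name):
            raise ValueError("unexpected identifier: %r" % (name,))
    return (
        "ALTER TABLE `{0}` MODIFY `{1}` varchar({2}) "
        "CHARACTER SET {3} COLLATE {4} NOT NULL"
    ).format(table, column, int(length), charset, collation)


if __name__ == "__main__":
    print(build_modify_column_sql("auth_user", "last_name", 30))
```

Running `COLUMN_COLLATION_QUERY` before and after the change (with the schema name and `auth_user` as parameters) is one way to confirm that the column itself, and not just the table default, now reports `utf8_bin`.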

美人骨 2024-12-04 14:32:28

You need to set a collation that treats diacritics as significant. Try using utf8_bin.
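
To make the difference concrete, here is a small Python model of the two kinds of comparison. It is only an approximation for illustration, not MySQL's actual collation algorithm: the accent-insensitive case is imitated by stripping combining marks after Unicode decomposition, and the binary case by comparing UTF-8 bytes. The function names are made up for this sketch.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_accent_insensitive(a, b):
    """Rough stand-in for a case- and accent-insensitive (_ci style) comparison."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


def equal_binary(a, b):
    """Rough stand-in for a binary (_bin style) comparison: compare encoded bytes."""
    return a.encode("utf-8") == b.encode("utf-8")


if __name__ == "__main__":
    print(equal_accent_insensitive("Jildén", "Jilden"))  # True
    print(equal_binary("Jildén", "Jilden"))              # False
```

Under the first comparison 'Jildén' and 'Jilden' match, which mirrors the query results in the question; under the second they do not.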

思慕 2024-12-04 14:32:28

I think it would be important to know the charset of the table and the field you are querying.

The answer to your question could be found here
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html

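Maybe the field you are querying has the utf8_general_ci collation, which compares é and e as equal.
Note that switching that field to utf8_unicode_ci would not change this, since that collation also ignores accents; a binary collation such as utf8_bin is what distinguishes é from e.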

Remember that, as the manual says, queries on fields with the utf8_unicode_ci collation are slower than queries on fields with the utf8_general_ci collation.
