MySQL comparison between 'æ' and 'ae'

Posted on 2025-01-11 10:37:49 | 594 characters | 0 views | 0 comments

My MySQL server does not distinguish between the characters 'æ' and 'ae' when storing data in the database, which causes problems for me. I want a charset that treats these characters as different. I found one (utf8mb3), but it is deprecated, and its replacement (utf8mb4) does not treat them as different.

What I've tried:

set names 'utf8mb3';
select 'æ' = 'ae';

This select returns 0 (false), which means this charset sees these as different characters, and that's just what I need, but MySQL gives me a warning:
'utf8mb3' is deprecated and will be removed in a future release. Please use utf8mb4 instead

But when I do

set names 'utf8mb4';
select 'æ' = 'ae';

This select returns 1 (true), which means utf8mb4 sees these as the same character, which is not what I want.

So my dilemma is: which charset should I use?
If I use utf8mb3, it will be removed in a future release. If I use utf8mb4, the comparison does not work the way I need.

Comments (2)

向地狱狂奔 2025-01-18 10:37:49

The = and LIKE comparisons in WHERE clauses apply a collation (not just a character set) to decide this kind of equality. The statement below returns 0 for the first two collations and 1 for the last two.

SELECT 'æ' = 'ae' COLLATE utf8mb4_unicode_ci,       -- 0
       'æ' = 'ae' COLLATE utf8mb4_general_ci,       -- 0
       'æ' = 'ae' COLLATE utf8mb4_unicode_520_ci,   -- 1
       'æ' = 'ae' COLLATE utf8mb4_german2_ci;       -- 1

It seems likely that your default collation is one of the last two, or some other collation that treats 'æ' and 'ae' as equal.

You can see your connection's collation setting with this statement. I suspect it is utf8mb4_unicode_520_ci.

SELECT @@collation_connection;
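
To look beyond the connection collation, something like the following may help. It is a sketch using standard MySQL statements; the values returned depend on your server version and configuration.

```sql
-- List every collation-related setting (connection, database, server).
SHOW VARIABLES LIKE 'collation%';

-- List the collations available for utf8mb4 on this server.
SHOW COLLATION WHERE Charset = 'utf8mb4';
```

Comparing the connection value with the database and server values shows at which level the unwanted collation is set.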

Be sure to declare your columns with the collation you actually want, and set your connection collation to the same one. utf8mb4_unicode_ci is suitable. Try this.

SET collation_connection = 'utf8mb4_unicode_ci';
SELECT 'æ' = 'ae';   -- 0
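
As a sketch of the column-level advice above, the collation can be declared when a table is created or changed on an existing table. The table and column names here (people, name) are made up for illustration, and converting a large table can take a while, so it is worth trying on a copy first.

```sql
-- Hypothetical table: the column carries an explicit charset and collation.
CREATE TABLE people (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
);

-- For an existing table, convert all character columns in one statement.
ALTER TABLE people CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- A comparison between the column and a string literal uses the column's collation.
SELECT name FROM people WHERE name = 'ae';
```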

It's hard to give more specific advice without understanding your linguistic requirements better.

More info here: Difference between utf8mb4_unicode_ci and utf8mb4_unicode_520_ci collations in MariaDB/MySQL?

就是爱搞怪 2025-01-18 10:37:49

The collation 'utf8mb4_unicode_ci' is the one you want to use at present. Make sure your client (e.g. PHP, Node, Python) is also set to use the correct charset, both in the DB client object and in the environment config.
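
At the session level, the client-side charset setting has roughly the effect of the statement below. This is only a sketch for checking a session by hand; it does not replace the charset option of whichever driver you use.

```sql
-- Set the client, connection, and result character sets to utf8mb4,
-- and the connection collation to utf8mb4_unicode_ci, in one statement.
SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';

-- Confirm what the session is now using.
SELECT @@character_set_client,
       @@character_set_connection,
       @@character_set_results,
       @@collation_connection;
```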
