MySQL comparison between 'æ' and 'ae'
My MySQL server does not distinguish between the characters 'æ' and 'ae' when storing data, and that causes problems for me. My goal is to find a charset that treats them as different. I found one (utf8mb3), but it is deprecated, and its replacement (utf8mb4) does not treat those characters as different.
What I've tried:
set names 'utf8mb3';
select 'æ' = 'ae';
This select returns 0 (false), which means this charset sees these as different characters. That is exactly what I need, but MySQL gives me a warning:
'utf8mb3' is deprecated and will be removed in a future release. Please use utf8mb4 instead
But when I do
set names 'utf8mb4';
select 'æ' = 'ae';
This select returns 1, which means utf8mb4 sees these as the same character, which is not good.
So my dilemma is: which charset should I use? If I use utf8mb3, it will be removed in a future release, which is no good. If I use utf8mb4, the comparison does not work the way I need.
Comments (2)
= and LIKE comparisons in WHERE clauses apply a collation (not just a character set) to determine this kind of equality. This statement returns 0 for the first two collations and 1 for the last two. It seems likely that your default collation is one of the last two, or some other collation that handles this equality test in a way you don't want.
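As a rough illustration of how the result depends on the collation rather than the character set, the comparison can be run under explicit COLLATE clauses. This is only a sketch: the four collations below are my own choice and are not necessarily the ones the answer's statement used, and the column aliases are made up for readability.

```sql
-- Make sure the string literals are utf8mb4, so the COLLATE clauses below are valid.
SET NAMES utf8mb4;

-- Same comparison, four different collations (chosen for illustration only).
-- Each column is 1 if that collation treats the strings as equal, else 0.
SELECT
  'æ' = 'ae' COLLATE utf8mb4_bin            AS eq_bin,
  'æ' = 'ae' COLLATE utf8mb4_general_ci     AS eq_general_ci,
  'æ' = 'ae' COLLATE utf8mb4_unicode_ci     AS eq_unicode_ci,
  'æ' = 'ae' COLLATE utf8mb4_unicode_520_ci AS eq_unicode_520_ci;
```

Running this on your own server shows directly which collations treat the two strings as equal.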
You can check your connection's collation setting (the collation_connection system variable). I suspect it is utf8mb4_unicode_520_ci. Be sure to define the collation for your columns as the one you do want, and set your connection collation to the same thing.
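A minimal sketch of both steps, assuming a hypothetical table named demo_words with a word column (neither name comes from the question):

```sql
-- Inspect the collation the current connection uses for comparisons of literals.
SELECT @@collation_connection;

-- Or list every collation-related variable (connection, database, server).
SHOW VARIABLES LIKE 'collation%';

-- Hypothetical table: the column collation is declared explicitly,
-- so it does not depend on the server or database default.
CREATE TABLE demo_words (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  word VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
);
```

Declaring the collation on the column means comparisons against that column use it even when the connection default differs.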
utf8mb4_unicode_ci is suitable, so try that one. It's hard to give more specific advice without understanding your linguistic requirements better.
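One way to try the suggested collation on a connection is sketched below. The exact statement the answer had in mind is not shown, so treat this as an assumption about what "trying it" could look like.

```sql
-- Use utf8mb4 for the connection, with an explicit collation
-- instead of the character set's default one.
SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Confirm the setting took effect.
SELECT @@collation_connection;

-- Re-run the comparison from the question under the new connection collation.
SELECT 'æ' = 'ae';
```

If the last query still does not return the result you need, comparing against utf8mb4_bin or another collation in the same way is a quick check before changing any table definitions.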
More info here: Difference between utf8mb4_unicode_ci and utf8mb4_unicode_520_ci collations in MariaDB/MySQL?
The collation 'utf8mb4_unicode_ci' is the one you want to use at present. Make sure you also set your client (e.g. PHP, Node, Python) to use the correct charset, both in the DB client object and in the environment config.