Why does MySQL use latin1_swedish_ci as the default?

Posted on 2024-09-27 15:45:06

Does anyone know why latin1_swedish_ci is the default for MySQL? It would seem to me that UTF-8 would be more compatible, right?

Defaults are usually chosen because they are the best universal choice, but in this case that does not seem to be what they did.


Comments (5)

最单纯的乌龟 2024-10-04 15:45:06

As far as I can see, latin1 was the default character set in pre-multibyte times, and it looks like that has simply been carried forward, probably for reasons of backward compatibility (e.g. for older CREATE statements that didn't specify a collation).

From here:

What 4.0 Did

MySQL 4.0 (and earlier versions) only supported what amounted to a combined notion of the character set and collation with single-byte character encodings, which was specified at the server level. The default was latin1, which corresponds to a character set of latin1 and collation of latin1_swedish_ci in MySQL 4.1.

As to why Swedish, I can only guess that it's because MySQL AB is/was a Swedish company. I can't see any other reason for choosing this collation. It comes with some specific sorting quirks (Ä and Ö sort after Z, I think), and those rules are nowhere near an international standard.

娇纵 2024-10-04 15:45:06

latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as "undefined," whereas cp1252, and therefore MySQL's latin1, assign characters for those positions.

from

http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html

This might help you understand why.
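To make the 0x80 to 0x9f difference concrete, here is a minimal Python sketch. It uses Python's own `latin-1` (ISO 8859-1) and `cp1252` codecs as stand-ins, so it illustrates the two encodings rather than MySQL's internal tables; the helper name `decode_both` is made up for this example. Note that Python's `latin-1` codec maps these bytes to C1 control code points instead of rejecting them.

```python
# Decode the same byte with two codecs to compare the 0x80-0x9f range.
# Python's "latin-1" codec (ISO 8859-1) maps these bytes to C1 control
# code points; "cp1252" assigns printable characters to most of them.

def decode_both(byte_value):
    """Return (latin-1 result, cp1252 result) for one byte value.

    Hypothetical helper for illustration only.
    """
    raw = bytes([byte_value])
    return raw.decode("latin-1"), raw.decode("cp1252")

for value in (0x80, 0x85, 0x99):
    iso_char, cp_char = decode_both(value)
    print(f"0x{value:02x}: latin-1 -> U+{ord(iso_char):04X}, cp1252 -> {cp_char!r}")
```

With cp1252, 0x80 is the euro sign and 0x99 is the trademark sign, which is why text pasted from Windows applications tends to survive in a MySQL latin1 column.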

如果没有你 2024-10-04 15:45:06

Using a single-byte encoding has some advantages over multi-byte encodings. For example, the length of a string in bytes equals its length in characters. With a multi-byte encoding, it is not intuitively clear whether a function like SUBSTRING counts characters or bytes. For the same reason, supporting multi-byte encodings requires quite a big change to the internal code.
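The byte-versus-character distinction is easy to see outside MySQL. The sketch below uses plain Python strings, not MySQL functions; the sample word and the helper name `lengths` are assumptions chosen for illustration.

```python
# Character count vs. byte count for the same text in two encodings.
text = "Malmö"

def lengths(s, encoding):
    """Return (number of characters, number of encoded bytes). Illustrative helper."""
    return len(s), len(s.encode(encoding))

print(lengths(text, "latin-1"))  # (5, 5): one byte per character
print(lengths(text, "utf-8"))    # (5, 6): "ö" takes two bytes

# Taking "the first 5" units gives different results depending on the unit.
utf8_bytes = text.encode("utf-8")
print(text[:5])        # Malmö      (5 characters)
print(utf8_bytes[:5])  # b'Malm\xc3' (5 bytes, cut in the middle of "ö")
```

A byte-oriented substring on multi-byte data can split a character in half, which is the kind of problem the server code has to guard against once it supports such encodings.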

童话 2024-10-04 15:45:06

To expand on why not utf8, and to explain a gotcha not mentioned elsewhere in this thread: MySQL's utf8 is not real UTF-8! MySQL has been around for a long time, since before UTF-8 was widely adopted. As explained above, this is likely why it is not the default (backward compatibility, and the expectations of third-party software).

At a time when UTF-8 was new and not commonly used, it seems the MySQL developers added basic utf8 support that stores at most 3 bytes per character, which is not enough for all of Unicode. Now that it exists, they have chosen not to widen it to 4 bytes or remove it. Instead they added utf8mb4 ("multi-byte 4"), which is real 4-byte UTF-8.

It's important that anyone migrating a MySQL database to utf8, or building a new one, knows to use utf8mb4. For more information see https://adamhooper.medium.com/in-mysql-never-use-utf8-use-utf8mb4-11761243e434
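As a rough illustration of the 3-byte limit, the Python sketch below checks how many bytes each character needs in standard UTF-8. It does not talk to MySQL; the function names `utf8_byte_width` and `fits_in_three_bytes` are hypothetical, and the check only approximates which strings a 3-byte utf8 column could hold.

```python
# How many bytes does each character need in standard UTF-8?
def utf8_byte_width(ch):
    """Number of bytes one character occupies in UTF-8. Illustrative helper."""
    return len(ch.encode("utf-8"))

def fits_in_three_bytes(s):
    """True if every character of s needs at most 3 bytes in UTF-8.

    Approximates what a 3-byte-per-character encoding can represent.
    """
    return all(utf8_byte_width(ch) <= 3 for ch in s)

for ch in ("a", "ö", "€", "😀"):
    print(repr(ch), utf8_byte_width(ch))

print(fits_in_three_bytes("Malmö €"))  # True
print(fits_in_three_bytes("hi 😀"))    # False: the emoji needs 4 bytes
```

Emoji and other characters outside the Basic Multilingual Plane are the usual cases that need the fourth byte, and therefore utf8mb4.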

一指流沙 2024-10-04 15:45:06

Most strange features of this kind are historic. They did it like that a long time ago, and now they can't change it without breaking some app that depends on that behavior.

Perhaps UTF-8 wasn't popular then. Or perhaps MySQL didn't yet support charsets where multiple bytes encode one character.
