Why does MySQL use latin1_swedish_ci as its default?
Does anyone know why latin1_swedish is the default for MySQL? It seems to me that UTF-8 would be more compatible, right?
Defaults are usually chosen because they are the best universal choice, but in this case that doesn't seem to be what they did.
Comments (5)
As far as I can see, latin1 was the default character set in pre-multibyte times, and it looks like that has simply been continued, probably for reasons of backward compatibility (e.g. for older CREATE statements that didn't specify a collation).
From here:
As to why Swedish, I can only guess that it's because MySQL AB is/was a Swedish company. I can't see any other reason for choosing this collation. It comes with some specific sorting quirks (Å, Ä and Ö come after Z, I think), and those are nowhere near an international standard.
This page might help you understand why:
http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html
Using a single-byte encoding has some advantages over multi-byte encodings, e.g. the length of a string in bytes is equal to its length in characters. With a multi-byte encoding the two differ, so when you use a function like SUBSTRING it is not intuitively clear whether you mean characters or bytes. For the same reason, supporting multi-byte encodings requires quite a big change to the internal code.
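To make the bytes-versus-characters point concrete, here is a small Python sketch. It is not MySQL code. It only shows how the same string measures differently once it is encoded, and the sample word is just an illustration I picked.

```python
# Illustration only: compare character count and byte count for one string.
text = "Malmö"  # 5 characters, one of them outside ASCII

latin1_bytes = text.encode("latin-1")  # single-byte encoding
utf8_bytes = text.encode("utf-8")      # multi-byte encoding

char_count = len(text)
latin1_len = len(latin1_bytes)  # same as char_count: one byte per character
utf8_len = len(utf8_bytes)      # larger: "ö" takes two bytes in UTF-8

# Taking "the first 5" means different things for characters and bytes.
first_five_chars = text[:5]             # the whole word
first_five_utf8_bytes = utf8_bytes[:5]  # cuts "ö" in half

try:
    first_five_utf8_bytes.decode("utf-8")
    byte_cut_is_valid = True
except UnicodeDecodeError:
    # The slice ends in the middle of a two-byte sequence.
    byte_cut_is_valid = False

print(char_count, latin1_len, utf8_len, byte_cut_is_valid)
```

With latin1, slicing by bytes and slicing by characters always agree, which is the simplification described above. With UTF-8, code that works on raw bytes has to know where character boundaries fall.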
To expand on why not utf8, and to point out a gotcha not mentioned elsewhere in this thread: MySQL's utf8 is not really UTF-8! MySQL has been around for a long time, since before UTF-8 was in common use. As explained above, this is likely why it is not the default (backward compatibility, and the expectations of third-party software).
Back when UTF-8 was new and not commonly used, it seems the MySQL devs added basic utf8 support but limited it to at most 3 bytes per character, which is not enough for every Unicode character. Now that it exists, they have chosen neither to increase it to 4 bytes nor to remove it. Instead they added utf8mb4 ("multi-byte 4"), which is real UTF-8 with up to 4 bytes per character.
It's important that anyone migrating a MySQL database to UTF-8, or building a new one, knows to use utf8mb4. For more information see https://adamhooper.medium.com/in-mysql-never-use-utf8-use-utf8mb4-11761243e434
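As a rough way to see which text is affected, the Python sketch below checks whether every character in a string fits in 3 bytes of UTF-8. The function names are hypothetical helpers of mine, not part of MySQL or any driver, and the check only models the byte-width limit. What the server actually does with a character that is too wide depends on its version and settings.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the UTF-8 size in bytes of the widest character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_in_three_byte_utf8(text: str) -> bool:
    """True if no character needs more than 3 bytes in UTF-8.

    Hypothetical helper: it models only the 3-byte limit of MySQL's
    legacy utf8 character set, not the server's real behavior.
    """
    return max_utf8_bytes(text) <= 3


# Sample strings chosen for illustration.
samples = ["Malmö", "日本語", "😀"]
for sample in samples:
    print(sample, max_utf8_bytes(sample), fits_in_three_byte_utf8(sample))
```

Accented Latin letters and most CJK text fit in 3 bytes, so the limit can go unnoticed for a long time. Emoji and other characters outside the Basic Multilingual Plane need 4 bytes, and those are the ones that require utf8mb4.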
Most strange features of this kind are historic. They did it like that a long time ago, and now they can't change it without breaking some app that depends on that behavior.
Perhaps UTF-8 wasn't popular then. Or perhaps MySQL didn't yet support charsets where multiple bytes encode one character.