MySQL: when to use utf-8 and when to use latin1?
I know that MySQL defaults to latin1 encoding, and apparently it takes 1 byte to store a character in latin1 and 3 bytes to store a character in utf-8. Is that correct?
I am working on a site that I hope will be used globally. Do I absolutely need to have utf-8? Or will I be able to get away with using latin1?
Also, I tried to change some tables from latin1 to utf8 but I got this error: Specified key was too long; max key length is 1000 bytes
Does anyone know the solution to this? And should I really solve that or may latin1 be enough?
Thanks,
Alex
It takes 1 byte to store a latin1 character and 1 to 3 bytes to store a UTF8 character.
If you only use basic Latin characters and punctuation in your strings (0 to 127 in Unicode), both charsets will occupy the same length.
If you have a column of VARCHAR(334) or longer, MyISAM won't let you create an index on it, since there is a remote possibility that the column will occupy more than 1000 bytes.
Note that keys of such length are rarely useful. You can create a prefixed index which will be almost as selective for any real-world data.
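To make the byte counts and the prefixed index concrete, here is a minimal sketch. The table `articles`, the column `title`, the index name `idx_title`, and the prefix length of 100 are all made up for illustration. The character is written as a hex literal so the comparison does not depend on the client encoding.

```sql
-- Compare character count with byte count for 'é' (UTF-8 bytes C3 A9).
-- LENGTH() counts bytes, CHAR_LENGTH() counts characters.
SELECT
    CHAR_LENGTH(_utf8mb4 X'C3A9')                  AS characters,
    LENGTH(CONVERT(_utf8mb4 X'C3A9' USING latin1)) AS latin1_bytes,
    LENGTH(_utf8mb4 X'C3A9')                       AS utf8mb4_bytes;

-- Hypothetical MyISAM table whose title column is too long to index in full
-- (500 characters * 3 bytes is more than the 1000-byte key limit).
CREATE TABLE articles (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(500) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- Prefixed index: only the first 100 characters of title are indexed
-- (at most 300 bytes), which stays well under the limit.
ALTER TABLE articles ADD INDEX idx_title (title(100));
```

The prefix length is a judgment call: it needs to be long enough that most values differ within the prefix, and short enough that prefix length times the maximum bytes per character stays within the key limit.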
At a bare minimum I would suggest using UTF-8. Your data will be compatible with every other database out there nowadays since 90%+ of them are UTF-8.
If you go with LATIN1/ISO-8859-1 you risk the data not being stored properly, because it doesn't support international characters.
If you go with UTF-8, you don't need to deal with these headaches.
Regarding your error, it sounds like you need to optimize your database. Consider this: http://bugs.mysql.com/bug.php?id=4541#c284415
It would help if you gave specifics on your table schema and column for that issue.
If you allow users to post in their own languages, and if you want users from all countries to participate, you have to switch at least the tables containing those posts to UTF-8 - Latin1 covers only ASCII and western European characters. The same is true if you intend to use multiple languages for your UI. See this post for how to handle migration.
In my experience, if you plan to support Arabic, Russian, Asian languages or others, the investment in UTF-8 support upfront will pay off down the line. However, depending on your circumstances you may be able to get away with English for a while.
As for the error, you probably have a key or index field with more than 333 characters, the maximum allowed in MySQL with UTF-8 encoding. See this bug report.
Current best practice is to never use MySQL's utf8 character set. Use utf8mb4 instead, which is a proper implementation of the standard. See Adam Hooper's Explanation for more detail.
Note that in utf8mb4, characters have a variable number of bytes. As the name implies, characters are up to four bytes. Basic Latin (ASCII) characters encoded as utf8mb4 still occupy only one byte. Other characters, including accented letters, Kanji, and emoji, require two, three, or four bytes to store.
The "Specified key was too long; max key length is 1000 bytes" error occurs when an index contains columns in utf8mb4 because the index may be over this limit. You'll need to shorten the column length of some character columns, or shorten the length of the index on the columns using this syntax, to ensure that it is shorter than the limit:
ALTER TABLE .. ADD INDEX `myIndex` ( column1(15), column2(200) );
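As a rough sketch of how a conversion might go for a table that already has an over-long index: the table `posts`, column `title`, and index `idx_title` below are hypothetical names, and the assumption is a MyISAM table with the 1000-byte key limit from the error message. At up to 4 bytes per character, that limit allows at most 250 indexed characters in utf8mb4.

```sql
-- 1. Drop the index that would exceed the key limit after conversion.
ALTER TABLE posts DROP INDEX idx_title;

-- 2. Convert the table's character columns and default charset to utf8mb4.
ALTER TABLE posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- 3. Recreate the index with a prefix length that fits:
--    100 characters * 4 bytes = 400 bytes, under the 1000-byte limit.
ALTER TABLE posts ADD INDEX idx_title (title(100));
```

It is worth trying this on a copy of the table first, since a conversion rewrites the table and can take a while on large data sets.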
We built an application using Latin because it was the default. Later on we had to change everything to UTF because of Spanish characters. It was not incredibly difficult, but there is no point in having to change things unnecessarily.
So the short answer is: just go with UTF-8 from the beginning. It will save you trouble later on.
Since the max length of a key is 1000 BYTES, if you use utf8, this will limit you to 333 characters.
However, MySQL is different from Oracle when it comes to charsets. In Oracle you can't have a different character set per column, whereas in MySQL you can, so maybe you can set the key column to latin1 and the other columns to utf8.
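A minimal sketch of the per-column idea, assuming the key only ever holds Western European text (for example a URL slug). The table and column names are invented for illustration, and utf8mb4 is used for the non-key columns in place of utf8, following the advice given elsewhere in this thread.

```sql
-- Hypothetical table: the key column is latin1 (1 byte per character),
-- so all 500 characters fit in a 1000-byte MyISAM key, while the
-- user-visible text columns use utf8mb4.
CREATE TABLE pages (
    slug  VARCHAR(500) CHARACTER SET latin1 NOT NULL,
    title VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL,
    body  TEXT CHARACTER SET utf8mb4,
    PRIMARY KEY (slug)
) ENGINE=MyISAM;
```

The trade-off is that the latin1 column cannot store characters outside that charset, so this only suits keys whose contents you control.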
Finally, I believe only the defunct version 6.0alpha (ditched when Sun bought MySQL) could accommodate Unicode characters beyond the BMP (Basic Multilingual Plane). So basically, even with MySQL's utf8, you won't have the whole Unicode character set. In practice this is only a problem for rare Chinese characters, if that really matters to you.
I am not an expert, but I always understood that UTF-8 is actually an encoding of up to 4 bytes per character, not 3. And as I understand it, MySQL's utf8 character set (the one behind the utf8_unicode_ci collation) only handles characters of up to 3 bytes...
If you want full 4-byte UTF-8 character encoding, you need to use the utf8mb4 character set (for example with the utf8mb4_unicode_ci collation) for your MySQL database/tables.
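To see which columns would still need converting, something along these lines should work. It is only a sketch: it assumes you are connected to the database in question, so that DATABASE() returns its name.

```sql
-- List character columns in the current database that are not yet utf8mb4.
-- 'utf8' and 'utf8mb3' are both included because newer MySQL versions
-- report the 3-byte charset under the utf8mb3 name.
SELECT table_name, column_name, character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND character_set_name IN ('latin1', 'utf8', 'utf8mb3')
ORDER BY table_name, column_name;
```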