MySQL: when should I use utf-8 and when latin1?

Posted on 2024-10-15 02:54:57

I know that MySQL defaults to latin1 encoding, and apparently it takes 1 byte to store a character in latin1 and 3 bytes to store a character in utf-8. Is that correct?

I am working on a site that I hope will be used globally. Do I absolutely need to have utf-8? Or will I be able to get away with using latin1?

Also, I tried to change some tables from latin1 to utf8 but I got this error:
Specified key was too long; max key length is 1000 bytes
Does anyone know the solution to this? And should I really solve it, or might latin1 be enough?

Thanks,
Alex

Comments (8)

琉璃梦幻 2024-10-22 02:54:57

it takes 1 byte to store a character in latin1 and 3 bytes to store a character in utf-8 - is that correct?

It takes 1 byte to store a latin1 character and 1 to 3 bytes to store a character in MySQL's utf8 character set.

If you only use basic Latin characters and punctuation in your strings (code points 0 to 127 in Unicode), both charsets will occupy the same length.
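The byte counts are easy to check outside the database. Here is a minimal Python sketch, assuming Python's `latin-1` and `utf-8` codecs as stand-ins for MySQL's latin1 and utf8 character sets (MySQL's latin1 is close to ISO-8859-1 but not identical). `byte_length` is just an illustrative helper name:

```python
def byte_length(text, encoding):
    """Return how many bytes `text` needs in `encoding`, or None if it cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


# One ASCII letter, one accented Western European letter, one CJK character.
for ch in ["a", "\u00e9", "\u6f22"]:
    print(repr(ch), byte_length(ch, "latin-1"), byte_length(ch, "utf-8"))
```

The ASCII letter costs one byte either way, the accented letter grows from one byte to two, and the CJK character cannot be represented in latin1 at all.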

Also, I tried to change some tables from latin1 to utf8 but I got this error: "Specified key was too long; max key length is 1000 bytes" Does anyone know the solution to this? And should I really solve that or may latin1 be enough?

If you have a utf8 column of VARCHAR(334) or longer, MyISAM won't let you create an index on it, since there is a remote possibility of the column occupying more than 1000 bytes (334 characters at up to 3 bytes each is 1002 bytes).

Note that keys of such length are rarely useful. You can create a prefix index instead, which will be almost as selective for any real-world data.
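The arithmetic behind that limit can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it assumes the key size is simply characters times the maximum bytes per character and ignores any per-column overhead. The helper names are made up for illustration.

```python
# Upper bound on bytes per character that each MySQL character set may need.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

MYISAM_MAX_KEY_BYTES = 1000  # the limit quoted in the error message


def max_indexable_chars(charset, key_limit=MYISAM_MAX_KEY_BYTES):
    """Longest column (or index prefix), in characters, that fits in the key limit."""
    return key_limit // MAX_BYTES_PER_CHAR[charset]


def fits_in_key(length_in_chars, charset, key_limit=MYISAM_MAX_KEY_BYTES):
    """True if a column or prefix of this many characters stays within the key limit."""
    return length_in_chars * MAX_BYTES_PER_CHAR[charset] <= key_limit


for charset in MAX_BYTES_PER_CHAR:
    print(charset, max_indexable_chars(charset))
```

Under these assumptions a utf8 column can be indexed in full up to 333 characters, and anything longer needs a prefix length on the index.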

西瓜 2024-10-22 02:54:57

At a bare minimum I would suggest using UTF-8. Your data will be compatible with every other database out there nowadays since 90%+ of them are UTF-8.

If you go with LATIN1/ISO-8859-1, you risk data not being stored properly because it doesn't support international characters, so you might run into something like the left side of this image:

(image not available)

If you go with UTF-8, you don't need to deal with these headaches.
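To make the headache concrete, here is a small Python sketch of two ways latin1 can go wrong. It is an illustration under assumptions, not a reproduction of the image above: Python's `latin-1` codec stands in for MySQL's latin1, and `can_store_as_latin1` is a made-up helper.

```python
original = "se\u00f1or"  # "senor" with n-tilde

# Bytes written as UTF-8 but later read back as latin1: each non-ASCII
# character turns into two unrelated characters (classic mojibake).
garbled = original.encode("utf-8").decode("latin-1")

# Undoing the wrong step recovers the text, provided nothing was lost on the way.
repaired = garbled.encode("latin-1").decode("utf-8")


def can_store_as_latin1(text):
    """True if every character of `text` exists in latin1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(garbled, repaired)
print(can_store_as_latin1(original), can_store_as_latin1("\u65e5\u672c\u8a9e"))
```

Spanish text survives in latin1, but Japanese (the last example) has no representation there at all.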

Regarding your error, it sounds like you need to optimize your database. Consider this: http://bugs.mysql.com/bug.php?id=4541#c284415

It would help if you gave specifics on your table schema and column for that issue.

智商已欠费 2024-10-22 02:54:57

If you allow users to post in their own languages, and if you want users from all countries to participate, you have to switch at least the tables containing those posts to UTF-8, because Latin1 covers only ASCII and western European characters. The same is true if you intend to use multiple languages for your UI. See this post for how to handle migration.

沐歌 2024-10-22 02:54:57

In my experience, if you plan to support Arabic, Russian, Asian languages or others, the investment in UTF-8 support upfront will pay off down the line. However, depending on your circumstances you may be able to get away with English for a while.

As for the error, you probably have a key or index field with more than 333 characters, which is the most that fits within the 1000-byte key limit when each utf8 character may take up to 3 bytes. See this bug report.

水水月牙 2024-10-22 02:54:57

Current best practice is to never use MySQL's utf8 character set. Use utf8mb4 instead, which is a proper implementation of the standard.

See Adam Hooper's Explanation for more detail.

Note that in utf8mb4, characters have a variable number of bytes. As the name implies, characters are up to four bytes. Characters in the basic Latin (ASCII) range still occupy only one byte when encoded as utf8mb4. Other characters, including accented letters, Kanji, and emoji, require two, three, or four bytes to store.

The "Specified key was too long; max key length is 1000 bytes" error occurs when an index contains utf8mb4 columns, because the index may exceed this limit at up to four bytes per character. You'll need to shorten some of the character columns, or shorten the indexed prefix of those columns using this syntax, so that the index stays within the limit.

ALTER TABLE.. ADD INDEX `myIndex` ( column1(15), column2(200) );
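As a rough check that the prefix lengths in that statement fit, here is a Python sketch of the worst-case key size. It assumes the key size is the sum of the prefix lengths times four bytes and ignores any per-column overhead. `index_key_bytes` is a hypothetical helper, and the pair of full-length 255-character columns is an invented comparison case.

```python
UTF8MB4_MAX_BYTES = 4    # utf8mb4 reserves up to four bytes per character
KEY_LIMIT_BYTES = 1000   # the limit quoted in the error message


def index_key_bytes(prefix_lengths, max_bytes_per_char=UTF8MB4_MAX_BYTES):
    """Worst-case key size in bytes for an index over the given character prefixes."""
    return sum(prefix_lengths) * max_bytes_per_char


prefixed = index_key_bytes([15, 200])      # column1(15), column2(200)
unprefixed = index_key_bytes([255, 255])   # hypothetical: two full VARCHAR(255) columns

print(prefixed, prefixed <= KEY_LIMIT_BYTES)
print(unprefixed, unprefixed <= KEY_LIMIT_BYTES)
```

The prefixed index comes out at 860 bytes and fits, while indexing two full 255-character columns would need up to 2040 bytes.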

孤寂小茶 2024-10-22 02:54:57

We built an application using latin1 because it was the default. But later on we had to change everything to UTF-8 because of Spanish characters. It was not incredibly difficult, but there is no point in having to change things unnecessarily.

So the short answer is: just go with UTF-8 from the beginning. It will save you trouble later on.

锦上情书 2024-10-22 02:54:57

Since the max length of a key is 1000 BYTES, using utf8 will limit you to 333 characters.

However, MySQL is different from Oracle when it comes to charsets. In Oracle you can't have a different character set per column, whereas in MySQL you can, so you may be able to set the key column to latin1 and the other columns to utf8.

Finally, I believe only the defunct 6.0 alpha version (ditched when Sun bought MySQL) could accommodate Unicode characters beyond the BMP (Basic Multilingual Plane). So basically, even with MySQL's utf8, you won't have the whole Unicode character set; that takes the utf8mb4 character set, which was added in MySQL 5.5. In practice this is only a problem for rare Chinese characters, if that really matters to you.

花期渐远 2024-10-22 02:54:57

I am not an expert, but I always understood that UTF-8 actually uses up to 4 bytes per character, not 3. And as I understand it, MySQL's utf8 character set (used with collations such as utf8_unicode_ci) only handles characters of up to 3 bytes...

If you want the full 4-byte UTF-8 character encoding, you need to use the utf8mb4 character set (for example with the utf8mb4_unicode_ci collation) for your MySQL database/tables.
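If you want to know whether existing text would be affected, one option is to scan it for characters above U+FFFF, which are the ones that need four bytes in UTF-8. Here is a minimal Python sketch (the function names are made up for illustration):

```python
def chars_outside_bmp(text):
    """Return the characters in `text` whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_in_three_byte_utf8(text):
    """True if every character of `text` encodes to at most three UTF-8 bytes."""
    return not chars_outside_bmp(text)


samples = ["caf\u00e9", "\u6f22\u5b57", "smile \U0001F600", "\U00020000"]
for s in samples:
    print(repr(s), fits_in_three_byte_utf8(s))
```

The first two samples stay within three bytes per character. The emoji and the rare CJK character U+20000 do not, so they would need utf8mb4.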
