MySQL, Bulgarian, and character sets
I have a MySQL table with multiple languages, one field per language.
My character set is utf8_general_ci
When I look at the table in phpMyAdmin, I have a Bulgarian page which looks like this:
Ð—Ð° Ð½Ð°Ñ
This is a title. The same title shows up on the website like this:
За нас (this is correct)
What am I doing wrong?
Comments (3)
OK, try executing these queries before you actually fetch the records:
Afterwards, proceed with your own queries. The queries above must, of course, run on your current database connection.
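The statements themselves did not survive in this copy of the answer. As a guess at the kind of thing meant, a connection is commonly switched to UTF-8 with MySQL's `SET NAMES` statement. The sketch below assumes a Python DB-API connection to MySQL; the helper name `use_utf8` and the choice of `utf8mb4` are assumptions for illustration, not the answerer's code.

```python
def use_utf8(conn, charset="utf8mb4"):
    """Tell the MySQL server that this connection sends and expects UTF-8.

    `conn` is assumed to be a DB-API 2.0 connection to a MySQL server.
    """
    # The charset name is interpolated into SQL, so accept only known values.
    if charset not in ("utf8", "utf8mb4"):
        raise ValueError("unsupported charset: " + charset)
    cur = conn.cursor()
    try:
        # SET NAMES affects the client, connection and results character
        # sets for this connection only, so it must run on the same
        # connection that later fetches the records.
        cur.execute("SET NAMES " + charset)
    finally:
        cur.close()
```

Call it once right after connecting and before the first `SELECT`. Many drivers also accept a charset option at connect time, which may be the cleaner choice where available.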
It looks like the data is UTF-8 encoded, so it displays correctly on a web page declared as UTF-8 but not in a program that cannot handle UTF-8 or has not been set to use it.
For example, the characters Ð°, which occur twice, are U+00D0 U+00B0. The bytes 0xD0 and 0xB0 are the UTF-8 form of the Cyrillic small letter a, U+0430, which appears in the corresponding positions in the correct text. So apparently UTF-8 data is being misinterpreted according to ISO-8859-1, Windows-1252, or some similar 8-bit encoding.
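The misreading is easy to reproduce outside MySQL. This small Python sketch encodes the correct title as UTF-8 and then decodes the same bytes as ISO-8859-1; the variable names are illustrative only.

```python
correct = "За нас"

# Encode as UTF-8, then misread the bytes as ISO-8859-1,
# where every byte becomes exactly one character.
utf8_bytes = correct.encode("utf-8")
misread = utf8_bytes.decode("latin-1")

# Cyrillic small letter a (U+0430) on its own.
a_bytes = "\u0430".encode("utf-8")
a_misread = a_bytes.decode("latin-1")

print(utf8_bytes.hex(" "))                          # d0 97 d0 b0 20 d0 bd d0 b0 d1 81
print([f"U+{ord(ch):04X}" for ch in a_misread])     # ['U+00D0', 'U+00B0']
print(misread.count(a_misread))                     # 2
```

Each Cyrillic letter becomes two Latin-range characters, which is the pattern visible in phpMyAdmin.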
What character set do the fields in your table use?
Can you please share the relevant part of the SHOW CREATE TABLE output for these fields?
Since ISO-8859-1 (latin1) was long the default database charset for MySQL, and it mostly does no conversion, people use it as if it were BINARY and simply store UTF-8 encoded Cyrillic in it. This works well with web development tools: they bind to the field, receive the data as UTF-8 encoded bytes, and put it, unconverted, into a web page that declares UTF-8 as its output encoding. The data just passes through without ever being encoded in a way the database understands. Of course, this causes all kinds of problems when you do operations inside the database (for example, character length versus byte length, or sorting properly). But for basic store/retrieve operations it looks like it's working.
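The length problem mentioned above can be seen with the title from the question. In a single-byte charset such as latin1, every stored byte counts as one character, so the database sees more "characters" than the text really has. A minimal Python illustration (variable names are illustrative):

```python
title = "За нас"

stored = title.encode("utf-8")          # the bytes the application writes
as_latin1 = stored.decode("latin-1")    # how a latin1 column interprets them

print(len(title))      # 6  real characters
print(len(stored))     # 11 bytes
print(len(as_latin1))  # 11 "characters" from the database's point of view
```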
This is a very typical behavior for non-localized web apps that assume they're working with ASCII or ISO-8859-1 at most.
The remedy is to create a new set of tables that use the UTF-8 encoding, explicitly transcode the wrongly labelled UTF-8 data to real characters, and then insert those into the UTF-8 tables so that the database knows which encoding is actually in use.
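The transcoding step amounts to reversing the misreading: turn the text back into the original bytes, then decode those bytes as UTF-8. The sketch below shows the idea on the client side in Python. It assumes the data was misread strictly as ISO-8859-1; Windows-1252 differs for bytes 0x80 to 0x9F and would need separate handling. The function name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(text):
    """Undo a 'UTF-8 bytes read as ISO-8859-1' mix-up.

    Returns the input unchanged when it does not fit that pattern,
    so text that is already correct is left alone.
    """
    try:
        # Recover the original bytes, then decode them as what they are.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


broken = "За нас".encode("utf-8").decode("latin-1")
print(repair_mojibake(broken))  # За нас
```

Run something like this over a copy of the data first; text that is pure ASCII or already correct passes through unchanged, but it is still worth spot-checking rows before writing to the new tables.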