How do I convert "ТеÑ" (this is a Russian word) into something readable?
I have a MySQL database that contains a UTF-8 column with records such as "ТеÑ".
PHP's mb_detect_encoding() told me that this is UTF-8.
How can I transform this "horror" into something readable?
Thank you
Comments (2)
I'm guessing you've got the byte string "\xd0\xa2\xd0\xb5\xd1", then, which would be the UTF-8 encoded form of the characters Те (plus one following byte, which is half a character).
If you merely echo() that on a page that you have declared as being UTF-8 (for example with a charset meta tag in the page head), it should display correctly in the browser.
This naturally also means you will need to save the .php file itself using the UTF-8 encoding, if it has any non-ASCII characters in it. (Many Windows text editors tend not to save as UTF-8 by default, sadly.)
If you must have a non-UTF-8 page, you would have to use iconv() to convert the string to whatever encoding you were using, presumably Windows code page 1251 for Russian ('cp1251'). But I would strongly recommend using UTF-8 for everything all the way through.
Edit re comment:
mysql_set_charset('utf8') is indeed the right thing to do. Check you are including the meta tag as above, and that the browser is seeing it (check that View->Encoding is UTF-8).
If you are getting ТеÑ even with UTF-8 correctly getting sent, then I'm afraid the current contents of your database are messed up. Perhaps data had been inserted previously without the correct mysql_set_charset() call, or maybe you did an SQL import that used the wrong charset.
If this is the case, you're likely going to have to go through each row of the database and 'fix' it by using iconv() to convert UTF-8 to ISO-8859-1. This should undo the double UTF-8 encoding.
[edit:2]
OK, so the input isn't a valid UTF-8 sequence. That could either be because you're not getting UTF-8 out of the database after all, or because a UTF-8 sequence has become truncated. For example, your string "\xd0\xa2\xd0\xb5\xd1" (which, read as ISO-8859-1, looks like "ТеÑ") is not valid, as the final "Ñ" is only half of a two-byte UTF-8 sequence. As UTF-8 in a browser it would render as Те�.
If that's what you have in your database, you'll need to fix the data in there before you can proceed.
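To make the two failure modes concrete, here is a minimal sketch that first tests whether a value is well-formed UTF-8 and then tries to undo one layer of double encoding by converting UTF-8 to ISO-8859-1 with iconv(), as suggested earlier. The function name repairDoubleEncoded() is made up for illustration, the mbstring extension is assumed to be loaded, and the sketch assumes the extra layer came from bytes being misread as ISO-8859-1. A truncated value is returned unchanged, because the missing byte cannot be recovered.

```php
<?php
// Hypothetical helper: undo one layer of "UTF-8 encoded twice" damage.
// Assumes the original UTF-8 bytes were misread as ISO-8859-1 and re-encoded.
function repairDoubleEncoded(string $value): string
{
    // Not valid UTF-8 at all (e.g. a truncated sequence): nothing to undo.
    if (!mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Map each character back to the single byte it came from.
    // iconv() returns false if a character has no ISO-8859-1 equivalent.
    $bytes = @iconv('UTF-8', 'ISO-8859-1', $value);
    if ($bytes === false || !mb_check_encoding($bytes, 'UTF-8')) {
        return $value; // does not look double-encoded, keep as is
    }
    return $bytes;
}

// The truncated string from the question: ends with half a character.
$truncated = "\xd0\xa2\xd0\xb5\xd1";

// Simulate the damage: the UTF-8 bytes of "Тест" encoded a second time.
$doubled = iconv('ISO-8859-1', 'UTF-8', "\xd0\xa2\xd0\xb5\xd1\x81\xd1\x82");

var_dump(mb_check_encoding($truncated, 'UTF-8'));   // bool(false)
echo bin2hex(repairDoubleEncoded($doubled)), "\n";  // d0a2d0b5d181d182
```

Run something like this on a copy of the table first; correctly stored Cyrillic text is left alone because it has no ISO-8859-1 equivalent, but behaviour on unmappable characters can differ between iconv implementations.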
You haven't confirmed that you are correctly sending UTF-8 and that the browser knows this (by checking View->Encoding), so what you see on-screen when you echo() isn't really meaningful; we can't work out what the original byte string was from that.
Tell us what you see when you run:
echo bin2hex($row['name']);
This will convert each byte in the string into hex digits, so "\xd0\xa2\xd0\xb5\xd1" would come out as d0a2d0b5d1, if that's what you've got.
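As a small illustration of what that hex dump can reveal, the sketch below prints three candidate values: clean UTF-8, the truncated string discussed above, and a double-encoded one. The sample values are assumptions chosen for the example, not data from the asker's database.

```php
<?php
// Sample values are illustrative assumptions, not real database rows.
$samples = [
    'clean UTF-8'    => "\xd0\xa2\xd0\xb5",                 // "Те"
    'truncated'      => "\xd0\xa2\xd0\xb5\xd1",             // "Те" plus half a character
    'double-encoded' => "\xc3\x90\xc2\xa2\xc3\x90\xc2\xb5", // "Те" encoded twice
];

foreach ($samples as $label => $bytes) {
    // bin2hex() shows the raw bytes regardless of page or terminal encoding.
    printf("%-15s %s\n", $label, bin2hex($bytes));
}
```

A dump that begins c390c2a2 rather than d0a2 points to double encoding, while one that ends in a lone d1 points to truncation.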
Output it to a page with the UTF-8 encoding specified. The browser will then show it in readable form.
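One way to specify the encoding, sketched here under the assumption that the page is generated by PHP: send a Content-Type header with charset=utf-8 before any output, and repeat the declaration in a meta tag. The $name variable is a placeholder standing in for a value read from the database.

```php
<?php
// Declare the encoding in the HTTP header; this must run before any output.
header('Content-Type: text/html; charset=utf-8');

// Placeholder for a value fetched from the database (assumption for the example).
$name = "\xd0\xa2\xd0\xb5\xd1\x81\xd1\x82"; // "Тест" as UTF-8 bytes

// Escape for HTML, telling htmlspecialchars() that the string is UTF-8.
$safe = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// Repeat the charset declaration inside the document itself.
$page = <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>UTF-8 test</title>
</head>
<body>
<p>{$safe}</p>
</body>
</html>

HTML;

echo $page;
```

If the text still looks wrong with both declarations in place, the stored bytes themselves are the likely culprit rather than the page.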