How can I transform "ТеÑ" (a Russian word) into something readable?

Posted on 2024-09-08 15:54:33

I have a MySQL database with a UTF-8 column that contains records like "ТеÑ".
PHP's mb_detect_encoding() told me that this is UTF-8.
How can I transform this "horror" into something readable?

Thank you

Comments (2)

甜心 2024-09-15 15:54:33

I'm guessing you've got the byte string "\xd0\xa2\xd0\xb5\xd1", then, which would be the UTF-8 encoded form of the characters Те (plus one following byte which is half a character).

If you merely echo() that on a page that you have declared as being UTF-8, it should display correctly on the browser:

 <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
 ...

 something: <?php echo htmlspecialchars($something); ?>

This naturally also means you will need to save the .php file itself using the UTF-8 encoding, if it has any non-ASCII characters in. (Many Windows text editors tend not to save as UTF-8 by default, sadly.)

If you must have a non-UTF-8 page, you would have to use iconv() to convert the string to whatever encoding you were using, presumably Windows code page 1251 for Russian ('cp1251'). But I would strongly recommend using UTF-8 for everything all the way through.
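As a rough illustration of that conversion, here is a minimal sketch. The sample value is an assumption (the UTF-8 bytes for the two complete letters "Те"), not data taken from the asker's database, and it assumes the iconv extension knows the name 'CP1251'.

```php
<?php
// Hypothetical sample value: the UTF-8 bytes for the Cyrillic letters "Те".
$utf8 = "\xd0\xa2\xd0\xb5";

// Convert from UTF-8 to Windows code page 1251 for a cp1251 page.
$cp1251 = iconv('UTF-8', 'CP1251', $utf8);

// Each Cyrillic letter occupies a single byte in Windows-1251.
echo bin2hex($cp1251), "\n"; // d2e5
```

The page would then have to declare charset=windows-1251 instead of utf-8, otherwise the browser would misread these bytes.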

edit re comment:

if I'm doing mysql_set_charset("utf8", $db) before selecting row - I'm getting this "horror"

mysql_set_charset('utf8') is indeed the right thing to do. Check you are including the meta as above, and that the browser is seeing it (check View->Encoding is UTF-8).

If you are getting Ð¢ÐµÑ even with UTF-8 correctly getting sent, then I'm afraid the current contents of your database are messed up. Perhaps data had been inserted previously without the correct mysql_set_charset call, or maybe you did an SQL import that used the wrong charset.

If this is the case, you're likely going to have to go through each row of the database and ‘fix’ it by using iconv() to convert UTF-8 to ISO-8859-1. This should undo the double-UTF-8-encoding.
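To make the double-encoding idea concrete, the sketch below first builds a double-encoded value and then reverses it. The variable names and the sample string are assumptions for illustration only; real rows may have been damaged in a different way.

```php
<?php
// Hypothetical correct value: the UTF-8 bytes for "Те".
$correct = "\xd0\xa2\xd0\xb5";

// Simulate the damage: the bytes are misread as ISO-8859-1 and
// encoded to UTF-8 a second time.
$doubleEncoded = iconv('ISO-8859-1', 'UTF-8', $correct);
echo bin2hex($doubleEncoded), "\n"; // c390c2a2c390c2b5

// Converting UTF-8 back to ISO-8859-1 undoes the extra encoding step.
$repaired = iconv('UTF-8', 'ISO-8859-1', $doubleEncoded);
echo bin2hex($repaired), "\n"; // d0a2d0b5
```

This only works when the stored bytes are valid UTF-8 to begin with. It is worth trying on a copy of a few rows before touching the whole table.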

[edit:2]

iconv("UTF-8", "ISO-8859-1", $row['name']) saying Notice: iconv(): Detected an illegal character in input string.

OK, so the input isn't a valid UTF-8 sequence. That could either be because you're not getting UTF-8 out of the database after all, or because a UTF-8 sequence has become truncated. For example, your string "\xd0\xa2\xd0\xb5\xd1" (which, read as ISO-8859-1, looks like "Ð¢ÐµÑ") is not valid, as the final "Ñ" is only half of a two-byte UTF-8 sequence. As UTF-8 in a browser it would render as Те�.
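One way to check for this kind of truncation is mb_check_encoding(), which validates a byte string against a given encoding. This is a small sketch that assumes the mbstring extension is loaded; the variable names are made up for the example.

```php
<?php
// The byte string discussed above: two full characters plus a stray lead byte.
$truncated = "\xd0\xa2\xd0\xb5\xd1";

// The same string without the dangling final byte.
$complete = "\xd0\xa2\xd0\xb5";

var_dump(mb_check_encoding($truncated, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($complete, 'UTF-8'));  // bool(true)
```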

If that's what you have in your database you'll need to fix the data in there before you can proceed.

it's ok if I echo $row['name'] without doing mysql_set_charset("utf8", $db)

You haven't confirmed that you are correctly sending UTF-8 and that the browser knows this (by checking View->Encoding), so it's not really meaningful what you see on-screen when you echo(); we can't work out what the original byte string was from that.

Tell us what you see when you echo bin2hex($row['name']);. This will convert each byte in the string into hex digits, so "\xd0\xa2\xd0\xb5\xd1" would come out as d0a2d0b5d1, if that's what you've got.
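For instance, a sketch along these lines prints the raw bytes in a form that does not depend on the page encoding. Here $name is a hypothetical stand-in for $row['name'], filled with the byte string guessed at above.

```php
<?php
// Hypothetical stand-in for $row['name'].
$name = "\xd0\xa2\xd0\xb5\xd1";

// Plain hex dump of every byte in the string.
echo bin2hex($name), "\n"; // d0a2d0b5d1

// Same dump with a space between bytes, which is easier to read.
$spaced = rtrim(chunk_split(bin2hex($name), 2, ' '));
echo $spaced, "\n"; // d0 a2 d0 b5 d1
```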

北方的巷 2024-09-15 15:54:33

Output it to a page with UTF-8 encoding specified. The browser will show it in readable form.
