PHP and the Â character: a character encoding problem
I'm having a problem where PHP (5.2) cannot find the character 'Â' in a string, though it is clearly there.
I realize the underlying problem has to do with character encoding, but unfortunately I have no control over the source content. I receive it as UTF-8, with those characters already in the string.
I would simply like to remove it from the string. strpos(), str_replace(), preg_replace(), trim(), etc. cannot correctly identify it.
My string is this:
"Â Â Â A lot of couples throughout the World "
If I do this:
$string = str_replace('Â','',$string);
I get this:
"� � � A lot of couples throughout the World"
I even tried utf8_encode() and utf8_decode() before the str_replace, with no luck.
What's the solution? I've been throwing everything I can find at it...
Comments (3)
How is this 'Â' encoded? If your script file is saved as ISO-8859-1, the string
'Â'
is encoded as the single-byte sequence 0xC2, while the (or one) UTF-8 representation is 0xC3 0x82. PHP's str_replace() works at the byte level, i.e. it only "knows" single-byte characters. See http://docs.php.net/intro.mbstring
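To illustrate the byte-level point above, here is a minimal sketch. It assumes the incoming string really contains 'Â' as the UTF-8 bytes C3 82; the sample string and variable names are made up for the example, so check your own data with bin2hex() first. Writing the needle as explicit byte escapes makes the match independent of the encoding the script file is saved in.

```php
<?php
// Hypothetical sample input: 'Â' as UTF-8 (bytes C3 82), separated by spaces.
$string = "\xC3\x82 \xC3\x82 A lot of couples";

// Inspect the raw bytes to see what the string really contains.
echo bin2hex(substr($string, 0, 3)), "\n"; // c38220

// A needle typed into a script saved as ISO-8859-1 is the single byte C2,
// which does not occur in the UTF-8 sequence C3 82.
$latin1Needle = "\xC2";
// The same character written as its UTF-8 byte sequence.
$utf8Needle = "\xC3\x82";

var_dump(strpos($string, $latin1Needle)); // bool(false)
var_dump(strpos($string, $utf8Needle));   // int(0)

// Remove the UTF-8 sequence, then trim the leftover leading spaces.
$clean = ltrim(str_replace($utf8Needle, '', $string));
echo $clean, "\n"; // A lot of couples
```

If bin2hex() shows different bytes in your real input (for example c2a0 rather than c382), the needle has to be changed to match those bytes.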
I use this:
From the PHP Manual Comment Page:
http://www.php.net/manual/en/function.preg-replace.php#96847
And from StackOverflow:
Remove accents without using iconv