Detecting the Unicode unknown character "�" in PHP
Is there any way in PHP of detecting the following character: �?
I'm currently fixing a number of UTF-8 encoding issues with a few different algorithms and need to be able to detect whether � is present in a string. How do I do so with strpos?
Simply pasting the character into my codebase does not seem to work.
if (strpos($names['decode'], '?') !== false || strpos($names['decode'], '�') !== false)
Comments (4)
Converting a UTF-8 string to UTF-8 with iconv() and the //IGNORE flag produces a result in which invalid UTF-8 sequences are dropped. You can therefore detect a broken character by comparing the length of the string before and after the iconv operation. If the lengths differ, the string contained a broken character.
Test case (make sure you save the file as UTF-8):
In theory, you could drop //IGNORE and simply test for a failed (empty) iconv operation, but there might be other reasons for an iconv call to fail than just invalid characters... I don't know. I would use the comparison method.
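The length comparison described in this answer can be sketched roughly as follows. This is a minimal illustration, not the answer's original test case: the helper name hasInvalidUtf8() and the sample strings are assumptions. Because some iconv builds return false (with a notice) instead of silently dropping bytes, the sketch falls back to a PCRE validity check when that happens.

```php
<?php
// Sketch of the before/after length comparison.
// hasInvalidUtf8() is a hypothetical helper name, not part of PHP.
function hasInvalidUtf8(string $text): bool
{
    // //IGNORE asks iconv to drop byte sequences that are not valid UTF-8.
    // The @ suppresses the notice some iconv builds raise on bad input.
    $cleaned = @iconv('UTF-8', 'UTF-8//IGNORE', $text);

    if ($cleaned === false) {
        // iconv refused the conversion (bad input, or //IGNORE unsupported).
        // Fall back to PCRE: with /u, preg_match() returns 1 only when the
        // subject is valid UTF-8, since the empty pattern always matches.
        return preg_match('//u', $text) !== 1;
    }

    // Any dropped bytes make the cleaned string shorter than the input.
    return strlen($cleaned) !== strlen($text);
}

$valid  = "D\u{00FC}sseldorf"; // "Düsseldorf" as valid UTF-8
$broken = "D\xFCsseldorf";     // same word with an ISO-8859-1 byte for "ü"

var_dump(hasInvalidUtf8($valid));
var_dump(hasInvalidUtf8($broken));
```

With these inputs the first call should report false and the second true. Note that this detects invalid byte sequences, which many viewers render as �; a string that already contains a real U+FFFD character is valid UTF-8 and will not be flagged.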
Here is what I do to detect and correct the encoding of strings that are not encoded in UTF-8 when UTF-8 is what I expect:
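One way such a detect-and-correct step might look is sketched below. It is not necessarily what this answer used: the helper name ensureUtf8() and the choice of ISO-8859-1 as the fallback source encoding are assumptions made for illustration. Where the mbstring extension is available, mb_check_encoding($text, 'UTF-8') performs a similar validity check.

```php
<?php
// Sketch only: ensureUtf8() and the ISO-8859-1 fallback are assumptions,
// not code taken from the answer.
function ensureUtf8(string $text, string $assumedSource = 'ISO-8859-1'): string
{
    // With the /u modifier, preg_match() returns false for subjects that
    // are not valid UTF-8, and 1 here because the empty pattern always matches.
    if (preg_match('//u', $text) === 1) {
        return $text; // already valid UTF-8, leave it untouched
    }

    // Otherwise reinterpret the bytes as the assumed legacy encoding.
    $converted = iconv($assumedSource, 'UTF-8', $text);

    // If the conversion itself fails, return the input unchanged.
    return $converted === false ? $text : $converted;
}

echo ensureUtf8("D\xFCsseldorf"), PHP_EOL; // ISO-8859-1 input
echo ensureUtf8("D\u{00FC}sseldorf"), PHP_EOL; // already UTF-8
```

Both calls should print the same UTF-8 text. The sketch only gives sensible results when the guess about the source encoding is right; bytes from a different legacy encoding would be converted to the wrong characters.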
As far as I know, that question mark symbol is not a single character. Many character codes in the standard font sets are not mapped to a glyph, and this is the default symbol shown for them. To detect it in PHP, you would first need to know which font you are using. Then you would need to look at the font implementation, see which ranges of codes map to the "?" symbol, and check whether the given character falls in one of those ranges.
I use a custom method (using str_replace) to sanitize undefined characters:
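A minimal sketch of a str_replace-based cleanup, together with the strpos check the question asks about, might look like this. It assumes the character in question is U+FFFD REPLACEMENT CHARACTER, which is the bytes EF BF BD in UTF-8; the constant and helper names are hypothetical and not taken from the answer.

```php
<?php
// U+FFFD REPLACEMENT CHARACTER is the three bytes EF BF BD in UTF-8.
// Writing it as an escape avoids depending on how the source file is saved.
const REPLACEMENT_CHAR = "\u{FFFD}";

// Hypothetical helper: true if the string contains U+FFFD.
function containsReplacementChar(string $text): bool
{
    // strpos() returns 0 for a match at the start, so compare with !== false.
    return strpos($text, REPLACEMENT_CHAR) !== false;
}

// Hypothetical helper: remove every U+FFFD from the string.
function stripReplacementChar(string $text): string
{
    return str_replace(REPLACEMENT_CHAR, '', $text);
}

$name = "Jos\u{FFFD} Garc\u{FFFD}a";

var_dump(containsReplacementChar($name)); // bool(true)
echo stripReplacementChar($name), PHP_EOL; // Jos Garca
```

Using the "\u{FFFD}" escape (PHP 7 or later) or the raw bytes "\xEF\xBF\xBD" is one likely way around the pasting problem, since a pasted � only matches if the file is saved as UTF-8. This approach only finds strings that really contain U+FFFD; if the � is merely how a viewer renders invalid bytes, a UTF-8 validity check is the better tool.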