PHP - Detecting the character set of a user-supplied string
Is it possible to detect the character set of a user-supplied string?
If not, how about the next question:
Are there reliable built-in PHP functions that can accurately tell whether a user-supplied string (whether it arrives via GET, POST, a cookie, etc.) is in UTF-8 or not? In other words, can I do something like
is_utf8($_GET['first_name'])
Is there any way this function could return TRUE when first_name was in reality not in UTF-8?
Comments (1)
Regarding 1:
You can give mb_detect_encoding a try, but it's pretty much a shot in the dark. An "encoded" string is just a bunch of bytes, and such byte sequences are often equally valid in any number of different encodings. It is therefore by definition not possible to detect an unknown encoding reliably; you can only guess. For this reason there exists meta information, such as HTTP headers, which should communicate the encoding of the transferred content. Check those if available.
Regarding 2:
mb_check_encoding($var, 'UTF-8') will tell you whether the string is a valid UTF-8 string. As far as I've seen, in recent versions of PHP it does what it says on the tin. That still doesn't mean the string was really meant as UTF-8; it just means the byte sequence is one that is valid in UTF-8.
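To make both points concrete, here is a minimal sketch, assuming the mbstring extension is loaded. The helper name is_valid_utf8 is hypothetical (it is not a PHP built-in) and the sample byte strings are made up for illustration. It shows a byte sequence that passes the UTF-8 validity check while being equally readable as ISO-8859-1, which is why a TRUE result cannot prove what the sender intended.

```php
<?php
// Requires the mbstring extension.

// Hypothetical helper (not a PHP built-in): a thin wrapper around
// mb_check_encoding that only answers "are these bytes valid UTF-8?".
function is_valid_utf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$utf8 = "\xC3\xA9";

// "é" encoded as ISO-8859-1 is the single byte 0xE9. On its own that byte
// is an incomplete sequence in UTF-8.
$latin1 = "\xE9";

// The two characters "Ã©" encoded as ISO-8859-1 are ALSO the bytes
// 0xC3 0xA9, i.e. byte-for-byte identical to $utf8.
$latin1Pair = "\xC3\xA9";

var_dump(is_valid_utf8($utf8));       // bool(true)
var_dump(is_valid_utf8($latin1));     // bool(false)
var_dump(is_valid_utf8($latin1Pair)); // bool(true), although ISO-8859-1 was meant

// Reading the same two bytes as ISO-8859-1 also works: each byte becomes
// one character, and converting to UTF-8 yields "Ã©" (four bytes).
$reinterpreted = mb_convert_encoding($latin1Pair, 'UTF-8', 'ISO-8859-1');
var_dump(bin2hex($reinterpreted));    // string(8) "c383c2a9"

// mb_detect_encoding can only pick among the candidates it is given.
// Which one it reports for ambiguous bytes depends on the candidate list
// and on the PHP version, so no expected output is shown here.
var_dump(mb_detect_encoding($latin1Pair, ['UTF-8', 'ISO-8859-1'], true));
```

For request data, the check would be applied to something like $_GET['first_name'] after confirming with is_string that the value is a string, since a request parameter can also arrive as an array. A FALSE result is a reliable reason to reject the input; a TRUE result only says the bytes are well-formed UTF-8.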