How do I validate UTF-8 sequences in PHP?
After converting my site to use UTF-8, I'm now faced with the prospect of validating all incoming UTF-8 data to ensure it's valid and coherent.
There seem to be various regexps and PHP APIs to detect whether a string is UTF-8, but the ones I've seen seem incomplete (regexps that validate UTF-8 but still allow invalid third bytes, etc.).
I'm also concerned about detecting (and preventing) overlong encodings, meaning ASCII characters encoded as multibyte UTF-8 sequences.
Any suggestions or links welcome!
Comments (2)
mb_check_encoding() is designed for this purpose:
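The snippet that originally followed this answer is not preserved here, so the following is only a minimal sketch of how the call might look. It assumes the mbstring extension is loaded; the wrapper name is_valid_utf8() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper; only mb_check_encoding() itself comes from the answer.
function is_valid_utf8(string $input): bool
{
    // True only if the whole byte string is well-formed UTF-8.
    return mb_check_encoding($input, 'UTF-8');
}

var_dump(is_valid_utf8("caf\xC3\xA9")); // "café" with a valid two-byte é
var_dump(is_valid_utf8("caf\xE9"));     // lone Latin-1 byte, not valid UTF-8
var_dump(is_valid_utf8("\xC0\xAF"));    // overlong encoding of "/"
```

The last input is the overlong case raised in the question. Current PHP releases should reject it, but it is worth confirming on the PHP version you actually deploy.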
You can do a number of things with iconv that will tell you whether a sequence is valid UTF-8.
Telling it to convert from UTF-8 to UTF-8:
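The original example is missing from this copy, so here is a hedged sketch of the round-trip idea. It assumes the iconv extension is available and that the underlying iconv implementation reports illegal sequences, as glibc's does; the function name is_valid_utf8_iconv() is made up for illustration.

```php
<?php
// Hypothetical helper illustrating a UTF-8 to UTF-8 round trip.
function is_valid_utf8_iconv(string $input): bool
{
    // iconv() returns false and raises a notice when it meets an illegal
    // sequence; the @ operator suppresses that notice.
    $converted = @iconv('UTF-8', 'UTF-8', $input);

    // Valid input should come back byte-for-byte unchanged.
    return $converted !== false && $converted === $input;
}

var_dump(is_valid_utf8_iconv("caf\xC3\xA9")); // well-formed input
var_dump(is_valid_utf8_iconv("caf\xE9"));     // truncated/invalid input
```

Because the result depends on the system's iconv library, behaviour may differ between platforms, so treat this as a cross-check rather than the only line of defence.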
Asking for the length of the string in bytes:
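The code for this step is also missing, so the sketch below is a guess at what was meant: it assumes the answer referred to iconv_strlen(), which returns false when the input is not valid in the given encoding. Note that iconv_strlen() counts characters, while strlen() is what returns the length in bytes. The helper name utf8_length_or_null() is hypothetical.

```php
<?php
// Hypothetical helper: character count for valid UTF-8, null otherwise.
function utf8_length_or_null(string $input): ?int
{
    // iconv_strlen() returns false (with a notice, suppressed here by @)
    // if the string contains a sequence that is illegal in UTF-8.
    $length = @iconv_strlen($input, 'UTF-8');

    return $length === false ? null : $length;
}

$valid = "caf\xC3\xA9";
var_dump(strlen($valid));                 // length in bytes
var_dump(utf8_length_or_null($valid));    // length in characters
var_dump(utf8_length_or_null("caf\xE9")); // null signals invalid input
```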