How do I validate UTF-8 sequences in PHP?

Posted on 2024-12-11 15:27:40

After converting my site to use UTF-8, I'm now faced with the prospect of validating all incoming UTF-8 data, to ensure it's valid and coherent.

There seem to be various regexes and PHP APIs to detect whether a string is UTF-8, but the ones I've seen seem incomplete (regexes which validate UTF-8, but still allow invalid third bytes, etc.).

I'm also concerned about detecting (and preventing) overlong encodings, meaning ASCII characters encoded as multibyte UTF-8 sequences.

Any suggestions or links welcome!

Comments (2)

不…忘初心 2024-12-18 15:27:40

mb_check_encoding() is designed for this purpose:

mb_check_encoding($string, 'UTF-8');
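
To address the overlong-encoding concern from the question, here is a minimal sketch that runs a few byte strings through mb_check_encoding(). The helper name is_valid_utf8() and the sample strings are my own illustrations, not part of the mbstring API, and the sketch assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical wrapper (the name is an assumption, not an mbstring function).
function is_valid_utf8(string $bytes): bool
{
    // mb_check_encoding() returns true only if the whole byte string
    // is well-formed in the named encoding.
    return mb_check_encoding($bytes, 'UTF-8');
}

// Illustrative samples: one valid string and three malformed ones.
$samples = [
    'two-byte e-acute'      => "caf\xC3\xA9", // valid UTF-8
    'invalid lead byte'     => "\xfe\x20",    // 0xFE never appears in UTF-8
    'overlong "/" (2-byte)' => "\xC0\xAF",    // "/" is 0x2F, padded to two bytes
    'overlong "/" (3-byte)' => "\xE0\x80\xAF" // same character, padded to three bytes
];

foreach ($samples as $label => $bytes) {
    printf("%-24s %s\n", $label, is_valid_utf8($bytes) ? 'valid' : 'INVALID');
}
```

A conforming validator should accept only the first sample. It is worth running this against the PHP version you deploy on rather than taking the behaviour on trust.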

逆流 2024-12-18 15:27:40

You can do a lot of things with iconv that can tell you if the sequence is valid UTF-8.

Telling it to convert from UTF-8 to the same:

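$str = "\xfe\x20"; // Invalid UTF-8: 0xFE can never appear in UTF-8
$conv = @iconv('UTF-8', 'UTF-8', $str); // false on failure; @ silences the notice
// Strict comparison avoids PHP's loose string/boolean juggling
if ($conv === false || $conv !== $str) {
    print("Input was not a valid UTF-8 sequence.\n");
}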

Asking for the length of the string in characters (iconv_strlen() returns false if the input cannot be decoded):

$str = "\xfe\x20"; // Invalid UTF-8
if (@iconv_strlen($str, 'UTF-8') === false) {
    print("Input was not a valid UTF-8 sequence.\n");
}
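
If you want to reuse the round-trip check, it can be wrapped in a small function. This is only a sketch: the name is_valid_utf8_iconv() is hypothetical, it assumes the iconv extension is available, and the exact set of sequences rejected depends on the underlying iconv implementation (glibc, libiconv, and so on).

```php
<?php
// Hypothetical helper (the name is an assumption, not part of iconv).
function is_valid_utf8_iconv(string $bytes): bool
{
    // Converting UTF-8 to UTF-8 fails on an illegal or incomplete sequence.
    // The @ suppresses the notice that iconv() raises in that case.
    $roundTrip = @iconv('UTF-8', 'UTF-8', $bytes);

    // Reject an outright failure (false) as well as any result that
    // differs from the input, e.g. a truncated string.
    return $roundTrip !== false && $roundTrip === $bytes;
}

// Usage: validate incoming data before accepting it.
$input = "\xfe\x20"; // the invalid example from above
if (!is_valid_utf8_iconv($input)) {
    print("Input was not a valid UTF-8 sequence.\n");
}
```

Comparing the result with === matters here because iconv() signals failure with false, and a loose comparison mixes that up with ordinary string values.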