Regex for detecting consecutive digits - not working for non-English input

Posted 2024-10-10 22:54:56

Hi all, I have this code that checks for 5 or more consecutive digits:

if (preg_match("/\d{5}/", $input, $matches) > 0)
    return true;

It works fine for English input, but it trips up when the input string contains Arabic/multibyte characters: it sometimes returns true even if there are no digits in the input text.

Any ideas?

围归者 2024-10-17 22:54:56

You appear to be using PHP.

Do this:

if (preg_match("/\d{5}/u", $input, $matches) > 0)
return true;

Note the 'u' modifier at the end of the expression. It tells the preg_* functions to treat both the pattern and the subject string as UTF-8.
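One thing worth handling when switching to the 'u' modifier: preg_match() returns false, rather than 0, when the subject is not valid UTF-8. Below is a minimal sketch of the check wrapped in a function. The name hasFiveDigits is made up for illustration, and treating undecodable input as "no match" is an assumption, not something the original post specifies.

```php
<?php
// Hypothetical helper (name not from the original post).
function hasFiveDigits(string $input): bool
{
    // With /u the subject is decoded as UTF-8 before matching.
    $result = preg_match('/\d{5}/u', $input);

    if ($result === false) {
        // preg_match() failed, e.g. because $input is not valid UTF-8.
        // Assumption: report "no match" instead of raising an error.
        return false;
    }

    return $result === 1;
}

var_dump(hasFiveDigits("order 12345"));
var_dump(hasFiveDigits("\xD9\x85\xD8\xB1\xD8\xAD\xD8\xA8\xD8\xA7")); // Arabic letters only
```

If silent failure is not acceptable, preg_last_error() can be checked in the false branch to distinguish bad UTF-8 from other errors.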

我还不会笑 2024-10-17 22:54:56

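When you want to deal with UTF-8, you have to set things up correctly.

You can recompile PHP with the PCRE UTF-8 flag enabled.

Or you can add the sequence (*UTF8) to the start of your regex. For instance:

/(*UTF8)[[:alnum:]]/, input é, output TRUE

/[[:alnum:]]/, input é, output FALSE.

Check out http://www.pcre.org/pcre.txt, which contains lots of information about UTF-8 support in the PCRE library.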

过潦 2024-10-17 22:54:56

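In PCRE, UTF-8 mode alone does not change predefined character classes like \d and [[:digit:]]: they match only ASCII characters unless Unicode property matching (PCRE_UCP) is also enabled. To match potentially non-ASCII digits regardless of that setting, use the equivalent Unicode property, \p{Nd}: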

$s = "12345\xD9\xA1\xD9\xA2\xD9\xA3\xD9\xA4\xD9\xA5";
preg_match_all('~\p{Nd}{5}~u', $s, $matches);

See it in action on ideone.com
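To make the result of that call visible, here is a small sketch that runs the same pattern on the same sample string and prints each match as hex bytes. The [0-9] variant is an addition for comparison, for the case where only ASCII digits should count; the variable names are illustrative.

```php
<?php
// Sample string from the answer: "12345" followed by the Arabic-Indic
// digits one to five (U+0661..U+0665), written as UTF-8 bytes.
$s = "12345\xD9\xA1\xD9\xA2\xD9\xA3\xD9\xA4\xD9\xA5";

// \p{Nd} matches any Unicode decimal digit.
$count = preg_match_all('~\p{Nd}{5}~u', $s, $matches);

// Explicit ASCII-only class, independent of any Unicode settings.
$asciiCount = preg_match_all('~[0-9]{5}~u', $s, $asciiMatches);

foreach ($matches[0] as $run) {
    // Hex output avoids depending on the terminal's encoding.
    echo bin2hex($run), "\n";
}
echo "Unicode digit runs: $count, ASCII digit runs: $asciiCount\n";
```

Which of the two patterns is appropriate depends on whether Arabic-Indic digits should be treated as numbers in the input being checked.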

If you need to match specific characters or ranges, you can either use the \x{HHHH} escape sequence with the appropriate code points:

preg_match_all('~[\x{0661}-\x{0665}]{5}~u', $s, $matches);

...or use the \xHH form to input their UTF-8 encoded byte sequences:

preg_match_all("~[\xD9\xA1-\xD9\xA5]{5}~u", $s, $matches);

Notice that I switched to double-quotes for this last example. The \p{} and \x{} forms were passed through to be processed by the regex compiler, but this time we want the PHP compiler to expand the escape sequences. That doesn't happen in single-quoted strings.
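The quoting difference can be checked directly. The sketch below builds both range patterns, compares their byte lengths, and confirms they find the same match; the variable names are illustrative only.

```php
<?php
$s = "12345\xD9\xA1\xD9\xA2\xD9\xA3\xD9\xA4\xD9\xA5";

// Single quotes: PHP leaves \x{0661} untouched, so PCRE sees the escape text.
$single = '~[\x{0661}-\x{0665}]{5}~u';

// Double quotes: PHP turns each \xHH into one raw byte before PCRE sees it,
// so the class contains the actual UTF-8 encoded characters.
$double = "~[\xD9\xA1-\xD9\xA5]{5}~u";

// The pattern strings differ in length because of where the escapes are expanded.
echo strlen($single), " vs ", strlen($double), " bytes\n";

preg_match_all($single, $s, $fromSingle);
preg_match_all($double, $s, $fromDouble);

// Both patterns should select the same five Arabic-Indic digits.
var_dump($fromSingle[0] === $fromDouble[0]);
```

Either form works with the 'u' modifier; the \x{HHHH} form keeps the pattern readable as code points, while the byte form requires knowing the UTF-8 encoding of each character.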
