Regex to detect consecutive digits - not working for non-English input
Hi all, I have this code that checks for 5 or more consecutive digits:
if (preg_match("/\d{5}/", $input, $matches) > 0)
    return true;
It works fine for English input, but it trips up when the input string contains Arabic/multibyte characters: it sometimes returns true even if there are no digits in the input text.
Any ideas?
You appear to be using PHP.
Add the 'u' modifier at the end of the expression, so that the pattern becomes /\d{5}/u. It tells preg_* to use Unicode mode for matching.
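As a rough sketch of the suggestion above, the check from the question could be wrapped like this. The function name hasFiveConsecutiveDigits is made up for illustration, and the sample inputs are assumptions (the Arabic word is written with PHP 7+ "\u{...}" escapes to keep the source ASCII-only).

```php
<?php
// Hypothetical helper: the question's check with the 'u' modifier added.
function hasFiveConsecutiveDigits(string $input): bool
{
    // With 'u', PCRE treats both the pattern and the subject as UTF-8
    // instead of as a sequence of independent bytes.
    // preg_match() returns false on invalid UTF-8 input, so compare with 1.
    return preg_match('/\d{5}/u', $input) === 1;
}

// "marhaba" in Arabic letters, no digits at all.
$arabicWord = "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627}";

var_dump(hasFiveConsecutiveDigits('order 12345 shipped')); // bool(true)
var_dump(hasFiveConsecutiveDigits($arabicWord));           // bool(false)
```

Comparing against 1 rather than testing "> 0" makes no difference for valid input, but it keeps a failed match (false) from being confused with "no match" (0) if stricter handling is added later.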
You have to set things up properly when you want to deal with UTF-8.
You can recompile PHP with the PCRE UTF-8 flag enabled.
Or, you can add the sequence (*UTF8) to the start of your regex. For example, /(*UTF8)[[:alnum:]]/ with input é outputs TRUE, while /[[:alnum:]]/ with input é outputs FALSE.
Check out http://www.pcre.org/pcre.txt, which contains lots of information about UTF-8 support in the PCRE library.
Even in UTF-8 mode, PCRE's predefined character classes like \d and [[:digit:]] match only ASCII characters by default. To match potentially non-ASCII digits you have to use the equivalent Unicode property, \p{Nd}. See it in action on ideone.com.
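The original demo is not reproduced here, so the following is only a minimal sketch of the \p{Nd} approach. The function name hasFiveUnicodeDigits and the sample inputs are assumptions; the Arabic-Indic digits are written with PHP 7+ "\u{...}" escapes.

```php
<?php
// Hypothetical helper: 5 or more consecutive decimal digits from any script.
function hasFiveUnicodeDigits(string $input): bool
{
    // \p{Nd} is the Unicode "decimal digit number" property; it needs the
    // 'u' modifier so that the subject is decoded as UTF-8.
    return preg_match('/\p{Nd}{5}/u', $input) === 1;
}

// Arabic-Indic digits 1 to 5 (U+0661 .. U+0665).
$arabicDigits = "\u{0661}\u{0662}\u{0663}\u{0664}\u{0665}";
// Arabic letters only, no digits.
$arabicWord = "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627}";

var_dump(hasFiveUnicodeDigits('abc 12345'));   // bool(true)
var_dump(hasFiveUnicodeDigits($arabicDigits)); // bool(true)
var_dump(hasFiveUnicodeDigits($arabicWord));   // bool(false)
```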
If you need to match specific characters or ranges, you can either use the \x{HHHH} escape sequence with the appropriate code points, or use the \xHH form to input their UTF-8 encoded byte sequences. Note that the \xHH form needs a double-quoted string. The \p{} and \x{} forms are passed through to be processed by the regex compiler, but with \xHH we want the PHP compiler to expand the escape sequences, and that doesn't happen in single-quoted strings.
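The examples that belonged to this answer are missing, so here is one possible reconstruction of the two forms rather than the author's own code. It assumes the target range is the Arabic-Indic digits U+0660 to U+0669, whose UTF-8 encodings are the byte pairs D9 A0 to D9 A9; the variable names are made up.

```php
<?php
// Five Arabic-Indic digits (U+0661 .. U+0665), written with PHP 7+ escapes.
$arabicDigits = "\u{0661}\u{0662}\u{0663}\u{0664}\u{0665}";

// Code-point form: single quotes, so PHP leaves \x{0660} untouched and the
// regex compiler interprets it. Code points above 0xFF require 'u'.
$byCodePoint = preg_match('/[\x{0660}-\x{0669}]{5}/u', $arabicDigits);

// Byte form: double quotes, so PHP expands \xD9, \xA0 and \xA9 into raw
// bytes before PCRE sees the pattern. No 'u' here: the pattern is matched
// byte by byte, and a lone \xA0 would not be valid UTF-8 anyway.
$byBytes = preg_match("/(?:\xD9[\xA0-\xA9]){5}/", $arabicDigits);

var_dump($byCodePoint); // int(1)
var_dump($byBytes);     // int(1)
```

The code-point form is usually easier to read and to extend to other ranges; the byte form mainly helps when the 'u' modifier cannot be used.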