Matching Unicode letter characters in PCRE/PHP

Posted on 2024-10-17 07:35:42

I'm trying to write a reasonably permissive validator for names in PHP, and my first attempt consists of the following pattern:

// unicode letters, apostrophe, hyphen, space
$namePattern = "/^([\\p{L}'\\- ])+$/";

This is eventually passed to a call to preg_match(). As far as I can tell, this works with your vanilla ASCII alphabet, but seems to trip up on spicier characters like Ă or 张.

Is there something wrong with the pattern itself? Perhaps I'm expecting \p{L} to do more work than it actually does?

Or does it have something to do with the way input is being passed in? I'm not sure if it's relevant, but I did make sure to specify UTF-8 encoding on the form page.

年华零落成诗 2024-10-24 07:35:42

I think the problem is much simpler than that: You forgot to specify the u modifier. The Unicode character properties are only available in UTF-8 mode.

Your regex should be:

// unicode letters, apostrophe, hyphen, space
$namePattern = '/^[-\' \p{L}]+$/u';
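
As a minimal sketch of that fix in use, assuming the script file and the input are valid UTF-8 and that PCRE was built with Unicode property support (the sample names are made up for illustration):

```php
<?php
// Pattern from the answer above: Unicode letters, apostrophe, hyphen, space.
$namePattern = '/^[-\' \p{L}]+$/u';

// Hypothetical sample inputs (not from the original question).
$samples = ["O'Brien", 'Anne-Marie', 'Ăna', '张伟', 'R2D2'];

foreach ($samples as $name) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $result = preg_match($namePattern, $name);
    echo $name, ' => ', var_export($result, true), PHP_EOL;
}
```

Only the last sample should be rejected, because digits are not in the character class. A return value of false (rather than 0) usually points at invalid UTF-8 in the subject or a pattern error, so it is worth checking with === instead of a loose comparison.
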
归途 2024-10-24 07:35:42

If anyone else lands here and can't get this to work, please note that /u does not produce consistent results with Unicode scripts across different PHP versions.

See example: https://3v4l.org/4hB9e

Related: Inconsistent regex result for Thai characters across different PHP version
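
When comparing behaviour between machines, it may help to print the PHP and PCRE versions next to the result of a small probe. This is only a sketch: the source does not say what causes the differences, and the Thai probe character is a hypothetical example.

```php
<?php
// Print the versions this script runs under so results can be compared
// between machines. PHP_VERSION and PCRE_VERSION are built-in constants.
echo 'PHP:  ', PHP_VERSION, PHP_EOL;
echo 'PCRE: ', PCRE_VERSION, PHP_EOL;

// Hypothetical probe: does \p{Thai} match a Thai letter on this build?
$probe = preg_match('/^\p{Thai}+$/u', 'ก');

// 1 = match, 0 = no match, false = the pattern failed to compile or run.
var_dump($probe);
```
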

另类 2024-10-24 07:35:42

If you want to replace an old pattern containing Unicode text with a new one, you should write:

$text = preg_replace('/\bold pattern\b/u', 'new pattern', $text);

So the key here is the u modifier.

Note: your server's PHP version should be at least PHP 4.3.5,

as mentioned here: php.net | Pattern Modifiers

u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

Thanks to AgreeOrNot, who gave me that key here: preg_replace match whole word in arabic

I tried it and it worked on localhost, but when I tried it on the remote server it didn't work. Then I saw the PHP 4.3.5 note for the u modifier on php.net, upgraded the PHP version, and it worked.

It's important to know that this method is very helpful for Arabic users (عربي) because, as I believe, Unicode is the best encoding for the Arabic language, and the replacement will not work if you don't use the u modifier. The next example should work for you:

$text = preg_replace('/\bمرحبا بك\b/u', 'NEW', $text);
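
A small sketch of the whole-word behaviour, assuming a PHP build where the u modifier also makes \b Unicode-aware; the sample sentences and variable names are hypothetical:

```php
<?php
// Phrase from the answer above, wrapped in word boundaries.
$phrase  = 'مرحبا بك';
$pattern = '/\b' . preg_quote($phrase, '/') . '\b/u';

// Hypothetical subjects: one with the phrase as whole words, one where
// the last word continues ("بكم"), so the trailing \b should not match.
$whole   = 'قال مرحبا بك اليوم';
$partial = 'قال مرحبا بكم اليوم';

$replacedWhole   = preg_replace($pattern, 'NEW', $whole);
$replacedPartial = preg_replace($pattern, 'NEW', $partial);

// preg_replace() returns null on error, e.g. if the subject is not valid UTF-8.
var_dump($replacedWhole, $replacedPartial);
```

Under that assumption the first string has the phrase replaced and the second comes back unchanged.
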

完美的未来在梦里 2024-10-24 07:35:42

First of all, your life would be a lot easier if you used single quotes instead of double quotes when writing these patterns: you need only one backslash. Second, combining marks (\pM) should also be included. If you find a character that is not matched, find out its Unicode code point, then use http://www.fileformat.info/info/unicode/ to figure out which category it is in. I found http://hsivonen.iki.fi/php-utf8/ an invaluable tool when debugging with UTF-8 properties (don't forget to convert to hex before trying to look anything up: array_map('dechex', utf8ToUnicode($text))).

For example, Ă turns out to be http://www.fileformat.info/info/unicode/char/0102/index.htm, which is in Lu, so L should match it, and it does match for me. The other character is http://www.fileformat.info/info/unicode/char/5f20/index.htm, which is also a letter (isLetter), and it indeed matches for me. Do you have the Unicode character tables compiled in?
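
The lookup step and the combining-mark point can be sketched as follows. This assumes the mbstring extension (mb_ord() needs PHP 7.2+); codePointsHex() is a hypothetical stand-in for the utf8ToUnicode() helper mentioned above, not part of that library.

```php
<?php
// Hypothetical helper: list the code points of a UTF-8 string as U+XXXX.
function codePointsHex(string $text): array
{
    // Splitting on the empty pattern with the u modifier yields one
    // element per code point instead of one per byte.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    return array_map(function (string $ch): string {
        return sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }, $chars);
}

// "A" followed by U+0306 COMBINING BREVE: looks like Ă but is two code points.
$decomposed = "A\u{0306}";

print_r(codePointsHex('Ă张'));       // U+0102, U+5F20
print_r(codePointsHex($decomposed)); // U+0041, U+0306

// \p{L} alone rejects the combining mark; adding \p{M} accepts it.
var_dump(preg_match('/^[-\' \p{L}]+$/u', $decomposed));      // int(0)
var_dump(preg_match('/^[-\' \p{L}\p{M}]+$/u', $decomposed)); // int(1)
```

Names typed on some systems arrive in decomposed form like the second string, which is one reason to allow \p{M} in a permissive validator.
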

橪书 2024-10-24 07:35:42
<?php preg_match('/[a-zığüşöç]/u', $title); ?>