Matching Unicode letter characters in PCRE/PHP

Posted 2024-10-17 07:35:42

I'm trying to write a reasonably permissive validator for names in PHP, and my first attempt consists of the following pattern:

// unicode letters, apostrophe, hyphen, space
$namePattern = "/^([\\p{L}'\\- ])+$/";

This is eventually passed to a call to preg_match(). As far as I can tell, this works with your vanilla ASCII alphabet, but seems to trip up on spicier characters like Ă or 张.

Is there something wrong with the pattern itself? Perhaps I'm expecting \p{L} to do more work than I think it does?

Or does it have something to do with the way input is being passed in? I'm not sure if it's relevant, but I did make sure to specify a UTF8 encoding on the form page.

年华零落成诗 2024-10-24 07:35:42

I think the problem is much simpler than that: You forgot to specify the u modifier. The Unicode character properties are only available in UTF-8 mode.

Your regex should be:

// unicode letters, apostrophe, hyphen, space
$namePattern = '/^[-\' \p{L}]+$/u';
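
To try the corrected pattern against the characters from the question, here is a minimal sketch. The helper name isValidName is an assumption made for illustration (it does not appear in the original post), and the source file is assumed to be saved as UTF-8 and run on PHP 7 or later.

```php
<?php
// Hypothetical helper (name not from the original post): returns true when
// $name consists only of Unicode letters, apostrophes, hyphens and spaces.
function isValidName(string $name): bool
{
    // The trailing "u" makes PCRE treat both pattern and subject as UTF-8.
    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example when $name is not valid UTF-8), so compare against 1.
    return preg_match('/^[-\' \p{L}]+$/u', $name) === 1;
}

var_dump(isValidName('Ă'));                 // bool(true)
var_dump(isValidName('张'));                // bool(true)
var_dump(isValidName("Anne-Marie O'Neil")); // bool(true)
var_dump(isValidName('R2-D2'));             // bool(false): digits are not letters
var_dump(isValidName("\xC4"));              // bool(false): invalid UTF-8
```

Comparing against 1 rather than testing truthiness keeps the error case (false) and the no-match case (0) from being confused with a match.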

归途 2024-10-24 07:35:42

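If anyone else lands here and still cannot get this working, note that /u does not produce consistent results across PHP versions when used with Unicode scripts.

See the example: https://3v4l.org/4hB9e

Related: Inconsistent regex results for Thai characters across PHP versions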
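
When results differ between machines, it can help to record which PHP and PCRE versions each one runs. The following is a rough sketch; the function name pcreEnvironment and the Thai sample word are assumptions made for illustration, while PHP_VERSION and PCRE_VERSION are built-in constants.

```php
<?php
// Hypothetical diagnostic helper: collects version information together with
// the result of one script-property match, so two servers can be compared.
function pcreEnvironment(): array
{
    return [
        'php'  => PHP_VERSION,   // version of the PHP interpreter
        'pcre' => PCRE_VERSION,  // version and release date of the PCRE library
        // Result of matching a Thai word against the Thai script property.
        // This may be 1, 0 or false depending on the build, which is exactly
        // what this helper is meant to reveal.
        'thai' => preg_match('/^\p{Thai}+$/u', 'สวัสดี'),
    ];
}

var_dump(pcreEnvironment());
```

Running this on both the working and the failing server shows whether the difference comes from the PCRE library rather than from the pattern.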

另类 2024-10-24 07:35:42

If you want to replace an old pattern containing Unicode text with a new one, write:

$text = preg_replace('/\bold pattern\b/u', 'new pattern', $text);

The key here is the u modifier.

Note: your server's PHP version should be at least PHP 4.3.5, as mentioned on php.net | Pattern Modifiers:

u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

Thanks to AgreeOrNot, who gave me this key in "preg_replace match whole word in arabic".

I tried it and it worked on localhost, but it did not work on the remote server. I then found the PHP 4.3.5 note on php.net, upgraded the PHP version, and it worked.

It is important to know that this method is very helpful for Arabic (عربي) users because, as I believe, Unicode is the best encoding for the Arabic language, and the replacement will not work without the u modifier. The next example should work for you:

$text = preg_replace('/\bمرحبا بك\b/u', 'NEW', $text);
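
The whole-word replacement above can be wrapped in a small helper. This is only a sketch: the function name replaceWholePhrase and the sample sentences are assumptions for illustration, and it relies on PHP making \b Unicode-aware when the u modifier is set.

```php
<?php
// Hypothetical helper: replaces whole-word occurrences of $phrase in $text.
function replaceWholePhrase(string $phrase, string $replacement, string $text): string
{
    // preg_quote() escapes regex metacharacters in the phrase; "/" is passed
    // as the delimiter so that it is escaped as well.
    $pattern = '/\b' . preg_quote($phrase, '/') . '\b/u';

    // Note: "$1" or "\1" inside $replacement would be read as backreferences.
    $result = preg_replace($pattern, $replacement, $text);

    // preg_replace() returns null on error (for example invalid UTF-8 input);
    // in that case hand back the text unchanged.
    return $result === null ? $text : $result;
}

echo replaceWholePhrase('مرحبا بك', 'NEW', 'مرحبا بك يا صديقي'), "\n";
echo replaceWholePhrase('cat', 'dog', 'cat concat'), "\n"; // dog concat
```

Because of the \b anchors, "cat" inside "concat" is left alone; the same rule applies to the Arabic phrase when it is part of a longer word.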

完美的未来在梦里 2024-10-24 07:35:42

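First of all, your life would be a lot easier if you used single quotes instead of double quotes when writing these patterns: you then need only one backslash. Second, combining marks (\pM) should also be included. If you find a character that is not matched, find out its Unicode code point, and then use http://www.fileformat.info/info/unicode/ to figure out which category it belongs to. I found http://hsivonen.iki.fi/php-utf8/ an invaluable tool for debugging UTF-8 properties (don't forget to convert to hex before looking anything up: array_map('dechex', utf8ToUnicode($text))).

For example, Ă turns out to be http://www.fileformat.info/info/unicode/char/0102/index.htm and is in Lu, so L should match it, and it does match for me. The other character is http://www.fileformat.info/info/unicode/char/5f20/index.htm, which is also a letter, and it does match for me as well. Do you have the Unicode character tables compiled in?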
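
The category lookup can also be approximated from PHP itself. The sketch below is an assumption-laden illustration, not part of the original answer: the helper name letterOrMarkCategory is hypothetical, it only tests a short list of letter and mark categories, and it needs PHP 7.1 or later with the file saved as UTF-8. It also shows why \p{M} matters for decomposed input.

```php
<?php
// Hypothetical helper: returns the first general category from a short list
// that matches the single character $char, or null if none of them does.
function letterOrMarkCategory(string $char): ?string
{
    $categories = ['Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Mn', 'Mc', 'Me'];
    foreach ($categories as $category) {
        if (preg_match('/^\p{' . $category . '}$/u', $char) === 1) {
            return $category;
        }
    }
    return null;
}

var_dump(letterOrMarkCategory('Ă'));  // string(2) "Lu"
var_dump(letterOrMarkCategory('张')); // string(2) "Lo"

// "e" followed by U+0301 COMBINING ACUTE ACCENT, i.e. a decomposed "é".
$decomposed = "e\u{0301}";
var_dump(preg_match('/^[-\' \p{L}]+$/u', $decomposed));      // int(0): the mark is not a letter
var_dump(preg_match('/^[-\' \p{L}\p{M}]+$/u', $decomposed)); // int(1): marks are allowed
```

Names typed on some systems arrive in decomposed form, so a validator that accepts only \p{L} can reject input that looks identical to an accepted precomposed name.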

橪书 2024-10-24 07:35:42

<?php preg_match('/[a-zığüşöç]/u', $title); ?>
