PCRE: different behavior of \w on different servers

Posted on 2024-10-05 15:40:56

I'm using Kohana's routing system in my own application, and when defining the PCRE pattern for a tag in the URL, my localhost behaves differently from the production server.

I have this route:

Route::set( 'list', 'list(/tagged/<tags>)',
            array('tags'=>'[\w\d\-\+]+') );

This used to work fine until the day someone used a tag containing a non-"standard" character (ñ). On my localhost there is no problem, but on the production server the system is not able to find the route.

On the production server I had to modify the pattern and explicitly add 'ñ' to the allowed characters:

'\pL[\w\d\-\+ñ]+'

The question is: why? OK, it works now that I've added the 'ñ', but it is going to fail again sooner or later!

半夏半凉 2024-10-12 15:40:56

Have a look at the different Unicode character classes you can use here: http://www.regular-expressions.info/unicode.html#prop. With that said, you should be able to use something like this:

Route::set('list', 'list(/tagged/<tags>)', array('tags'=>'[\p{L}\p{N}\-\+]+'));

\p{L}: any kind of letter from any language.
\p{N}: any kind of numeric character in any script.

I've tested this out on ideone.com. View example.
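
As a rough illustration of what these classes accept, here is a minimal sketch in plain PHP, outside Kohana. The helper name `tag_matches` and the sample tags are made up for this example, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of Kohana: tests a tag against a character
// class, anchored at both ends. The "u" modifier makes PCRE treat pattern
// and subject as UTF-8, so \p{L} and \p{N} see whole characters, not bytes.
function tag_matches(string $class, string $tag): bool
{
    return preg_match('/^' . $class . '+$/u', $tag) === 1;
}

$unicode_class = '[\p{L}\p{N}\-\+]';

// "\xC3\xB1" is the UTF-8 encoding of ñ.
var_dump(tag_matches($unicode_class, "espa\xC3\xB1ol")); // bool(true)
var_dump(tag_matches($unicode_class, 'php-5+pcre'));     // bool(true)
var_dump(tag_matches($unicode_class, 'two words'));      // bool(false): space is not in the class
```

Because the class is defined in terms of Unicode properties, it should not need to be extended each time a tag uses a new accented letter.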

公布 2024-10-12 15:40:56

Since the meaning of \w is locale-dependent, your production server probably has a clean C locale, whereas your development system includes extended character codes.

IIRC, using the /u Unicode modifier allows \w to match all "letter" characters. If Kohana doesn't allow specifying modifiers, add it inline with (?u)[...]. Or maybe in your case you only need to repeat \p{L} within the square brackets:

'\pL[\w\d\-\+\p{L}]+'
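
The effect of the locale and of the u modifier can be checked with a small sketch in plain PHP, outside Kohana. The sample tag is an assumption, the locale is pinned to C to imitate the production server described above, and the second check assumes a PHP build whose u modifier also makes \w Unicode-aware.

```php
<?php
// Pin the character-type locale so the result does not depend on the host.
setlocale(LC_CTYPE, 'C');

$tag = "espa\xC3\xB1ol"; // "español", with ñ encoded as UTF-8 (0xC3 0xB1)

// Without "u", PCRE works byte by byte; in the C locale the two bytes of ñ
// are not word characters, so the original class rejects the tag.
$bytewise = preg_match('/^[\w\d\-\+]+$/', $tag);

// With "u", pattern and subject are treated as UTF-8, and \w can match
// letters outside ASCII.
$unicode = preg_match('/^[\w\d\-\+]+$/u', $tag);

// Spelling out \p{L} does not rely on how \w happens to be defined.
$explicit = preg_match('/^\pL[\w\d\-\+\p{L}]+$/u', $tag);

var_dump($bytewise, $unicode, $explicit);
```

On a current PHP build this should print int(0), int(1), int(1), which is consistent with the route failing only where \w is limited to ASCII.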