Is there any way to treat .* as .{0,1024} in a Perl RE?
We allow some user-supplied REs for the purpose of filtering email. Early on we ran into some performance issues with REs that contained, for example, .* when matching against arbitrarily large emails. We found a simple solution was to apply s/\*/{0,1024}/ to the user-supplied RE. However, this is not a perfect solution, as it breaks the following pattern:
/[*]/
And rather than coming up with some convoluted recipe to account for every possible mutation of user-supplied RE input, I'd like to just limit perl's interpretation of the * and + characters to a maximum length of 1024 characters.
Is there any way to do this?
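To make the failure concrete, here is a minimal sketch of the blanket substitution applied to both kinds of pattern. The helper name naive_bound is made up for illustration, and the /g flag is an assumption added so that every * is replaced.

```perl
use strict;
use warnings;

# Hypothetical helper: the blanket substitution from the question,
# with /g added (an assumption) so every '*' in the pattern is replaced.
sub naive_bound {
    my ($re) = @_;
    $re =~ s/\*/{0,1024}/g;
    return $re;
}

print naive_bound('foo.*bar'), "\n";   # foo.{0,1024}bar
print naive_bound('a[*]b'),    "\n";   # a[{0,1024}]b
```

In the second case the bracketed class no longer contains a literal *. It now matches any single one of the characters {, }, comma, 0, 1, 2 and 4, so the filter silently changes meaning instead of failing.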
Comments (4)
This does not really answer your question, but you should be aware of other issues with user-supplied regular expressions, see for example this summary at OWASP. Depending on your exact situation, it might be better to write or find a custom simple pattern matching library?
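As a rough illustration of what a simple custom pattern language could look like, the sketch below accepts only two wildcards (* and ?) and escapes everything else, so users never write raw regex syntax. The function name wildcard_to_regex, the choice of wildcards, and the 1024 default are all assumptions made for this example, not part of any existing library.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a wildcard filter into a bounded regex.
# Only '*' (up to $max characters) and '?' (one character) are special;
# every other character is escaped with quotemeta and matched literally.
sub wildcard_to_regex {
    my ($pattern, $max) = @_;
    $max //= 1024;
    my $body = '';
    for my $ch (split //, $pattern) {
        if    ($ch eq '*') { $body .= ".{0,$max}" }
        elsif ($ch eq '?') { $body .= '.' }
        else               { $body .= quotemeta($ch) }
    }
    return qr/$body/s;    # /s so that '.' also matches newlines in an email
}

my $filter = wildcard_to_regex('free*offer');
print "matched\n" if "free money offer" =~ $filter;
```

Because users cannot introduce groups or nested quantifiers, the generated regex stays a flat sequence of literals and bounded wildcards.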
Update
Added a (?<!\\) before the quantifiers, because an escaped * or + should not be matched. The replacement will still fail if there is a \\* (match \ zero or more times).
An improvement would be this
See it here on Regexr
That means: match [*+], but only if there is no closing ] ahead with no [ before it. And no \ (the (?<!\\) part) is allowed before the square brackets.
(?! ... ) is a negative lookahead
(?<! ... ) is a negative lookbehind
See perlretut for details
Update 2: include possessive quantifiers
See it here on Regexr
Seems to be working, but it's getting really complicated now!
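A different way to get the same effect without stacking lookarounds is to scan the pattern left to right and decide token by token. The sketch below is not the substitution from this answer; it is an alternative under stated assumptions. The name bound_quantifiers is hypothetical, and the scanner knows only about backslash escapes, bracketed classes (including POSIX classes such as [:alpha:]) and the lazy/possessive suffixes ? and +. It does not handle \Q...\E, (?#...) comments, (*VERB) constructs, /x comments, or open-ended {n,} quantifiers.

```perl
use strict;
use warnings;

# Hypothetical helper: cap unbounded '*' and '+' at $max repetitions,
# leaving escaped characters and bracketed classes untouched.
sub bound_quantifiers {
    my ($re, $max) = @_;
    $max //= 1024;
    my $out = '';
    for ($re) {
        pos($_) = 0;
        while (pos($_) < length($_)) {
            if (/\G(\\.)/gcs) {
                $out .= $1;                   # escaped character, e.g. \* or \\
            }
            elsif (/\G(\[\^?\]?(?:\\.|\[:\^?\w+:\]|[^\]\\])*\])/gcs) {
                $out .= $1;                   # whole bracketed class, e.g. [*] or [^+\]]
            }
            elsif (/\G([*+])([?+]?)/gc) {
                # '*' or '+' plus an optional lazy (?) or possessive (+) suffix
                $out .= ($1 eq '*' ? "{0,$max}" : "{1,$max}") . $2;
            }
            else {
                /\G(.)/gcs;                   # any other character: copy as-is
                $out .= $1;
            }
        }
    }
    return $out;
}

print bound_quantifiers('foo.*bar'), "\n";    # foo.{0,1024}bar
print bound_quantifiers('a[*]b+'),   "\n";    # a[*]b{1,1024}
print bound_quantifiers('x*+y+?', 10), "\n";  # x{0,10}+y{1,10}?
```

Because an escaped pair is consumed as one token, \\* is read as an escaped backslash followed by a real quantifier and becomes \\{0,1024}, which is the case the lookbehind version misses. Note that bounding the repeat count limits how far a single quantifier can run; it does not by itself rule out slow matching caused by nested quantifiers.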
Get a tree using Regexp::Parser and modify the regex as you want, or provide a GUI interface to Regexp::English
You mean other than patching the source?