Is there any way to treat .* as .{0,1024} in Perl REs?

Posted on 2024-12-21 06:01:47

We allow some user-supplied REs for the purpose of filtering email. Early on we ran into some performance issues with REs that contained, for example, .*, when matching against arbitrarily-large emails. We found a simple solution was to s/\*/{0,1024}/ on the user-supplied RE. However, this is not a perfect solution, as it will break with the following pattern:

/[*]/

And rather than coming up with some convoluted recipe to account for every possible mutation of user-supplied RE input, I'd like to just limit perl's interpretation of the * and + characters to have a maximum length of 1024 characters.

Is there any way to do this?
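
To make the failure mode concrete, here is a minimal sketch of the blanket substitution described above. The helper name naive_bound and the sample patterns are only illustrative, and /g is added so that every * in a pattern is rewritten.

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the blanket substitution from the question.
sub naive_bound {
    my ($re) = @_;
    $re =~ s/\*/{0,1024}/g;    # /g added so every * is rewritten
    return $re;
}

# Illustrative patterns: a plain .*, a class containing *, an escaped *.
for my $p ('foo.*bar', '[*]', 'a\*b') {
    printf "%-10s => %s\n", $p, naive_bound($p);
}

# Once rewritten, [*] is a class of the characters { 0 , 1 2 4 } and no
# longer contains *, so the user's intent is silently lost.
my $broken = naive_bound('[*]');
print "literal * still matches: ", ('*' =~ /$broken/ ? 'yes' : 'no'), "\n";
```

The escaped case goes wrong in the same quiet way: a\*b turns into a pattern that looks for a literal { after the a.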

音盲 2024-12-28 06:01:47

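This doesn't really answer your question, but you should be aware of the other problems with user-supplied regular expressions; see, for example, the OWASP summary. Depending on your exact situation, it may be better to write or find a custom, simple pattern-matching library.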

小瓶盖 2024-12-28 06:01:47

Update

Added a (?<!\\) before the quantifiers, because escaped *+ should not be matched. Replacement will still fail if there is an \\* (match \ 0 or more times).

An improvement would be this

s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/
s/(?<!\\)\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/

See it here on Regexr

That means: match * or + only if it is not inside a bracketed class, i.e. only if there is no closing ] ahead that can be reached without passing a [. The (?<!\\) parts make sure that neither the quantifier nor the closing bracket is escaped with a \.

(?! ... ) is a negative lookahead

(?<! ... ) is a negative lookbehind

See perlretut for details
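
As a rough check of the two substitutions above, the sketch below wraps them in a helper and runs them over a few patterns. The name bound_update1, the /g flag and the sample patterns are additions for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions shown above,
# with /g added so every occurrence in the pattern is rewritten.
sub bound_update1 {
    my ($re) = @_;
    $re =~ s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/g;
    $re =~ s/(?<!\\)\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/g;
    return $re;
}

# Illustrative inputs: plain quantifiers, quantifier characters inside a
# class, an escaped *, and the \\* case that is known not to be handled.
for my $p ('.*', '[*]', '[+*]x+', 'a\*b+', '\\\\*') {
    printf "%-8s => %s\n", $p, bound_update1($p);
}
```

The last input is the three-character pattern \\* (a backslash repeated zero or more times); because the * directly follows a backslash, the lookbehind skips it and the quantifier stays unbounded.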

Update 2: handle possessive quantifiers

s/(?<!(?<!\\)[\\+*?])\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/   # for +
s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/    # for *

See it here on Regexr

Seems to be working, but it's getting really complicated now!

风和你 2024-12-28 06:01:47

Use Regexp::Parser to get a parse tree and modify the regex as needed, or provide a GUI interface to Regexp::English.
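
The parse-then-rewrite idea can be sketched without either module. The code below is a hand-rolled tokenizer, not the Regexp::Parser API; the helper name bound_quantifiers and the default limit of 1024 are assumptions. It keeps escaped pairs together, tracks whether it is inside a bracketed class, and treats a + that follows a quantifier as the possessive modifier.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Parser): bound every unescaped
# * and + that sits outside a bracketed character class.
sub bound_quantifiers {
    my ($re, $max) = @_;
    $max //= 1024;

    my $out         = '';
    my $in_class    = 0;    # true while inside [...]
    my $after_quant = 0;    # true if the previous token was a quantifier

    # Tokens: an escaped pair, a class opener (which absorbs a leading ^
    # and a leading literal ]), or any single character.
    for my $tok ($re =~ /\\.|\[\^?\]?|./gs) {
        if ($in_class) {
            # Any unescaped ] closes the class; everything is copied as is.
            $in_class = 0 if $tok !~ /^\\/ && $tok =~ /\]$/;
            $out .= $tok;
        }
        elsif ($tok =~ /^\[/) {
            $in_class    = 1;
            $after_quant = 0;
            $out .= $tok;
        }
        elsif ($tok eq '*') {
            $out .= "{0,$max}";
            $after_quant = 1;
        }
        elsif ($tok eq '+') {
            # A + right after a quantifier is the possessive modifier.
            $out .= $after_quant ? '+' : "{1,$max}";
            $after_quant = !$after_quant;
        }
        elsif ($tok eq '?') {
            # Either the ? quantifier or the lazy modifier; copied unchanged.
            $out .= $tok;
            $after_quant = !$after_quant;
        }
        else {
            $out .= $tok;
            $after_quant = $tok eq '}';
        }
    }
    return $out;
}

print bound_quantifiers('[^]*]+foo.*?bar\*'), "\n";
```

This is only a sketch: it does not understand POSIX classes such as [[:alpha:]], \Q...\E, (?#...) comments, /x mode, backtracking verbs such as (*FAIL), or open-ended {n,} quantifiers, and nested bounded quantifiers can still be expensive to match.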

迷爱 2024-12-28 06:01:47

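You mean, other than patching the source?

You could split the input text into shorter chunks and match only against those. But then you won't match across a "line" break (the chunk boundary).
You could break up the regex: search only for its first character, load the next 1024 characters of text, and then match the whole regex against that. (Obviously, this does not work for regexes that do not start with a literal character.)
Find a part of the regex that is not one of .*+()\ , locate it in the text, load the 1024 characters before and after it, and then match the whole regex against that string. (Complicated, and prone to strange, unforeseeable errors with unusual regexes.)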
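
The first option, with its boundary problem, might look roughly like this. The helper name match_in_chunks, the fixed non-overlapping chunks and the sample pattern are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: try the pattern against fixed-size, non-overlapping
# chunks, so no single match attempt sees more than $size characters.
sub match_in_chunks {
    my ($re, $text, $size) = @_;
    $size //= 1024;
    for (my $off = 0; $off < length($text); $off += $size) {
        return 1 if substr($text, $off, $size) =~ $re;
    }
    return 0;
}

my $re = qr/need.*le/;

# The match lies entirely inside the first chunk.
print match_in_chunks($re, 'needle' . 'x' x 2000), "\n";

# The match straddles the boundary at offset 1024, so no chunk contains it.
print match_in_chunks($re, 'x' x 1021 . 'needle' . 'x' x 100), "\n";
```

Overlapping the chunks would reduce the missed-match problem but not remove it, since a legitimate match can still be longer than the overlap.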
