How can I safely use regular expressions taken from user input?

Posted 2024-08-20 01:21:04 · 538 words · 8 views · 0 comments

My (Perl-based) application needs to let users input regular expressions, to match various strings behind the scenes. My plan so far has been to take the string and wrap it in something like

$regex = eval { qr/$text/ };
if (my $error = $@) {
    # mangle $error to extract user-facing message
}

($text having been stripped of newlines ahead of time, since it's actually multiple regular expressions in a multi-line text-field that I split).
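A fuller sketch of that plan, for illustration only. The helper name `compile_user_regex` and its `ignore_case` option are made up here, and the clean-up assumes Perl's usual " at FILE line N." suffix on die messages:

```perl
use strict;
use warnings;

# Hypothetical helper: compile one line of user input into a regex.
# Returns ($regex, undef) on success or (undef, $message) on failure.
sub compile_user_regex {
    my ($text, %opt) = @_;

    my $regex = eval {
        no re 'eval';    # keep (?{ ... }) code blocks disabled in this scope
        $opt{ignore_case} ? qr/$text/i : qr/$text/;
    };
    return ($regex, undef) if defined $regex;

    my $error = $@ || 'unknown error';
    # Drop the trailing " at FILE line N." that Perl appends to the message.
    $error =~ s/\A(.*) at .+? line \d+\.\s*\z/$1/s;
    return (undef, $error);
}

# Usage: split the multi-line field and compile each non-empty line.
my $field = "foo.*bar\n(unclosed\n";
for my $text (grep { length } split /\r?\n/, $field) {
    my ($regex, $error) = compile_user_regex($text, ignore_case => 1);
    print defined $regex
        ? "ok: $text\n"
        : "bad pattern '$text': $error\n";
}
```

The message that remains still contains Perl's "<-- HERE" marker, which is probably useful to show to the user as it points at the offending position.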

Are there any potential security risks with doing this, such as some weird input that could lead to arbitrary code execution? (Besides buffer overflow vulnerabilities in the regular expression engine, like CVE-2007-5116.) If so, are there ways to mitigate them?

Is there a better way to do this? Are there any Perl modules that help abstract the work of turning user input into regular expressions (such as extracting error messages, or providing modifiers like /i, which I don't strictly need here but would be nice)? I searched CPAN and didn't find much that looked promising, but I may well have missed something.

Comments (5)

留蓝 2024-08-27 01:21:04

Using untrusted input as a regular expression creates a denial-of-service vulnerability, as described in perlsec:

Regular expressions - Perl's regular expression engine is so called NFA (Non-deterministic Finite Automaton), which among other things
means that it can rather easily consume large amounts of both time and space if the regular expression may match in several ways.
Careful crafting of the regular expressions can help but quite often there really isn't much one can do (the book "Mastering Regular
Expressions" is required reading, see perlfaq2). Running out of space manifests itself by Perl running out of memory.

平安喜乐 2024-08-27 01:21:04

With the (?{ code }) construct, user input could be used to execute arbitrary code. See the example in perlre#code and, where it says

local $cnt = $cnt + 1,

replace it with the expression

system("rm -rf /home/fennec"); print "Ha ha.\n";

(Actually, don't do that.)
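One caveat worth checking on your own Perl: perlre states that this construct is forbidden when the pattern involves run-time interpolation of variables, unless `use re 'eval'` is in effect. A minimal sketch to verify that, using a harmless assignment as a stand-in for the system() call (the variable names are made up for illustration):

```perl
use strict;
use warnings;

our $side_effect = 0;

# Hypothetical hostile input: sets a flag instead of running a command.
my $text = '(?{ $main::side_effect = 1 })';

my $regex = eval {
    no re 'eval';    # explicit, in case something enabled it in this scope
    qr/$text/;       # run-time interpolation of untrusted text
};
my $error = $@;

if (defined $regex) {
    print "pattern compiled\n";
}
else {
    print "rejected: $error";
}
print "side effect ran: ", ($side_effect ? "yes" : "no"), "\n";
```

If the pattern is rejected, the code block was never compiled, let alone run. The risk described above applies as soon as `use re 'eval'` is active where the user's text is interpolated.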

温折酒 2024-08-27 01:21:04

There is some discussion about this over at The Monastery.

TLDR: use re::engine::RE2 -strict => 1;

Make sure to add -strict => 1 to your use statement or re::engine::RE2 will fall back to Perl's re.

The following is a quote from Paul Wankadia (junyer), owner of the project on GitHub:

RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget – failing gracefully when exhausted – and they avoid stack overflow by eschewing recursion.

To sum up the important points:

  • It's safe from arbitrary code execution by default, but add "no re 'eval';" to prevent PERL5OPT or anything else from turning it on behind your back. I'm not sure whether doing so prevents everything.

  • Use a sub-process (fork) with BSD::Resource (even on Linux) to limit memory, and kill the child after some timeout.
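A rough sketch of the fork-and-kill idea, assuming a Unix-like system. The helper name `match_with_timeout`, its arguments and its exit-code convention are invented for illustration. The memory cap is applied only when a limit is passed and BSD::Resource happens to be installed. The timeout lives in the parent because, as far as I know, Perl's deferred signal handling means an alarm in the same process may not interrupt a regex match that is already running.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run the match in a child process and give up after
# $timeout seconds. Returns 1 (match), 0 (no match) or undef (timeout/failure).
sub match_with_timeout {
    my ($regex, $string, $timeout, $mem_bytes) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: optionally cap the address space (needs BSD::Resource).
        if (defined $mem_bytes) {
            eval {
                require BSD::Resource;
                BSD::Resource::setrlimit(BSD::Resource::RLIMIT_AS(),
                                         $mem_bytes, $mem_bytes);
            };
        }
        # Exit code 0 = match, 10 = no match; anything else is a failure.
        POSIX::_exit($string =~ $regex ? 0 : 10);
    }

    # Parent: wait for the child, but no longer than $timeout seconds.
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        waitpid($pid, 0);
        alarm 0;
        1;
    };
    unless ($finished) {
        alarm 0;
        kill 'KILL', $pid;    # the match ran too long: kill and reap
        waitpid($pid, 0);
        return undef;
    }

    return undef if $? & 127;    # child was killed by a signal
    my $code = $? >> 8;
    return $code == 0 ? 1 : $code == 10 ? 0 : undef;
}

my $result = match_with_timeout(qr/b+c/, 'aabbbc', 2);
print defined $result ? "result: $result\n" : "gave up\n";
```

Forking per match is expensive, so in practice you would probably batch many strings into one child. The exit-code channel here only carries a yes/no answer; returning captures would need a pipe.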

埋情葬爱 2024-08-27 01:21:04

The best way is not to give users too much privilege. Provide an interface that is just enough for users to do what they want (like an ATM with only buttons for the various options and no need for keyboard input). Of course, if you need users to key in input, provide a text box and then use Perl at the back end to process the request (e.g. sanitizing it). The motive behind letting your users input a regex is to search for string patterns, right? In that case, the simplest and most secure way is to tell them to input just the string, and then use Perl's regex at the back end to search for it. Is there any other compelling reason to have users input regexes themselves?
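A minimal sketch of that approach: the user's text is treated as a literal string and escaped with \Q...\E (quotemeta) before it goes into a pattern. The helper name `literal_matcher` and its `ignore_case` option are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: treat the user's text as a literal string, not a pattern.
sub literal_matcher {
    my ($text, %opt) = @_;
    # \Q...\E escapes every regex metacharacter in $text.
    return $opt{ignore_case} ? qr/\Q$text\E/i : qr/\Q$text\E/;
}

my $re = literal_matcher('a.b(c)*', ignore_case => 1);
print "literal hit\n"  if 'xxA.B(C)*yy' =~ $re;
print "no regex hit\n" unless 'aXbccc' =~ $re;
```

For a plain case-sensitive substring test, `index($string, $text) >= 0` avoids the regex engine altogether.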

新一帅帅 2024-08-27 01:21:04

Perhaps you could use a different regex engine that does not have the dangerous code tag support.

I haven't tried it, but there is a PCRE for Perl. You may also be able to limit or remove code support using this info on creating custom regex engines.
