为什么 boost 正则表达式耗尽了堆栈空间?

发布于 2024-10-08 05:56:50 字数 1603 浏览 1 评论 0原文

#include <boost/regex.hpp>
#include <string>
#include <iostream>

using namespace boost;
static const regex regexp(
        "std::vector<"
            "(std::map<"
                   "(std::pair<((\\w+)(::)?)+, (\\w+)>,?)+"
             ">,?)+"
        ">");

std::string errorMsg =
"std::vector<"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">,"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">"
">";
int main()
{
    smatch result;
    if(regex_match(errorMsg, result, regexp))
    {  
        for (unsigned i = 0; i < result.size(); ++i)
        {  
            std::cout << result[i] << std::endl;
        }
    }

//    std::cout << errorMsg << std::endl;

    return 0;
}

这会产生:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error>
>'   what():  Ran out of stack space trying to match the regular expression.

我的平台编译

g++ regex.cc -lboost_regex

编辑

g++ (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
libboost-regex1.42
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
So the latest Ubuntu 64 bit
#include <boost/regex.hpp>
#include <string>
#include <iostream>

using namespace boost;
static const regex regexp(
        "std::vector<"
            "(std::map<"
                   "(std::pair<((\\w+)(::)?)+, (\\w+)>,?)+"
             ">,?)+"
        ">");

std::string errorMsg =
"std::vector<"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">,"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">"
">";
int main()
{
    smatch result;
    if(regex_match(errorMsg, result, regexp))
    {  
        for (unsigned i = 0; i < result.size(); ++i)
        {  
            std::cout << result[i] << std::endl;
        }
    }

//    std::cout << errorMsg << std::endl;

    return 0;
}

this produces:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error>
>'   what():  Ran out of stack space trying to match the regular expression.

compiled with

g++ regex.cc -lboost_regex

EDIT

my platform:

g++ (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
libboost-regex1.42
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
So the latest Ubuntu 64 bit

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

稚然 2024-10-15 05:56:50

((\\w+)(::)?)+ 是所谓的“病态”正则表达式之一——它会花费指数时间,因为你有两个表达式,它们相互依赖另一个接着另一个。也就是说,它由于灾难性回溯而失败。

考虑一下我们是否遵循链接的示例,并将“更复杂的东西”减少为“x”。让我们用 \\w 来做到这一点:

  • ((x+)(::)?)+

我们还假设我们的输入永远不会有 ::< /代码>。有了这个实际上会使正则表达式变得更加复杂,所以如果我们抛弃复杂性,那么我们真的应该让事情变得更简单:

  • (x+)+

现在你有一个像这样的教科书嵌套量词问题详细内容在上面的链接中。

有几种方法可以解决此问题,但最简单的方法可能是使用 原子团修饰符 "(?>":

  • ((?>\\w+)(::)?)+

((\\w+)(::)?)+ is one of the so called "pathological" regular expressions -- it's going to take exponential time, because you have two expressions which are dependent upon each other one right after the other. That is, it fails due to catastrophic backtracking.

Consider if we follow the example of the link, and reduce "something more complicated" to "x". Let's do that with \\w:

  • ((x+)(::)?)+

Let's also assume that our input is never going to have ::. Having this actually makes the regex more complex, so if we throw out complexity then we really should be making things simpler if nothing else:

  • (x+)+

Now you've got a textbook nested quantifier problem like that detailed in the link above.

There are a few ways to fix this but the simplest way is probably to just disallow backtracking on the inner match using the atomic group modifier "(?>":

  • ((?>\\w+)(::)?)+
分開簡單 2024-10-15 05:56:50

在本地测试它并且工作正常,我猜你的编译器正在做一些奇怪的事情。

什么版本的gcc?什么平台?哪个版本的增强?

 -> ./regex
std::vector<std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>,std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>>
std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>
std::pair<Test::Test, int>
Test
Test
::
int

Tested it locally and it worked fine, my guess is your compiler is doing something weird.

What version of gcc? what platform? which version of boost?

 -> ./regex
std::vector<std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>,std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>>
std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>
std::pair<Test::Test, int>
Test
Test
::
int
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文