为什么 boost 正则表达式耗尽了堆栈空间?
#include <boost/regex.hpp>
#include <string>
#include <iostream>
using namespace boost;
static const regex regexp(
"std::vector<"
"(std::map<"
"(std::pair<((\\w+)(::)?)+, (\\w+)>,?)+"
">,?)+"
">");
std::string errorMsg =
"std::vector<"
"std::map<"
"std::pair<Test::Test, int>,"
"std::pair<Test::Test, int>,"
"std::pair<Test::Test, int>"
">,"
"std::map<"
"std::pair<Test::Test, int>,"
"std::pair<Test::Test, int>,"
"std::pair<Test::Test, int>"
">"
">";
int main()
{
smatch result;
if(regex_match(errorMsg, result, regexp))
{
for (unsigned i = 0; i < result.size(); ++i)
{
std::cout << result[i] << std::endl;
}
}
// std::cout << errorMsg << std::endl;
return 0;
}
这会产生:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error>
>' what(): Ran out of stack space trying to match the regular expression.
我的平台编译
g++ regex.cc -lboost_regex
用编辑
:
g++ (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
libboost-regex1.42
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
So the latest Ubuntu 64 bit
#include <boost/regex.hpp>
#include <string>
#include <iostream>
using namespace boost;
static const regex regexp(
"std::vector<"
"(std::map<"
"(std::pair<((\\w+)(::)?)+, (\\w+)>,?)+"
">,?)+"
">");
std::string errorMsg =
"std::vector<"
"std::map<"
"std::pair<Test::Test, int>,"
"std::pair<Test::Test, int>,"
"std::pair<Test::Test, int>"
">,"
"std::map<"
"std::pair<Test::Test, int>,"
"std::pair<Test::Test, int>,"
"std::pair<Test::Test, int>"
">"
">";
int main()
{
smatch result;
if(regex_match(errorMsg, result, regexp))
{
for (unsigned i = 0; i < result.size(); ++i)
{
std::cout << result[i] << std::endl;
}
}
// std::cout << errorMsg << std::endl;
return 0;
}
this produces:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error>
>' what(): Ran out of stack space trying to match the regular expression.
compiled with
g++ regex.cc -lboost_regex
EDIT
my platform:
g++ (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
libboost-regex1.42
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
So the latest Ubuntu 64 bit
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
((\\w+)(::)?)+
是所谓的“病态”正则表达式之一——它会花费指数时间,因为你有两个表达式,它们相互依赖另一个接着另一个。也就是说,它由于灾难性回溯而失败。考虑一下我们是否遵循链接的示例,并将“更复杂的东西”减少为“x”。让我们用
\\w
来做到这一点:((x+)(::)?)+
我们还假设我们的输入永远不会有
::< /代码>。有了这个实际上会使正则表达式变得更加复杂,所以如果我们抛弃复杂性,那么我们真的应该让事情变得更简单:
(x+)+
现在你有一个像这样的教科书嵌套量词问题详细内容在上面的链接中。
有几种方法可以解决此问题,但最简单的方法可能是使用 原子团修饰符 "
(?>
":((?>\\w+)(::)?)+
((\\w+)(::)?)+
is one of the so called "pathological" regular expressions -- it's going to take exponential time, because you have two expressions which are dependent upon each other one right after the other. That is, it fails due to catastrophic backtracking.Consider if we follow the example of the link, and reduce "something more complicated" to "x". Let's do that with
\\w
:((x+)(::)?)+
Let's also assume that our input is never going to have
::
. Having this actually makes the regex more complex, so if we throw out complexity then we really should be making things simpler if nothing else:(x+)+
Now you've got a textbook nested quantifier problem like that detailed in the link above.
There are a few ways to fix this but the simplest way is probably to just disallow backtracking on the inner match using the atomic group modifier "
(?>
":((?>\\w+)(::)?)+
在本地测试它并且工作正常,我猜你的编译器正在做一些奇怪的事情。
什么版本的gcc?什么平台?哪个版本的增强?
Tested it locally and it worked fine, my guess is your compiler is doing something weird.
What version of gcc? what platform? which version of boost?