Getting sub-match_results with boost::regex

Posted on 2024-11-03 20:17:16

Hey, let's say I have this regex: (test[0-9])+

And that I match it against: test1test2test3test0

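const std::string input = "test1test2test3test0";
const boost::regex r("(test[0-9])+");
boost::smatch what;
const bool ret = boost::regex_search(input, what, r);

for (size_t i = 0; i < what.size(); ++i)
    cout << i << ':' << string(what[i]) << "\n";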

Now, what[1] will be test0 (the last occurrence). Let's say that I need to get test1, 2 and 3 as well: what should I do?

Note: the real regex is far more complex and has to remain one overall match, so changing the example regex to (test[0-9]) won't work.


Comments (3)

近箐 2024-11-10 20:17:16

I think .NET is able to build a collection of captures for a single group, so that (grp)+ creates a collection object on group 1. The Boost engine's regex_search() behaves like any ordinary match function. You sit in a while() loop, matching the pattern from where the last match left off. The form you used does not take bidirectional iterators, so the function won't start the next match where the previous one ended.

You can use the iterator form:
(Edit: you can also use the token iterator, specifying which groups to iterate over. Added to the code below.)

#include <boost/regex.hpp> 
#include <string> 
#include <iostream> 

using namespace std;
using namespace boost;

int main() 
{ 
    string input = "test1 ,, test2,, test3,, test0,,";
    boost::regex r("(test[0-9])(?:$|[ ,]+)");
    boost::smatch what;

    std::string::const_iterator start = input.begin();
    std::string::const_iterator end   = input.end();

    while (boost::regex_search(start, end, what, r))
    {
        string stest(what[1].first, what[1].second);
        cout << stest << endl;
        // Update the beginning of the range to the character
        // following the whole match
        start = what[0].second;
    }

    // Alternate method using token iterator 
    const int subs[] = {1};  // we just want to see group 1
    boost::sregex_token_iterator i(input.begin(), input.end(), r, subs);
    boost::sregex_token_iterator j;
    while(i != j)
    {
       cout << *i++ << endl;
    }

    return 0;
}

Output:

test1
test2
test3
test0

拒绝两难 2024-11-10 20:17:16

Boost.Regex offers experimental support for exactly this feature (called repeated captures); however, because it incurs a huge performance hit, the feature is disabled by default.

To enable repeated captures, you need to rebuild Boost.Regex and define the macro BOOST_REGEX_MATCH_EXTRA in all translation units; the best way to do this is to uncomment this define in boost/regex/user.hpp (see the reference; it's at the very bottom of the page).

Once compiled with this define, you can use the feature by calling regex_search, regex_match, or regex_iterator with the match_extra flag.

See the Boost.Regex reference for more information.

暗地喜欢 2024-11-10 20:17:16

Seems to me like you need to create a regex_iterator, using the (test[0-9]) regex as input. Then you can use the resulting regex_iterator to enumerate the matching substrings of your original target.

If you still need "one overall match" then perhaps that work has to be decoupled from the task of finding matching substrings. Can you clarify that part of your requirement?
