How do I get all match groups using PCRE?

Posted on 2024-08-04 21:21:18

I am new to C, and I need to use PCRE to get matches.
Here is a sample of my source code:

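#include <stdio.h>
#include <string.h>
#include <pcre.h>

#define OVECCOUNT 30    /* assumed value (not shown in the post); should be a multiple of 3 */

int test2()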
{
    const char *error;
    int   erroffset;
    pcre *re;
    int   rc;
    int   i;
    int   ovector[OVECCOUNT];

    const char *regex = "From:([^@]+)@([^\r]+)";
    char str[]  = "From:regular.expressions@example.com\r\n"\
                  "From:[email protected]\r\n"\
                  "From:[email protected]\r\n";

    re = pcre_compile (
             regex,       /* the pattern */
             0,                    /* default options */
             &error,               /* for error message */
             &erroffset,           /* for error offset */
             0);                   /* use default character tables */

    if (!re) {
        printf("pcre_compile failed (offset: %d), %s\n", erroffset, error);
        return -1;
    }

    rc = pcre_exec (
        re,                   /* the compiled pattern */
        0,                    /* no extra data - pattern was not studied */
        str,                  /* the string to match */
        strlen(str),          /* the length of the string */
        0,                    /* start at offset 0 in the subject */
        0,                    /* default options */
        ovector,              /* output vector for substring information */
        OVECCOUNT);           /* number of elements in the output vector */

    if (rc < 0) {
        switch (rc) {
            case PCRE_ERROR_NOMATCH:
                printf("String didn't match\n");
                break;

            default:
                printf("Error while matching: %d\n", rc);
                break;
        }
        pcre_free(re);        /* release with pcre_free(), not free() */
        return -1;
    }

    for (i = 0; i < rc; i++) {
        printf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
    }

    pcre_free(re);            /* release the compiled pattern */
    return 0;
}

In this demo, the output is only:

 0: From:regular.expressions@example.com
1: regular.expressions
2: example.com

I want to output all of the matches; how can I do that?

Comments (2)

蓝眼泪 2024-08-11 21:21:18

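I use a class to wrap PCRE to make this easier, but the idea is the same: after pcre_exec returns, ovector contains the substring offsets you need to locate the match and its groups within the original string. pcre_exec finds only one match per call, so call it in a loop and start each search at the end of the previous match.

So it would be something like: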

#include <cstdio>   /* printf */
#include <cstring>  /* strlen */
#include "pcre.h"

int main (int argc, char *argv[])
{
    const char *error;
    int   erroffset;
    pcre *re;
    int   rc;
    int   ovector[100];

    const char *regex = "From:([^@]+)@([^\r]+)";
    char str[]  = "From:regular.expressions@example.com\r\n"\
                  "From:[email protected]\r\n"\
                  "From:[email protected]\r\n";

    re = pcre_compile (regex,          /* the pattern */
                       PCRE_MULTILINE,
                       &error,         /* for error message */
                       &erroffset,     /* for error offset */
                       0);             /* use default character tables */
    if (!re)
    {
        printf("pcre_compile failed (offset: %d), %s\n", erroffset, error);
        return -1;
    }

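    unsigned int offset = 0;
    unsigned int len    = strlen(str);
    /* The last argument is the number of ints in ovector, not its size in bytes. */
    const int ovecsize  = sizeof(ovector) / sizeof(ovector[0]);
    while (offset < len && (rc = pcre_exec(re, 0, str, len, offset, 0, ovector, ovecsize)) >= 0)
    {
        for(int i = 0; i < rc; ++i)
        {
            printf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
        }
        offset = ovector[1];   /* resume searching where this match ended */
    }
    pcre_free(re);             /* release the compiled pattern */
    return 0;
}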
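
The loop above reads each group directly out of ovector as a pair of offsets: ovector[2*i] is where group i starts and ovector[2*i+1] is one past where it ends. A minimal sketch of that bookkeeping as plain C helpers follows. The names group_span and next_offset are hypothetical, not part of PCRE, and the empty-match handling is a simplification (the pattern in the question can never match an empty string, so it is not needed there; PCRE's own pcredemo sample treats that case more carefully).

```c
/* Look up group i in an ovector filled by a successful pcre_exec() call.
   rc is the value pcre_exec() returned. Returns 1 and sets *start / *len
   if the group took part in the match; returns 0 if i is out of range or
   the group is unset (PCRE stores -1 for both offsets of an unset group). */
static int group_span(const int *ovector, int rc, int i, int *start, int *len)
{
    if (i < 0 || i >= rc || ovector[2 * i] < 0)
        return 0;
    *start = ovector[2 * i];
    *len   = ovector[2 * i + 1] - ovector[2 * i];
    return 1;
}

/* Offset at which the next pcre_exec() call should begin. Normally this is
   the end of the whole match (ovector[1]); after an empty match it steps one
   byte further so that a search loop cannot spin in place. */
static int next_offset(const int *ovector)
{
    return ovector[1] > ovector[0] ? ovector[1] : ovector[1] + 1;
}
```

Inside the while loop, a call such as group_span(ovector, rc, 1, &start, &len) followed by printf("%.*s\n", len, str + start) would print the user part of the address, and offset = next_offset(ovector) would replace the direct assignment from ovector[1].
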
め可乐爱微笑 2024-08-11 21:21:18

Note: the last parameter of pcre_exec() must be the element count of ovector, not sizeof(ovector)! (http://www.pcre.org/readme.txt)
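
To make the difference concrete, here is a small sketch in plain C. The macro and function names are hypothetical helpers, not PCRE API. It relies on two things the PCRE documentation states about pcre_exec(): ovecsize is rounded down to a multiple of three, and only the first two thirds of the vector are used to return offset pairs (the rest is workspace).

```c
/* Element count of a true array. This does not work on a pointer, so it
   must be applied where the array itself is declared. */
#define ARRAY_COUNT(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* pcre_exec() rounds ovecsize down to a multiple of three, so this is the
   part of the vector it will actually use. */
static int usable_ovecsize(int element_count)
{
    return element_count - element_count % 3;
}

/* Two thirds of the usable vector hold (start, end) pairs, so this is the
   largest number of groups, counting group 0 (the whole match), that one
   call can report. */
static int max_reported_groups(int element_count)
{
    return element_count / 3;
}
```

For int ovector[100], ARRAY_COUNT(ovector) is 100, of which pcre_exec() uses 99, enough for 33 groups. sizeof(ovector) is 100 * sizeof(int) bytes instead, so passing it as ovecsize tells PCRE the array is several times larger than it really is.
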
