Using GNU regex functions in C or C++

Posted 2024-08-23 01:18:49 · 553 words · 5 views · 0 comments

Can anyone give me a complete example program showing how to use the GNU regex functions in C or C++ with gcc (http://docs.freebsd.org/info/regex/regex.info.GNU_Regex_Functions.html), including re_pattern_buffer and re_compile_fastmap?

For example, translate this small Python program:

import re
unlucky = re.compile(r'1\d*?3')

nums = ("13", "31", "777", "10003")

for n in nums:
    if unlucky.search(n) is None:
        print("lucky")
    else:
        print("unlucky")

Thanks!

Comments (1)

睫毛溺水了 2024-08-30 01:18:49

Okay, before delving into the code, I should mention that you may want to use a higher-level library. You did say C++, so that opens you up to Boost.Regex and the like. Even if you want to stay with C, there are better options. I find the POSIX functions somewhat cleaner, not to mention more portable.

// Tell glibc to expose the non-standard GNU APIs
#define _GNU_SOURCE
// This is the same header used for the POSIX API,
// which does not need _GNU_SOURCE
#include <regex.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct re_pattern_buffer pat_buff; // Put a re_pattern_buffer on the stack
    // The next 4 fields must be set before compiling.

    // If non-zero, a translation table applied to characters before
    // attempting a match (http://www.delorie.com/gnu/docs/regex/regex_51.html)
    pat_buff.translate = 0;
    // If non-zero, a table of the characters that can start a match, used to
    // skip hopeless start positions when searching. Optional.
    // See http://www.delorie.com/gnu/docs/regex/regex_45.html
    pat_buff.fastmap = 0;
    // The next two must be 0 to ask the library to allocate the memory
    pat_buff.buffer = 0;
    pat_buff.allocated = 0;
    char pat_str[] = "1[^3]*3";
    // This is a global (!) that selects the regex syntax (the POSIX API takes flags instead)
    re_syntax_options = RE_SYNTAX_EGREP;
    // Compile the pattern into our buffer; returns NULL on success, else an error message
    const char *err = re_compile_pattern(pat_str, sizeof(pat_str) - 1, &pat_buff);
    if (err != NULL)
    {
        fprintf(stderr, "re_compile_pattern: %s\n", err);
        return 1;
    }
    const char *nums[] = {"13", "31", "777", "10003"}; // Array of C strings
    for (size_t i = 0; i < sizeof(nums) / sizeof(nums[0]); i++)
    {
        int len = (int)strlen(nums[i]);
        // re_match only tries the given start position; re_search scans forward
        // like Python's search(). It returns the offset of the match.
        int search_ret = re_search(&pat_buff, nums[i], len, 0, len, NULL);
        if (search_ret >= 0)
        {
            printf("unlucky\n");
        }
        else if (search_ret == -1) // No match
        {
            printf("lucky\n");
        }
        // Anything else (the docs say -2) is an internal library error
        else
        {
            // re_search does not set errno, so perror would be misleading
            fprintf(stderr, "re_search: internal error\n");
        }
    }
    regfree(&pat_buff);
    return 0;
}

EDIT: I added more explanation of the required fields, and the regfree. I had the lucky/unlucky backwards before, which explains part of the discrepancy. The other part is that I don't think any of the regex syntaxes available here support lazy operators (*?). In this case, there's a simple fix, using "1[^3]*3".
