当前位置：文江博客话题详情

与通配符匹配的文件名

发布于 2024-09-10 23:21:59 字数 156 浏览 3 评论 0原文

我需要实现类似我自己的文件系统的东西。一个操作是 FindFirstFile。我需要检查调用者是否传递了类似 .、sample*.cpp 等内容。我的“文件系统”实现提供“文件名”列表作为 char* 数组。

Windows有没有函数或源代码实现这种文件名匹配？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

冬天旳寂寞 2024-09-17 23:21:59

对于使用“*”和“?”的通配符名称匹配试试这个（如果你想避免提升，请使用 std::tr1::regex）：

#include <boost/regex.hpp>
#include <boost/algorithm/string/replace.hpp>

using std::string;

bool MatchTextWithWildcards(const string &text, string wildcardPattern, bool caseSensitive /*= true*/)
{
    // Escape all regex special chars
    EscapeRegex(wildcardPattern);

    // Convert chars '*?' back to their regex equivalents
    boost::replace_all(wildcardPattern, "\\?", ".");
    boost::replace_all(wildcardPattern, "\\*", ".*");

    boost::regex pattern(wildcardPattern, caseSensitive ? regex::normal : regex::icase);

    return regex_match(text, pattern);
}

void EscapeRegex(string ®ex)
{
    boost::replace_all(regex, "\\", "\\\\");
    boost::replace_all(regex, "^", "\\^");
    boost::replace_all(regex, ".", "\\.");
    boost::replace_all(regex, "$", "\\$");
    boost::replace_all(regex, "|", "\\|");
    boost::replace_all(regex, "(", "\\(");
    boost::replace_all(regex, ")", "\\)");
    boost::replace_all(regex, "{", "\\{");
    boost::replace_all(regex, "{", "\\}");
    boost::replace_all(regex, "[", "\\[");
    boost::replace_all(regex, "]", "\\]");
    boost::replace_all(regex, "*", "\\*");
    boost::replace_all(regex, "+", "\\+");
    boost::replace_all(regex, "?", "\\?");
    boost::replace_all(regex, "/", "\\/");
}

For wildcard name matching using '*' and '?' try this (if you want to avoid boost, use std::tr1::regex):

#include <boost/regex.hpp>
#include <boost/algorithm/string/replace.hpp>

using std::string;

bool MatchTextWithWildcards(const string &text, string wildcardPattern, bool caseSensitive /*= true*/)
{
    // Escape all regex special chars
    EscapeRegex(wildcardPattern);

    // Convert chars '*?' back to their regex equivalents
    boost::replace_all(wildcardPattern, "\\?", ".");
    boost::replace_all(wildcardPattern, "\\*", ".*");

    boost::regex pattern(wildcardPattern, caseSensitive ? regex::normal : regex::icase);

    return regex_match(text, pattern);
}

void EscapeRegex(string ®ex)
{
    boost::replace_all(regex, "\\", "\\\\");
    boost::replace_all(regex, "^", "\\^");
    boost::replace_all(regex, ".", "\\.");
    boost::replace_all(regex, "$", "\\$");
    boost::replace_all(regex, "|", "\\|");
    boost::replace_all(regex, "(", "\\(");
    boost::replace_all(regex, ")", "\\)");
    boost::replace_all(regex, "{", "\\{");
    boost::replace_all(regex, "{", "\\}");
    boost::replace_all(regex, "[", "\\[");
    boost::replace_all(regex, "]", "\\]");
    boost::replace_all(regex, "*", "\\*");
    boost::replace_all(regex, "+", "\\+");
    boost::replace_all(regex, "?", "\\?");
    boost::replace_all(regex, "/", "\\/");
}

回复收藏 0 原文

伪心 2024-09-17 23:21:59

周围有很多这样的功能。这是各种实现的目录，分为递归和非递归，如果

您不喜欢那里的许可（或者链接有问题等），这里有一种可能的匹配算法实现，该算法至少非常接近 Windows 使用的算法：

#include <string.h>
#include <iostream>

bool match(char const *needle, char const *haystack) {
    for (; *needle != '\0'; ++needle) {
        switch (*needle) {
        case '?': 
            if (*haystack == '\0')
                return false;
            ++haystack;
            break;
        case '*': {
            if (needle[1] == '\0')
                return true;
            size_t max = strlen(haystack);
            for (size_t i = 0; i < max; i++)
                if (match(needle + 1, haystack + i))
                    return true;
            return false;
        }
        default:
            if (*haystack != *needle)
                return false;
            ++haystack;
        }
    }
    return *haystack == '\0';
}

#ifdef TEST
#define CATCH_CONFIG_MAIN

#include "catch.hpp"

TEST_CASE("Matching", "[match]") {
    REQUIRE(match("a", "a") == true);
    REQUIRE(match("a", "b") == false);
    REQUIRE(match("a*", "a") == true);
    REQUIRE(match("a?", "a") == false);
    REQUIRE(match("a?", "ab") == true);
    REQUIRE(match("a*b", "ab") == true);
    REQUIRE(match("a*b", "acb") == true);
    REQUIRE(match("a*b", "abc") == false);
    REQUIRE(match("*a*??????a?????????a???????????????", 
        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa") == true);
}

#endif

由于讨论了某些算法的复杂性其他答案，我会注意到我相信这具有 O(NM) 复杂性和 O(M) 存储使用（其中 N 是目标字符串的大小，M 是模式的大小）。

使用 @masterxilo 的测试对：

"*a*??????*a*?????????a???????????????", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

...这在我的机器上大约 3 微秒内找到了匹配项。这比典型模式慢很多——我的大多数其他测试在这台特定机器上运行大约 300 纳秒左右。

同时，@masterxilo 的代码在同一台机器上运行大约需要 11 微秒，因此速度仍然快了大约 3 到 4 倍（更不用说更小、更简单了）。

There are quite a few such functions around. Here's a directory of various implementations, sorted into recursive and non-recursive, etc.

In case you don't like the licensing there (or have trouble with the link, etc.) here's one possible implementation of a matching algorithm that at least closely approximates what Windows uses:

#include <string.h>
#include <iostream>

bool match(char const *needle, char const *haystack) {
    for (; *needle != '\0'; ++needle) {
        switch (*needle) {
        case '?': 
            if (*haystack == '\0')
                return false;
            ++haystack;
            break;
        case '*': {
            if (needle[1] == '\0')
                return true;
            size_t max = strlen(haystack);
            for (size_t i = 0; i < max; i++)
                if (match(needle + 1, haystack + i))
                    return true;
            return false;
        }
        default:
            if (*haystack != *needle)
                return false;
            ++haystack;
        }
    }
    return *haystack == '\0';
}

#ifdef TEST
#define CATCH_CONFIG_MAIN

#include "catch.hpp"

TEST_CASE("Matching", "[match]") {
    REQUIRE(match("a", "a") == true);
    REQUIRE(match("a", "b") == false);
    REQUIRE(match("a*", "a") == true);
    REQUIRE(match("a?", "a") == false);
    REQUIRE(match("a?", "ab") == true);
    REQUIRE(match("a*b", "ab") == true);
    REQUIRE(match("a*b", "acb") == true);
    REQUIRE(match("a*b", "abc") == false);
    REQUIRE(match("*a*??????a?????????a???????????????", 
        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa") == true);
}

#endif

Since there was a discussion of complexity of some of the other answers, I'll note that I believe this has O(NM) complexity and O(M) storage use (where N is the size of the target string, and M is the size of the pattern).

With @masterxilo's test pair:

"*a*??????*a*?????????a???????????????", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

...this finds a match in approximately 3 microseconds on my machine. That is a lot slower than a typical pattern--most of my other tests run in about 300 nanoseconds or so on this particular machine.

At the same time, @masterxilo's code takes approximately 11 microseconds to run on the same machine, so this is still around 3 to 4 times faster (not to mention being somewhat smaller and simpler).

回复收藏 0 原文

沫离伤花 2024-09-17 23:21:59

查看 POSIX 函数 fnmatch、glob 和 wordexp。

回复收藏 0 原文

夏の忆 2024-09-17 23:21:59

这是我的尝试。

它是“C++”，但我故意保持它几乎完全与 C 兼容。
为了将其转换为 C，您所需要做的就是删除 template 部分并将 Pattern 和 Text 更改为类似 字符常量*。

// TEST THIS before use! I've only done limited testing.

#include <stddef.h>
#include <stdlib.h>
#include <string.h>

template<class Pattern, class Text>
bool wildcard(
    Pattern const pat_begin, Pattern const pat_end,
    Text text_begin, Text const text_end)
{
    ptrdiff_t const pat_size = pat_end - pat_begin;
    ptrdiff_t stackbuf[64];
    size_t c = sizeof(stackbuf) / sizeof(*stackbuf);
    ptrdiff_t *p = stackbuf;
    size_t n = 0;
    p[n++] = 0;
    while (n > 0 && text_begin != text_end)
    {
        for (size_t i = 0; i < n; i++)
        {
            if (p[i] == pat_size)
            {
                p[i--] = p[--n];
                continue;
            }
            switch (*(pat_begin + p[i]))
            {
            case '?': ++p[i]; break;
            case '*':
                ptrdiff_t off;
                off = p[i];
                while (off < pat_size &&
                    *(pat_begin + off) == '*')
                { ++off; }
                if (n == c)
                {
                    ptrdiff_t const *const old = p;
                    c *= 2;
                    if (c == 0) { ++c; }
                    size_t const size = c * sizeof(*p);
                    p = (ptrdiff_t *)realloc(
                        old == stackbuf ? NULL : p,
                        size);
                    if (old == stackbuf)
                    { memcpy(p, old, n * sizeof(*old)); }
                }
                p[n++] = off;
                break;
            default:
                if (*(pat_begin + p[i]) == *text_begin)
                { ++p[i]; }
                else { p[i--] = p[--n]; }
                break;
            }
        }
        ++text_begin;
    }
    bool success = false;
    if (text_begin == text_end)
    {
        while (!success && n > 0)
        {
            --n;
            while (p[n] != pat_size &&
                *(pat_begin + p[n]) == '*')
            { ++p[n]; }
            if (p[n] == pat_size)
            { success = true; }
        }
    }
    if (p != stackbuf) { free(p); }
    return success;
}

bool wildcard(char const *const pattern, char const *const text)
{
    return wildcard(
        pattern,
        pattern + (pattern ? strlen(pattern) : 0),
        text,
        text + (text ? strlen(text) : 0));
}

bool wildcard(wchar_t const *const pattern, wchar_t const *const text)
{
    return wildcard(
        pattern,
        pattern + (pattern ? wcslen(pattern) : 0),
        text,
        text + (text ? wcslen(text) : 0));
}

当然，您可以随意以任何您想要的方式使用该代码。 :)

Here's my attempt at this.

It's "C++", but I've deliberately kept it almost completely C-compatible.
All you need to do in order to convert it to C is to remove the template section and change Pattern and Text to something like char const *.

// TEST THIS before use! I've only done limited testing.

#include <stddef.h>
#include <stdlib.h>
#include <string.h>

template<class Pattern, class Text>
bool wildcard(
    Pattern const pat_begin, Pattern const pat_end,
    Text text_begin, Text const text_end)
{
    ptrdiff_t const pat_size = pat_end - pat_begin;
    ptrdiff_t stackbuf[64];
    size_t c = sizeof(stackbuf) / sizeof(*stackbuf);
    ptrdiff_t *p = stackbuf;
    size_t n = 0;
    p[n++] = 0;
    while (n > 0 && text_begin != text_end)
    {
        for (size_t i = 0; i < n; i++)
        {
            if (p[i] == pat_size)
            {
                p[i--] = p[--n];
                continue;
            }
            switch (*(pat_begin + p[i]))
            {
            case '?': ++p[i]; break;
            case '*':
                ptrdiff_t off;
                off = p[i];
                while (off < pat_size &&
                    *(pat_begin + off) == '*')
                { ++off; }
                if (n == c)
                {
                    ptrdiff_t const *const old = p;
                    c *= 2;
                    if (c == 0) { ++c; }
                    size_t const size = c * sizeof(*p);
                    p = (ptrdiff_t *)realloc(
                        old == stackbuf ? NULL : p,
                        size);
                    if (old == stackbuf)
                    { memcpy(p, old, n * sizeof(*old)); }
                }
                p[n++] = off;
                break;
            default:
                if (*(pat_begin + p[i]) == *text_begin)
                { ++p[i]; }
                else { p[i--] = p[--n]; }
                break;
            }
        }
        ++text_begin;
    }
    bool success = false;
    if (text_begin == text_end)
    {
        while (!success && n > 0)
        {
            --n;
            while (p[n] != pat_size &&
                *(pat_begin + p[n]) == '*')
            { ++p[n]; }
            if (p[n] == pat_size)
            { success = true; }
        }
    }
    if (p != stackbuf) { free(p); }
    return success;
}

bool wildcard(char const *const pattern, char const *const text)
{
    return wildcard(
        pattern,
        pattern + (pattern ? strlen(pattern) : 0),
        text,
        text + (text ? strlen(text) : 0));
}

bool wildcard(wchar_t const *const pattern, wchar_t const *const text)
{
    return wildcard(
        pattern,
        pattern + (pattern ? wcslen(pattern) : 0),
        text,
        text + (text ? wcslen(text) : 0));
}

Of course, feel free to use the code in any way you want. :)

回复收藏 0 原文

傻比既视感 2024-09-17 23:21:59

Mehrdad 提供的解决方案具有指数级的运行时间，因为它使用回溯。花了一秒钟才弄清楚 "*a*??????*a*?????????a??????????????? ?”,“啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊aaa" 是一个匹配。此外，也无法将字符串与“*”或“?”进行匹配。在其中。

这是我提出的 O(nm) 实现，其内存使用量为 O(n)，其中 n 是表达式的长度，m 是字符串的长度。它还支持转义 ?、* 和 \。

#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/** 
 * Wildcard matching.
 * See for example
 * http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/find_c_search_wildcard.mspx?mfr=true
 *
 *  * matches any amount of any characters, 
 *  ? matches any single character.
 *  c matches c.
 *  No escaping of * and ? implemented (these are not allowed in windows filenames anyways).
 *
 * Backtracking O(2^n) implementation.
 *
 * from http://stackoverflow.com/questions/3300419/file-name-matching-with-wildcard 
 * http://stackoverflow.com/a/12231681/524504
 */
template<class Pattern, class Text>
bool wildcard(
        Pattern const pat_begin,
        Pattern const pat_end,
        Text text_begin, 
        Text const text_end)
{
    ptrdiff_t const pat_size = pat_end - pat_begin;

    /* initial pattern position stack (offsets into the pattern string) and its size c */
    ptrdiff_t stackbuf[64];
    size_t c                     = sizeof(stackbuf) / sizeof(*stackbuf);
    /* Stack base. Can be realloc'ed to some new memory location if we need more */
    ptrdiff_t *p                 = stackbuf;

    /* pointer to the location in the stack */
    size_t n                     = 0;
    p[n++]                       = 0; /* position 0 in the stack not used */

    /* text_begin updated to skip everything successfully consumed  */
    while (n > 0 && text_begin != text_end) 
    {
        for (size_t i = 0; i < n; i++) 
        {
            /* if we're at the end of the pattern, but not at the end of the 
             * string, we might have done something wrong.
             */
            if (p[i] == pat_size) 
            {
                p[i--] = p[--n];
                continue;
            }
            /* current pattern character */
            switch (*(pat_begin + p[i]))
            {
                case '?': ++p[i]; break; /* simply advance pattern pointer */
                case '*':
                          ptrdiff_t off;
                          off = p[i];
                          while (off < pat_size &&
                                  *(pat_begin + off) == '*')
                          { ++off; }
                          /* if the stack is full, reallocate */
                          if (n == c)
                          {
                              ptrdiff_t const *const old = p;
                              c *= 2;
                              /* assert positive size
                               * stack size never reduced anyways?
                               */
                              if (c == 0) { ++c; }
                              size_t const size = c * sizeof(*p);
                              /* cannot use realloc to copy original stack 
                               * (ptr in realloc must be 
                               * "Pointer to a memory block previously 
                               * allocated with malloc, calloc or realloc."
                               * must do manually
                               */
                              p = (ptrdiff_t *)realloc(
                                       old == stackbuf ? NULL : p,
                                      size);
                              if (old == stackbuf)
                              { memcpy(p, old, n * sizeof(*old)); }
                          }
                          /* store offset */
                          p[n++] = off;
                          break;
                default: /* a normal character in the pattern */
                          /* must be matched exactly */
                          if (*(pat_begin + p[i]) == *text_begin)
                          { ++p[i]; } /* advance pattern pointer */
                          else /* if not, backtrack */
                          { p[i--] = p[--n]; }
                          break;
            }
        }
        ++text_begin;
    }

    bool success = false;
    if (text_begin == text_end)
    {
        while (!success && n > 0)
        {
            --n;
            while (p[n] != pat_size &&
                    *(pat_begin + p[n]) == '*')
            { ++p[n]; }
            if (p[n] == pat_size)
            { success = true; }
        }
    }

    /* need to free stack if it was reallocated */
    if (p != stackbuf) { free(p); }
    return success;
}

bool wildcard(char const *const pattern, char const *const text)
{
    return wildcard(
            pattern,
            pattern + (pattern ? strlen(pattern) : 0),
            text,
            text + (text ? strlen(text) : 0));
}

bool wildcard(wchar_t const *const pattern, wchar_t const *const text)
{
    return wildcard(
            pattern,
            pattern + (pattern ? wcslen(pattern) : 0),
            text,
            text + (text ? wcslen(text) : 0));
}

/**
 * Virtual Machine Style Regular Expression parsing/NFA emulation
 * used for wildcard matching. O(nm) algorithm.
 *
 * See http://swtch.com/~rsc/regexp/ for more on efficient RegEx parsing.
 *
 * Copyright (c) March 29, 2013 Paul Frischknecht
 * Can be distributed under the MIT licence, see bottom of file.
 */

//#define wildcard_fast_DEBUG /* define to make the alogorithm printf what its doing */

/**
 * Instructions are:
 *
 * star
 *   This launches a new thread at the current position and continues with the 
 *   current thread at the next position accepting any character.
 * char c
 *   Accepts exactly the character c
 * anychar
 *   Accepts any character.
 * end 
 *   Accepts the end of the input string.
 */
enum InstructionType {
    End     = 0,
    Star    = 1,
    Char    = 2,
    AnyChar = 3
};

struct Instruction {
    InstructionType i;
    int c;       /* the caracter this instruction matches - undefined for non Char */
    int threads; /* threads active in the given instruction */
    /*
     * storing this here allows us to find out wether there is a thread
     * active at a given instruction in O(1)
     */
};

/** 
 * Wildcard (file path) matching.
 * See for example 
 * http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/find_c_search_wildcard.mspx?mfr=true
 *
 *  * matches any amount of any characters, 
 *  ? matches any single character.
 *  c matches c.
 *
 * Escaping of '*', '?' and '\' via '\*', '\?' and '\\' implemented. 
 * All other bytes are recognized directly as is. 
 * The string may not end in a single (uneven amount of) '\'.
 *
 * If text_end is 0, the text is assumed to be 0 terminated.
 * Same for pattern_end. The end is exclusive.
 *
 * @return A pointer to the character after the last character matched.
 *
 * Virtual machine O(nm) implementation, see http://swtch.com/~rsc/regexp/regexp2.html
 *
 * TODO Factor out the precompilation step to speed up matching multiple strings
 * against the same expression.
 */
template<class Pattern, class Text>
Text wildcard_fast(
        Pattern const pat_begin,
        Pattern const pat_end,
        Text const text_begin, 
        Text const text_end)
{
    /* 
     * give some reasonable default program size so we don't have to call
     * malloc in most cases 
     */
    #define DEFAULTonstack_program_SIZE 256
    Instruction onstack_program[DEFAULTonstack_program_SIZE];
    /* this stores the current run and next run thread list, alternating */
    /* there are as most as many threads as instructions */
    Instruction* onstack_threads[DEFAULTonstack_program_SIZE*2]; 

    Instruction** threads      = onstack_threads;
    Instruction*  program      = onstack_program;
    int           program_size = sizeof(onstack_program)/sizeof(*program);

    Instruction* program_last_instruction = program + program_size - 1;

    /* program and pattern pointers */
    Instruction* pp   = program;
    Pattern      patp = pat_begin;

    /* compile */

    while ((pat_end == 0 && *patp != 0) || (pat_end != 0 && patp != pat_end)) {

        /* need more space */
        if (pp == program_last_instruction) {

            Instruction*  old_program = program;
            Instruction** old_threads = threads;
            int old_program_size      = program_size;

            program_size *= 2;

            program = (Instruction*) malloc(program_size*sizeof(*program));
            threads = (Instruction**)malloc(program_size*sizeof(*threads)*2);

            memcpy(program, old_program, old_program_size*sizeof(*program));

            if (old_program != onstack_program) {
                free(old_program); free(old_threads);
            }

            program_last_instruction = program + program_size - 1;
            pp = pp - old_program + program;
        }

        /* parse pattern */
        switch (*patp) {
            case '*': 
                pp->i = Star; 
                /* Optimize multiple stars away */
                while ((pat_end == 0 || patp+1 != pat_end) && *(patp+1) == '*')  
                    patp++; 
                break;

            case '?':
                pp->i = AnyChar; 
                break;

            case '\\': 
                pp->i = Char; 
                pp->c = *(++patp); /* assumes string does not end in \ */
                break;

            default: 
                pp->i = Char; 
                pp->c = *patp; 
                break;
        }

        pp->threads = 0;

        pp++;
        patp++;
    }

    /* add the End instruction at the end */
    program_last_instruction = pp;
    pp->i                    = End;
    pp->threads              = 0;

    /* run */
    Text sp = text_begin; /* input string pointer */
    int n = 1, c = 0; /* next and current index */
    int threadcount[2];

    /* initialize */
    threadcount[c] = 1;
    threads[0] = program;
    threads[0]->threads++;

    /* run over text */
    while ((text_end == 0 && *sp != 0) || (text_end != 0 && sp != text_end)) {
        /* unless recreated, all threads will die */
        threadcount[n] = 0;

        /* run over threads */
        for (int i = 0; i < threadcount[c]; i++) {

            Instruction* inst = threads[2*i+c];
            switch (inst->i) {
                case End: 
                    /* we may not reach end early */ 
                    /* kill this thread without recrating it */
                    inst->threads--; 
                    continue; /* with for loop */

                case Char: 
                    if (*sp != inst->c) {
                        /* if the character is not matched, kill this thread */
                        inst->threads--;
                        continue;
                    }
                    break;

                case Star: 
                    /* spawn off additional thread at current location */

                    if (inst->threads == 1) { 
                        /* only if there's noone active there yet */
                        threads[2*(threadcount[n]++)+n] = inst;
                        inst->threads++;
                    }
                    break;
            }
            /* 
             * common actions: increase program counter for current thread,
             * decrese amount of threads in last (current) instruction.
             */
            inst->threads--;
            inst++;
            inst->threads++;

            /* respawn at new location in next iteration */
            threads[2*(threadcount[n]++)+n] = inst;

            if (inst->i == Star && (inst+1)->threads == 0) { 
                /* 
                 * already follow no-match option to give us 
                 * more stuff to do 
                 */
                threads[2*(threadcount[n]++)+n] = inst+1;
                (inst+1)->threads++;
            }

#ifdef wildcard_fast_DEBUG
              for (int i = 0 ; i < threadcount[n]; i++) {
                printf("thread %d at %d.\n", i, threads[2*i+n]-program);
            }
#endif

        }

#ifdef wildcard_fast_DEBUG
        const char *ns[] = {
            "end",
            "star",
            "char",
            "anychar",
        };
        for (Instruction* p = program; p->i; p++) {
            printf("%d. %s %c (%d threads)\n", p-program, ns[p->i], p->i == Char ? p->c : ' ', p->threads);
        }
#endif

        /* swap next and current and advance */
        n = c;
        c = !c;
        sp++;
    }

    /* 
     * if there is no thread active in the End instruction when the 
     * end of the input was reached, this was no match
     */
    if (program_last_instruction->threads == 0) sp = 0; 

    if (program != onstack_program) {
        /* only need to free if we used malloc */
        free(program); free(threads);
    }

    return sp;
}

char const* wildcard_fast(
        char const *const pattern, 
        char const *const text) 
{
    return wildcard_fast(
            pattern,
            (const char*)0,
            text,
            (const char*)0);
}

wchar_t const* wildcard_fast(
        wchar_t const *const pattern,
        wchar_t const *const text) 
{
    return wildcard_fast(
            pattern,
            (const wchar_t*)0,
            text,
            (const wchar_t*)0);
}


/* tests */
#ifndef INCLUDE_IMPLEMENTATION
/* 
 * I just include this file in my projects and define this.
 * That works well for simple algorithms like this
 */

#include <stdio.h>
#include <time.h>

int main() {
    struct {
        char* p;
        char* t;
        bool expected_result;
    } test[] = {
        {
            "",
            "",
            true
        },
        {
            "a",
            "",
            false
        },
        {
            "",
            "a",
            false
        },
        {
            "a",
            "a",
            true
        },        
        {
            "****.txt", 
            "hello.txt",
            true
        },
        {
            "*.txt", 
            "hello.tzt",
            false
        },
        {
            "*.t?t*", 
            "hello.tzt",
            true
        },
        {
            "*.*", 
            "hi.there",
            true
        },
        /* the wildcard implementation will fail this as it doesn't understand escaping */
        {
            "\\*who\\?\\?.*\\\\", 
            "*who??.there\\",
            true
        },        
        /* these take extraordinaryly long on the O(2^n) implementation */
        {
            "**a*************************??????***a************************?????????a???????????????", 
            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
            true
        },
        /* runs a bit better if we omit the extra *'s. The fast implementation already optimizes these away. */
        {
            "*a*??????*a*?????????a???????????????", 
            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
            true
        },
        {0,0} /* marks end of tests */
    };

    int t0 = clock();
    const char* result;
    const char* r[] = {"no match", "match"};

    for (int i = 0; test[i].p; i++) {
        printf("=== Test %d: Matching '%s' against '%s'...===\n", i, test[i].t, test[i].p);

        /* wildcard_fast */
        t0 = clock();
        result = wildcard_fast(test[i].p, test[i].t);

        printf("  wildcard_fast (took %d ms):\n", clock()-t0);
        if (!!result != test[i].expected_result)
            printf("    Test failed (reported %s instead of expected %s).\n",
                    r[!!result], r[test[i].expected_result]);
        else if (result) {
            printf("    %s\n", test[i].t);
            printf("    %*.c\n", result-test[i].t+1, '^');
        }
        else
            printf("    No match.\n");

        /* wildcard */
        t0 = clock();
        result = (const char*)
            wildcard(test[i].p, test[i].t);

        printf("  wildcard (took %d ms):\n", clock()-t0);
        if (!!result != test[i].expected_result)
            printf("    Test failed (reported %s instead of expected %s).\n",
                    r[!!result], r[test[i].expected_result]);
        else if (result) printf("    Match.\n");
        else printf("    No match.\n");

        printf("\n");
    }
}
#endif

/*
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated
 * documentation files (the "Software"), to deal in the
 * Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute,
 * sublicense, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 * 
 * The above copyright notice and this permission notice shall
 * be included in all copies or substantial portions of the
 * Software.
 * 
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
 * KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
 * WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
 * PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE AUTHORS
 * OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
 * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
 * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

这使用了虚拟机方法。有关高效正则表达式解析的更多信息，请参阅 http://swtch.com/~rsc/regexp/ 。

The solution Mehrdad provided has exponential running time because it uses backtracking. It takes it like a second to figure out that "*a*??????*a*?????????a???????????????", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" is a match. Also there's no way to match a string with '*' or '?' in it.

Here's an O(nm) implementation I came up with which has O(n) memory usage, where n is the length of the expression, m the length of the string. It also supports escaping ?, * and \.

#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/** 
 * Wildcard matching.
 * See for example
 * http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/find_c_search_wildcard.mspx?mfr=true
 *
 *  * matches any amount of any characters, 
 *  ? matches any single character.
 *  c matches c.
 *  No escaping of * and ? implemented (these are not allowed in windows filenames anyways).
 *
 * Backtracking O(2^n) implementation.
 *
 * from http://stackoverflow.com/questions/3300419/file-name-matching-with-wildcard 
 * http://stackoverflow.com/a/12231681/524504
 */
template<class Pattern, class Text>
bool wildcard(
        Pattern const pat_begin,
        Pattern const pat_end,
        Text text_begin, 
        Text const text_end)
{
    ptrdiff_t const pat_size = pat_end - pat_begin;

    /* initial pattern position stack (offsets into the pattern string) and its size c */
    ptrdiff_t stackbuf[64];
    size_t c                     = sizeof(stackbuf) / sizeof(*stackbuf);
    /* Stack base. Can be realloc'ed to some new memory location if we need more */
    ptrdiff_t *p                 = stackbuf;

    /* pointer to the location in the stack */
    size_t n                     = 0;
    p[n++]                       = 0; /* position 0 in the stack not used */

    /* text_begin updated to skip everything successfully consumed  */
    while (n > 0 && text_begin != text_end) 
    {
        for (size_t i = 0; i < n; i++) 
        {
            /* if we're at the end of the pattern, but not at the end of the 
             * string, we might have done something wrong.
             */
            if (p[i] == pat_size) 
            {
                p[i--] = p[--n];
                continue;
            }
            /* current pattern character */
            switch (*(pat_begin + p[i]))
            {
                case '?': ++p[i]; break; /* simply advance pattern pointer */
                case '*':
                          ptrdiff_t off;
                          off = p[i];
                          while (off < pat_size &&
                                  *(pat_begin + off) == '*')
                          { ++off; }
                          /* if the stack is full, reallocate */
                          if (n == c)
                          {
                              ptrdiff_t const *const old = p;
                              c *= 2;
                              /* assert positive size
                               * stack size never reduced anyways?
                               */
                              if (c == 0) { ++c; }
                              size_t const size = c * sizeof(*p);
                              /* cannot use realloc to copy original stack 
                               * (ptr in realloc must be 
                               * "Pointer to a memory block previously 
                               * allocated with malloc, calloc or realloc."
                               * must do manually
                               */
                              p = (ptrdiff_t *)realloc(
                                       old == stackbuf ? NULL : p,
                                      size);
                              if (old == stackbuf)
                              { memcpy(p, old, n * sizeof(*old)); }
                          }
                          /* store offset */
                          p[n++] = off;
                          break;
                default: /* a normal character in the pattern */
                          /* must be matched exactly */
                          if (*(pat_begin + p[i]) == *text_begin)
                          { ++p[i]; } /* advance pattern pointer */
                          else /* if not, backtrack */
                          { p[i--] = p[--n]; }
                          break;
            }
        }
        ++text_begin;
    }

    bool success = false;
    if (text_begin == text_end)
    {
        while (!success && n > 0)
        {
            --n;
            while (p[n] != pat_size &&
                    *(pat_begin + p[n]) == '*')
            { ++p[n]; }
            if (p[n] == pat_size)
            { success = true; }
        }
    }

    /* need to free stack if it was reallocated */
    if (p != stackbuf) { free(p); }
    return success;
}

bool wildcard(char const *const pattern, char const *const text)
{
    return wildcard(
            pattern,
            pattern + (pattern ? strlen(pattern) : 0),
            text,
            text + (text ? strlen(text) : 0));
}

bool wildcard(wchar_t const *const pattern, wchar_t const *const text)
{
    return wildcard(
            pattern,
            pattern + (pattern ? wcslen(pattern) : 0),
            text,
            text + (text ? wcslen(text) : 0));
}

/**
 * Virtual Machine Style Regular Expression parsing/NFA emulation
 * used for wildcard matching. O(nm) algorithm.
 *
 * See http://swtch.com/~rsc/regexp/ for more on efficient RegEx parsing.
 *
 * Copyright (c) March 29, 2013 Paul Frischknecht
 * Can be distributed under the MIT licence, see bottom of file.
 */

//#define wildcard_fast_DEBUG /* define to make the alogorithm printf what its doing */

/**
 * Instructions are:
 *
 * star
 *   This launches a new thread at the current position and continues with the 
 *   current thread at the next position accepting any character.
 * char c
 *   Accepts exactly the character c
 * anychar
 *   Accepts any character.
 * end 
 *   Accepts the end of the input string.
 */
enum InstructionType {
    End     = 0,
    Star    = 1,
    Char    = 2,
    AnyChar = 3
};

struct Instruction {
    InstructionType i;
    int c;       /* the caracter this instruction matches - undefined for non Char */
    int threads; /* threads active in the given instruction */
    /*
     * storing this here allows us to find out wether there is a thread
     * active at a given instruction in O(1)
     */
};

/** 
 * Wildcard (file path) matching.
 * See for example 
 * http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/find_c_search_wildcard.mspx?mfr=true
 *
 *  * matches any amount of any characters, 
 *  ? matches any single character.
 *  c matches c.
 *
 * Escaping of '*', '?' and '\' via '\*', '\?' and '\\' implemented. 
 * All other bytes are recognized directly as is. 
 * The string may not end in a single (uneven amount of) '\'.
 *
 * If text_end is 0, the text is assumed to be 0 terminated.
 * Same for pattern_end. The end is exclusive.
 *
 * @return A pointer to the character after the last character matched.
 *
 * Virtual machine O(nm) implementation, see http://swtch.com/~rsc/regexp/regexp2.html
 *
 * TODO Factor out the precompilation step to speed up matching multiple strings
 * against the same expression.
 */
template<class Pattern, class Text>
Text wildcard_fast(
        Pattern const pat_begin,
        Pattern const pat_end,
        Text const text_begin, 
        Text const text_end)
{
    /* 
     * give some reasonable default program size so we don't have to call
     * malloc in most cases 
     */
    #define DEFAULTonstack_program_SIZE 256
    Instruction onstack_program[DEFAULTonstack_program_SIZE];
    /* this stores the current run and next run thread list, alternating */
    /* there are as most as many threads as instructions */
    Instruction* onstack_threads[DEFAULTonstack_program_SIZE*2]; 

    Instruction** threads      = onstack_threads;
    Instruction*  program      = onstack_program;
    int           program_size = sizeof(onstack_program)/sizeof(*program);

    Instruction* program_last_instruction = program + program_size - 1;

    /* program and pattern pointers */
    Instruction* pp   = program;
    Pattern      patp = pat_begin;

    /* compile */

    while ((pat_end == 0 && *patp != 0) || (pat_end != 0 && patp != pat_end)) {

        /* need more space */
        if (pp == program_last_instruction) {

            Instruction*  old_program = program;
            Instruction** old_threads = threads;
            int old_program_size      = program_size;

            program_size *= 2;

            program = (Instruction*) malloc(program_size*sizeof(*program));
            threads = (Instruction**)malloc(program_size*sizeof(*threads)*2);

            memcpy(program, old_program, old_program_size*sizeof(*program));

            if (old_program != onstack_program) {
                free(old_program); free(old_threads);
            }

            program_last_instruction = program + program_size - 1;
            pp = pp - old_program + program;
        }

        /* parse pattern */
        switch (*patp) {
            case '*': 
                pp->i = Star; 
                /* Optimize multiple stars away */
                while ((pat_end == 0 || patp+1 != pat_end) && *(patp+1) == '*')  
                    patp++; 
                break;

            case '?':
                pp->i = AnyChar; 
                break;

            case '\\': 
                pp->i = Char; 
                pp->c = *(++patp); /* assumes string does not end in \ */
                break;

            default: 
                pp->i = Char; 
                pp->c = *patp; 
                break;
        }

        pp->threads = 0;

        pp++;
        patp++;
    }

    /* add the End instruction at the end */
    program_last_instruction = pp;
    pp->i                    = End;
    pp->threads              = 0;

    /* run */
    Text sp = text_begin; /* input string pointer */
    int n = 1, c = 0; /* next and current index */
    int threadcount[2];

    /* initialize */
    threadcount[c] = 1;
    threads[0] = program;
    threads[0]->threads++;

    /* run over text */
    while ((text_end == 0 && *sp != 0) || (text_end != 0 && sp != text_end)) {
        /* unless recreated, all threads will die */
        threadcount[n] = 0;

        /* run over threads */
        for (int i = 0; i < threadcount[c]; i++) {

            Instruction* inst = threads[2*i+c];
            switch (inst->i) {
                case End: 
                    /* we may not reach end early */ 
                    /* kill this thread without recrating it */
                    inst->threads--; 
                    continue; /* with for loop */

                case Char: 
                    if (*sp != inst->c) {
                        /* if the character is not matched, kill this thread */
                        inst->threads--;
                        continue;
                    }
                    break;

                case Star: 
                    /* spawn off additional thread at current location */

                    if (inst->threads == 1) { 
                        /* only if there's noone active there yet */
                        threads[2*(threadcount[n]++)+n] = inst;
                        inst->threads++;
                    }
                    break;
            }
            /* 
             * common actions: increase program counter for current thread,
             * decrese amount of threads in last (current) instruction.
             */
            inst->threads--;
            inst++;
            inst->threads++;

            /* respawn at new location in next iteration */
            threads[2*(threadcount[n]++)+n] = inst;

            if (inst->i == Star && (inst+1)->threads == 0) { 
                /* 
                 * already follow no-match option to give us 
                 * more stuff to do 
                 */
                threads[2*(threadcount[n]++)+n] = inst+1;
                (inst+1)->threads++;
            }

#ifdef wildcard_fast_DEBUG
              for (int i = 0 ; i < threadcount[n]; i++) {
                printf("thread %d at %d.\n", i, threads[2*i+n]-program);
            }
#endif

        }

#ifdef wildcard_fast_DEBUG
        const char *ns[] = {
            "end",
            "star",
            "char",
            "anychar",
        };
        for (Instruction* p = program; p->i; p++) {
            printf("%d. %s %c (%d threads)\n", p-program, ns[p->i], p->i == Char ? p->c : ' ', p->threads);
        }
#endif

        /* swap next and current and advance */
        n = c;
        c = !c;
        sp++;
    }

    /* 
     * if there is no thread active in the End instruction when the 
     * end of the input was reached, this was no match
     */
    if (program_last_instruction->threads == 0) sp = 0; 

    if (program != onstack_program) {
        /* only need to free if we used malloc */
        free(program); free(threads);
    }

    return sp;
}

char const* wildcard_fast(
        char const *const pattern, 
        char const *const text) 
{
    return wildcard_fast(
            pattern,
            (const char*)0,
            text,
            (const char*)0);
}

wchar_t const* wildcard_fast(
        wchar_t const *const pattern,
        wchar_t const *const text) 
{
    return wildcard_fast(
            pattern,
            (const wchar_t*)0,
            text,
            (const wchar_t*)0);
}


/* tests */
#ifndef INCLUDE_IMPLEMENTATION
/* 
 * I just include this file in my projects and define this.
 * That works well for simple algorithms like this
 */

#include <stdio.h>
#include <time.h>

int main() {
    struct {
        char* p;
        char* t;
        bool expected_result;
    } test[] = {
        {
            "",
            "",
            true
        },
        {
            "a",
            "",
            false
        },
        {
            "",
            "a",
            false
        },
        {
            "a",
            "a",
            true
        },        
        {
            "****.txt", 
            "hello.txt",
            true
        },
        {
            "*.txt", 
            "hello.tzt",
            false
        },
        {
            "*.t?t*", 
            "hello.tzt",
            true
        },
        {
            "*.*", 
            "hi.there",
            true
        },
        /* the wildcard implementation will fail this as it doesn't understand escaping */
        {
            "\\*who\\?\\?.*\\\\", 
            "*who??.there\\",
            true
        },        
        /* these take extraordinaryly long on the O(2^n) implementation */
        {
            "**a*************************??????***a************************?????????a???????????????", 
            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
            true
        },
        /* runs a bit better if we omit the extra *'s. The fast implementation already optimizes these away. */
        {
            "*a*??????*a*?????????a???????????????", 
            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
            true
        },
        {0,0} /* marks end of tests */
    };

    int t0 = clock();
    const char* result;
    const char* r[] = {"no match", "match"};

    for (int i = 0; test[i].p; i++) {
        printf("=== Test %d: Matching '%s' against '%s'...===\n", i, test[i].t, test[i].p);

        /* wildcard_fast */
        t0 = clock();
        result = wildcard_fast(test[i].p, test[i].t);

        printf("  wildcard_fast (took %d ms):\n", clock()-t0);
        if (!!result != test[i].expected_result)
            printf("    Test failed (reported %s instead of expected %s).\n",
                    r[!!result], r[test[i].expected_result]);
        else if (result) {
            printf("    %s\n", test[i].t);
            printf("    %*.c\n", result-test[i].t+1, '^');
        }
        else
            printf("    No match.\n");

        /* wildcard */
        t0 = clock();
        result = (const char*)
            wildcard(test[i].p, test[i].t);

        printf("  wildcard (took %d ms):\n", clock()-t0);
        if (!!result != test[i].expected_result)
            printf("    Test failed (reported %s instead of expected %s).\n",
                    r[!!result], r[test[i].expected_result]);
        else if (result) printf("    Match.\n");
        else printf("    No match.\n");

        printf("\n");
    }
}
#endif

/*
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated
 * documentation files (the "Software"), to deal in the
 * Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute,
 * sublicense, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 * 
 * The above copyright notice and this permission notice shall
 * be included in all copies or substantial portions of the
 * Software.
 * 
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
 * KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
 * WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
 * PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE AUTHORS
 * OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
 * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
 * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

This uses a virtual machine approach. See http://swtch.com/~rsc/regexp/ for more on efficitent regular expression parsing.

回复收藏 0 原文

四叶草在未来唯美盛开 2024-09-17 23:21:59

这是一个无依赖的可移植 C++ 版本：

#include <string>

#include <string.h>

bool wild_match(const std::string& str, const std::string& pat) {
  std::string::const_iterator str_it = str.begin();
  for (std::string::const_iterator pat_it = pat.begin(); pat_it != pat.end();
       ++pat_it) {
    switch (*pat_it) {
      case '?':
        if (str_it == str.end()) {
          return false;
        }

        ++str_it;
        break;
      case '*': {
        if (pat_it + 1 == pat.end()) {
          return true;
        }

        const size_t max = strlen(&*str_it);
        for (size_t i = 0; i < max; ++i) {
          if (wild_match(&*(pat_it + 1), &*(str_it + i))) {
            return true;
          }
        }

        return false;
      }
      default:
        if (*str_it != *pat_it) {
          return false;
        }

        ++str_it;
    }
  }

  return str_it == str.end();
}

Here is a dependency free portable C++ version:

#include <string>

#include <string.h>

bool wild_match(const std::string& str, const std::string& pat) {
  std::string::const_iterator str_it = str.begin();
  for (std::string::const_iterator pat_it = pat.begin(); pat_it != pat.end();
       ++pat_it) {
    switch (*pat_it) {
      case '?':
        if (str_it == str.end()) {
          return false;
        }

        ++str_it;
        break;
      case '*': {
        if (pat_it + 1 == pat.end()) {
          return true;
        }

        const size_t max = strlen(&*str_it);
        for (size_t i = 0; i < max; ++i) {
          if (wild_match(&*(pat_it + 1), &*(str_it + i))) {
            return true;
          }
        }

        return false;
      }
      default:
        if (*str_it != *pat_it) {
          return false;
        }

        ++str_it;
    }
  }

  return str_it == str.end();
}

回复收藏 0 原文

巷雨优美回忆 2024-09-17 23:21:59

PathMatchSpec。尽管它受到 MAX_PATH 限制（即可接受不超过 260 个字符）。你最好实现自己的匹配器；这不是很多代码。

回复收藏 0 原文

故事还在继续 2024-09-17 23:21:59

这与 @nabulke 的答案几乎相同，使用 C++11 而不是 Boost （所以如果您喜欢这个答案，请务必给这个答案点赞）：

#include <regex>
#include <string>

std::regex wildcardToRegex(const std::string& wildcard, bool caseSensitive = true)
{
    // Note It is possible to automate checking if filesystem is case sensitive or not (e.g. by performing a test first time this function is ran)
    std::string regexString{ wildcard };
    // Escape all regex special chars:
    regexString = std::regex_replace(regexString, std::regex("\\\\"), "\\\\");
    regexString = std::regex_replace(regexString, std::regex("\\^"), "\\^");
    regexString = std::regex_replace(regexString, std::regex("\\."), "\\.");
    regexString = std::regex_replace(regexString, std::regex("\\$"), "\\$");
    regexString = std::regex_replace(regexString, std::regex("\\|"), "\\|");
    regexString = std::regex_replace(regexString, std::regex("\\("), "\\(");
    regexString = std::regex_replace(regexString, std::regex("\\)"), "\\)");
    regexString = std::regex_replace(regexString, std::regex("\\{"), "\\{");
    regexString = std::regex_replace(regexString, std::regex("\\{"), "\\}");
    regexString = std::regex_replace(regexString, std::regex("\\["), "\\[");
    regexString = std::regex_replace(regexString, std::regex("\\]"), "\\]");
    regexString = std::regex_replace(regexString, std::regex("\\+"), "\\+");
    regexString = std::regex_replace(regexString, std::regex("\\/"), "\\/");
    // Convert wildcard specific chars '*?' to their regex equivalents:
    regexString = std::regex_replace(regexString, std::regex("\\?"), ".");
    regexString = std::regex_replace(regexString, std::regex("\\*"), ".*");

    return std::regex(regexString, caseSensitive ? std::regex_constants::ECMAScript : std::regex_constants::icase);
}

bool wildmatch(const std::string& input, const std::string& wildcard)
{
    auto rgx = wildcardToRegex(wildcard);
    return std::regex_match(input, rgx);
}

This is pretty much the same as @nabulke's answer, using C++11 instead of Boost (so be sure to upvote that one too if you like this answer):

#include <regex>
#include <string>

std::regex wildcardToRegex(const std::string& wildcard, bool caseSensitive = true)
{
    // Note It is possible to automate checking if filesystem is case sensitive or not (e.g. by performing a test first time this function is ran)
    std::string regexString{ wildcard };
    // Escape all regex special chars:
    regexString = std::regex_replace(regexString, std::regex("\\\\"), "\\\\");
    regexString = std::regex_replace(regexString, std::regex("\\^"), "\\^");
    regexString = std::regex_replace(regexString, std::regex("\\."), "\\.");
    regexString = std::regex_replace(regexString, std::regex("\\
quot;), "\\quot;);
    regexString = std::regex_replace(regexString, std::regex("\\|"), "\\|");
    regexString = std::regex_replace(regexString, std::regex("\\("), "\\(");
    regexString = std::regex_replace(regexString, std::regex("\\)"), "\\)");
    regexString = std::regex_replace(regexString, std::regex("\\{"), "\\{");
    regexString = std::regex_replace(regexString, std::regex("\\{"), "\\}");
    regexString = std::regex_replace(regexString, std::regex("\\["), "\\[");
    regexString = std::regex_replace(regexString, std::regex("\\]"), "\\]");
    regexString = std::regex_replace(regexString, std::regex("\\+"), "\\+");
    regexString = std::regex_replace(regexString, std::regex("\\/"), "\\/");
    // Convert wildcard specific chars '*?' to their regex equivalents:
    regexString = std::regex_replace(regexString, std::regex("\\?"), ".");
    regexString = std::regex_replace(regexString, std::regex("\\*"), ".*");

    return std::regex(regexString, caseSensitive ? std::regex_constants::ECMAScript : std::regex_constants::icase);
}

bool wildmatch(const std::string& input, const std::string& wildcard)
{
    auto rgx = wildcardToRegex(wildcard);
    return std::regex_match(input, rgx);
}

回复收藏 0 原文

大海や 2024-09-17 23:21:59

您也可以尝试我的通配符匹配库： Wildest card

它是：

基于
用 C 编写的 NFA在单个头文件中
没有依赖项
没有递归
没有动态分配
复杂度为 O(MN)
需要 O(logM) 内存

示例：

#include "wildcard.h"
...
bool answer = wildcard("*.cpp", "file.cpp");

Also you can try my wildcard matching library: Wildest card

It is:

Based on NFA
Written in C in single header file
Has no dependencies
Has no recursion
Has no dynamic allocations
Has O(MN) complexity
Requires O(logM) memory

Example:

#include "wildcard.h"
...
bool answer = wildcard("*.cpp", "file.cpp");

回复收藏 0 原文

嘿咻 2024-09-17 23:21:59

这是一个高效的方法，它对字符串进行一次遍历，并在不需要的地方跳过比较（模式中的最后一个字符恰好是“*”，或者如果“*”之前的字符没有相同的字符）索引匹配）。它也不会通过使用指针来复制任何字符串，并且不会浪费任何 CPU 周期来使用 strlen()、sizeof()、.length() 或 .size() 获取字符串长度

#include <stdio.h>

_Bool wildcard_strcmp(char *line, char *pattern)
{
    _Bool wildcard = 0;
    char *placeholder;

    do
    {
        if ((*pattern == *line) || (*pattern == '?'))
        {
            line++;
            pattern++;
        }
        else if (*pattern == '*')
        {
            if (*(++pattern) == '\0')
                return 1;
            else
            {
                placeholder = pattern;
                wildcard = 1;
            }
        }
        else if (wildcard)
        {
            if (pattern == placeholder)
                line++;
            else
                pattern = placeholder;
        } 
        else
            return 0;
    } while (*line);

    if (*pattern == '\0')
        return 1;
    else
        return 0;
}

int main()
{
    char string[200] = "foobarfoobar";
    char pattern[200] = "fo?*barfoo*";

    if (wildcard_strcmp(string, pattern))
        printf("Match\n");
    else
        printf("No Match\n");

    return 0;
}

Here's an efficient one that does a single traverse of the string, and skips comparisons where they're not needed (the last character in pattern happens to be a '*' or if a character before a '*' doesn't have the same index match). It also doesn't copy any strings by making use of pointers and doesn't waste any CPU cycles on getting the string length with strlen(), sizeof(), .length(), or .size()

#include <stdio.h>

_Bool wildcard_strcmp(char *line, char *pattern)
{
    _Bool wildcard = 0;
    char *placeholder;

    do
    {
        if ((*pattern == *line) || (*pattern == '?'))
        {
            line++;
            pattern++;
        }
        else if (*pattern == '*')
        {
            if (*(++pattern) == '\0')
                return 1;
            else
            {
                placeholder = pattern;
                wildcard = 1;
            }
        }
        else if (wildcard)
        {
            if (pattern == placeholder)
                line++;
            else
                pattern = placeholder;
        } 
        else
            return 0;
    } while (*line);

    if (*pattern == '\0')
        return 1;
    else
        return 0;
}

int main()
{
    char string[200] = "foobarfoobar";
    char pattern[200] = "fo?*barfoo*";

    if (wildcard_strcmp(string, pattern))
        printf("Match\n");
    else
        printf("No Match\n");

    return 0;
}

回复收藏 0 原文

~没有更多了~