如何判断两个通配符是否重叠?

发布于 2024-09-01 19:44:53 字数 269 浏览 6 评论 0原文

给定两个带有 * 通配符的字符串,我想知道是否可以创建一个与这两个字符串匹配的字符串。

例如,这两个是重叠的简单情况:

  1. Hello*World
  2. Hel*

但所有这些都是:

  1. *.csv
  2. reports*.csv
  3. reportsdump.csv

是否有为此发布的算法?或者也许是 Windows 中的实用函数或我可以调用或复制的库?

Given two strings with * wildcards, I would like to know if a string could be created that would match both.

For example, these two are a simple case of overlap:

  1. Hello*World
  2. Hel*

But so are all of these:

  1. *.csv
  2. reports*.csv
  3. reportsdump.csv

Is there an algorithm published for doing this? Or perhaps a utility function in Windows or a library I might be able to call or copy?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

葬花如无物 2024-09-08 19:44:53

由于每个 glob 都可以写成正则表达式,并且可以找到两个正则表达式的交集(除非它们不是真正的正则表达式,但在这种情况下它们是正则的),因此您可以通过将它们转换为来找到两个 glob 的交集正则表达式,然后找到它们的交集。因此,您可以通过查找正则表达式的交集并检查它是否为空来确定两个 glob 是否相交。

然而,由于 glob 比正则表达式受到更多限制,因此有一种更简单的方法:

让我们将这两个 glob 称为 g1 和 g2。它们相交当且

  1. 仅当 g1 和 g2 均为空或仅包含通配符。
  2. g1 和 g2 都不为空,并且以下条件之一为真(令 c1 为 g1 的第一个字符,t1 为包含其余字符的字符串 - 对于带有 c2 和 t2 的 g2 来说相同):
    1. c1 和 c2 相等且 t1 和 t2 相交
    2. c1 和/或 c2 是通配符,t1 与 g2 相交
    3. c1 和/或 c2 是通配符,g1 与 t2 相交

Haskell 中的示例实现:

intersect g1          []          = all (== '*') g1
intersect []          g2          = all (== '*') g2
intersect g1@('*':t1) g2@(c2:t2)  = intersect g1 t2 || intersect t1 g2
intersect g1@(c1:t1)  g2@('*':t2) = intersect t1 g2 || intersect g1 t2
intersect    (c1:t1)     (c2:t2)  = c1 == c2        && intersect t1 t2

该算法不是如果 glob 包含大量通配符,则效率不会特别高,但它很容易实现,并且由于您可能计划将其与文件名一起使用,因此我怀疑您的 glob 长度将超过 1000 个字符。

Since every glob can be written as a regular expression and the intersection of two regular expressions can be found (unless they aren't really regular, but they would be in this case), you can find the intersection of two globs by transforming them into regular expressions and then finding the intersection of those. So you can find out whether two globs intersect by finding the intersection of the regular expressions and checking whether it's empty.

However since globs are more limited than regular expression, there is a much easier way:

Let's call the two globs g1 and g2. They intersect iff

  1. Both g1 and g2 are empty or only contain wildcards.
  2. Neither g1 nor g2 are empty and one of the following conditions is true (let c1 be the first character of g1 and t1 the string containing the remaining characters - same for g2 with c2 and t2):
    1. c1 and c2 are equal and t1 and t2 intersect
    2. c1 and/or c2 is a wildcard and t1 intersects with g2
    3. c1 and/or c2 is a wildcard and g1 intersects with t2

An example implementation in haskell:

intersect g1          []          = all (== '*') g1
intersect []          g2          = all (== '*') g2
intersect g1@('*':t1) g2@(c2:t2)  = intersect g1 t2 || intersect t1 g2
intersect g1@(c1:t1)  g2@('*':t2) = intersect t1 g2 || intersect g1 t2
intersect    (c1:t1)     (c2:t2)  = c1 == c2        && intersect t1 t2

This algorithm isn't particular efficient if the globs contain a lot of wildcards, but it's very easy to implement and since you're likely planning to use this with filenames, I doubt you'll have globs longer than 1000 chars.

昵称有卵用 2024-09-08 19:44:53

无论如何,这里是来自 sepp2k 在 C# 中的答案的算法的一个实现(我使用了显式的 return true;return false; 调用,以及注释,以提高算法的可读性):

public static bool WildcardIntersect(string w1, string w2)
{
    // if both are empty or contain wildcards
    if ((string.IsNullOrEmpty(w1) || w1 == "*")
        && (string.IsNullOrEmpty(w2) || w2 == "*"))
        return true;

    // if either string is empty, return false
    // we can do this because we know the other string MUST be non-empty and non-wildcard
    if (string.IsNullOrEmpty(w1) || string.IsNullOrEmpty(w2))
        return false;

    char c1 = w1[0], // first character of wildcard string 1
         c2 = w2[0]; // first character of wildcard string 2
    string remain1 = w1.Substring(1), // remaining of wildcard string 1
           remain2 = w2.Substring(1); // remaining of wildcard string 2

    // if first letters match and remaining intersect
    if ((c1 == c2 && WildcardIntersect(remain1, remain2))
        // if either is a wildcard and either remaining intersects with the other whole
        || ((c1 == '*' || c2 == '*') && (WildcardIntersect(w1, remain2) || WildcardIntersect(remain1, w2))))
        return true;

    // else, no match, return false
    return false;
}

For what it's worth, here's one implementation of the algorithm from sepp2k's answer in C# (I used explicit return true; and return false; calls, along with comments, for algorithm readability):

public static bool WildcardIntersect(string w1, string w2)
{
    // if both are empty or contain wildcards
    if ((string.IsNullOrEmpty(w1) || w1 == "*")
        && (string.IsNullOrEmpty(w2) || w2 == "*"))
        return true;

    // if either string is empty, return false
    // we can do this because we know the other string MUST be non-empty and non-wildcard
    if (string.IsNullOrEmpty(w1) || string.IsNullOrEmpty(w2))
        return false;

    char c1 = w1[0], // first character of wildcard string 1
         c2 = w2[0]; // first character of wildcard string 2
    string remain1 = w1.Substring(1), // remaining of wildcard string 1
           remain2 = w2.Substring(1); // remaining of wildcard string 2

    // if first letters match and remaining intersect
    if ((c1 == c2 && WildcardIntersect(remain1, remain2))
        // if either is a wildcard and either remaining intersects with the other whole
        || ((c1 == '*' || c2 == '*') && (WildcardIntersect(w1, remain2) || WildcardIntersect(remain1, w2))))
        return true;

    // else, no match, return false
    return false;
}
雪若未夕 2024-09-08 19:44:53

您可以在模式长度总和的时间上线性地解决这个问题:

如果两个字符串都以非通配符开头或结尾,请检查它们是否匹配,直到一个模式遇到通配符(否则它们不匹配)。这将问题减少到至少一种模式以通配符开始且至少一种模式以通配符结束的情况。如果两个模式都有通配符(某处),那么它们必须匹配:

  • 如果 p1 以通配符开头并且 p2 以通配符结尾,则使用 p1
    通配符吃掉所有 p2 直到最后一个通配符,然后使用 p2
    通配符会吃掉所有 p1
  • 如果 p1 以通配符开头和结尾,则使用其起始通配符
    吃掉 p2 直至其第一个通配符,然后使用 p2 通配符
    把p1吃到最后一个通配符,然后用最后一个p1通配符吃
    否则

,一个字符串 (p1) 没有通配符,而另一个字符串 (p2) 则包含用通配符标点的字符串 s1,s2,...。因此,只需搜索 s1 在 p1 中的第一次出现,然后搜索 s2 的第一次后续出现(从 p1 中的匹配末尾开始),依此类推。如果找到所有字符串,则模式匹配,否则它们不匹配't。

You can solve this in time linear in the sum of the pattern lengths:

If both strings start or end with non-wildcards, check that they match until one pattern hits a wildcard (otherwise they don't match). This reduces the problem to the case where at least one pattern starts with a wildcard and at least one pattern ends with a wildcard. If both patterns have wildcards (somewhere), then they have to match:

  • if p1 starts with a wildcard and p2 ends with a wildcard, use the p1
    wildcard to eat up all of p2 up to its last wildcard, then use the p2
    wildcard to eat up all of p1
  • if p1 starts and ends with a wildcard, then use its starting wildcard
    to eat up p2 up to its first wildcard, then use the p2 wildcard to
    eat up p1 to its last wildcard, then use the last p1 wildcard to eat
    up the rest of p2

Otherwise, one string (p1) has no wildcards, and the other string (p2) has strings s1,s2,...punctuated with wildcards. So just search for the first occurrence of s1 in p1, then for the first subsequent occurrence of s2 (starting from the end of the match in p1), etc. If you find all of the strings, then the patterns match, otherwise they don't.

回忆躺在深渊里 2024-09-08 19:44:53

据我了解,您尝试确定一个正则表达式是否与另一个正则表达式正交?
如果是这样,这就是一个非常不简单的问题。

以下是有关理论的更多信息。

这是解决方案:Java 库。

用法:

/**
 * @return true if the two regexes will never both match a given string
 */
public boolean isRegexOrthogonal( String regex1, String regex2 ) {
   Automaton automaton1 = new RegExp(regex1).toAutomaton();
   Automaton automaton2 = new RegExp(regex2).toAutomaton();
   return automaton1.intersection(automaton2).isEmpty();
}

As I understand you try to determine if a regex is orthogonal to another regex?
If so, this is very not trivial problem.

Here is more about Theory.

Here is solution: Java library.

Usage:

/**
 * @return true if the two regexes will never both match a given string
 */
public boolean isRegexOrthogonal( String regex1, String regex2 ) {
   Automaton automaton1 = new RegExp(regex1).toAutomaton();
   Automaton automaton2 = new RegExp(regex2).toAutomaton();
   return automaton1.intersection(automaton2).isEmpty();
}
大海や 2024-09-08 19:44:53

下面是 sepp2k 建议的算法的 C++ 实现,稍加修改:

bool intersect(const std::string& pattern1, const std::string& pattern2) {
    if(pattern1.empty() && pattern2.empty()) return true;
    if("*" == pattern1 || "*" == pattern2) return true;

    if(pattern2.empty() && '*' == pattern1[0]) return true;
    if(pattern1.empty() && '*' == pattern2[0]) return true;

    if(pattern1.empty() || pattern2.empty()) return false;

    char c1 = pattern1[0];
    char c2 = pattern2[0];
    string subPattern1 = pattern1.substr(1);
    string subPattern2 = pattern2.substr(1);


    if('*' == c1 && '*' == c2)
        return intersect(pattern1, subPattern2) && intersect(subPattern1, pattern2);

    if('*' == c1 && intersect(pattern1, subPattern2)
       || '*' == c2 && intersect(subPattern1, pattern2)
       || c1 == c2 && intersect(subPattern1, subPattern2)) {
        return true;
    }

    return false;
}

Here is a c++ implementation of the algorithm suggested by sepp2k with a slight modifications:

bool intersect(const std::string& pattern1, const std::string& pattern2) {
    if(pattern1.empty() && pattern2.empty()) return true;
    if("*" == pattern1 || "*" == pattern2) return true;

    if(pattern2.empty() && '*' == pattern1[0]) return true;
    if(pattern1.empty() && '*' == pattern2[0]) return true;

    if(pattern1.empty() || pattern2.empty()) return false;

    char c1 = pattern1[0];
    char c2 = pattern2[0];
    string subPattern1 = pattern1.substr(1);
    string subPattern2 = pattern2.substr(1);


    if('*' == c1 && '*' == c2)
        return intersect(pattern1, subPattern2) && intersect(subPattern1, pattern2);

    if('*' == c1 && intersect(pattern1, subPattern2)
       || '*' == c2 && intersect(subPattern1, pattern2)
       || c1 == c2 && intersect(subPattern1, subPattern2)) {
        return true;
    }

    return false;
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文