How can I determine whether two wildcard patterns overlap?
Given two strings with * wildcards, I would like to know if a string could be created that would match both.
For example, these two are a simple case of overlap:
- Hello*World
- Hel*
But so are all of these:
- *.csv
- reports*.csv
- reportsdump.csv
Is there an algorithm published for doing this? Or perhaps a utility function in Windows or a library I might be able to call or copy?
Since every glob can be written as a regular expression and the intersection of two regular expressions can be found (unless they aren't really regular, but they would be in this case), you can find the intersection of two globs by transforming them into regular expressions and then finding the intersection of those. So you can find out whether two globs intersect by finding the intersection of the regular expressions and checking whether it's empty.
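As a rough illustration of the first step only (writing a glob as a regular expression), here is a minimal Python sketch. It assumes `*` is the only wildcard and matches any run of characters, including an empty one; `glob_to_regex` is a name made up for this example. Python's standard `re` module has no operation for intersecting two expressions, so the sketch only checks a single candidate string against both patterns rather than computing an intersection.

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a glob whose only wildcard is '*' into a compiled regex."""
    # Escape every literal piece so characters such as '.' stay literal,
    # then join the pieces with '.*' where the glob had a '*'.
    literal_parts = glob.split("*")
    body = ".*".join(re.escape(part) for part in literal_parts)
    # DOTALL lets '.*' span any character, newlines included.
    return re.compile(body, re.DOTALL)


# A single witness string that both globs accept shows that they overlap.
candidate = "reportsdump.csv"
both_match = all(
    glob_to_regex(g).fullmatch(candidate) for g in ("*.csv", "reports*.csv")
)
```

Finding such a witness by hand only proves overlap; showing that no witness exists needs a real intersection or one of the direct checks discussed in the answers.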
However, since globs are more limited than regular expressions, there is a much easier way:
Let's call the two globs g1 and g2. They intersect iff
An example implementation in Haskell:
This algorithm isn't particularly efficient if the globs contain a lot of wildcards, but it's very easy to implement, and since you're likely planning to use this with filenames, I doubt you'll have globs longer than 1000 chars.
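One way such a recursive check could look in Python is sketched below. This is an illustration under assumptions, not necessarily the exact rule set of the answer: `*` is taken to be the only wildcard and to match any (possibly empty) run of characters, and the name `globs_intersect` is invented here. Memoising on the pair of positions keeps the work bounded by the product of the two pattern lengths.

```python
from functools import lru_cache


def globs_intersect(g1: str, g2: str) -> bool:
    """Return True if some string matches both globs ('*' is the only wildcard)."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> bool:
        # One glob is exhausted: the rest of the other must be wildcards only.
        if i == len(g1):
            return all(c == "*" for c in g2[j:])
        if j == len(g2):
            return all(c == "*" for c in g1[i:])
        # A wildcard on either side: advance past one position on one side.
        # This covers both "the wildcard matches nothing" and
        # "the wildcard absorbs the other side's next character".
        if g1[i] == "*" or g2[j] == "*":
            return go(i + 1, j) or go(i, j + 1)
        # Two literals must agree, and the remainders must intersect.
        return g1[i] == g2[j] and go(i + 1, j + 1)

    return go(0, 0)
```

For example, `globs_intersect("Hello*World", "Hel*")` and `globs_intersect("*.csv", "reports*.csv")` are both true, while `globs_intersect("*.csv", "*.txt")` is false. The recursion depth grows with the combined pattern length, so very long patterns would need an iterative table instead.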
For what it's worth, here's one implementation of the algorithm from sepp2k's answer in C# (I used explicit return true; and return false; calls, along with comments, for algorithm readability):
You can solve this in time linear in the sum of the pattern lengths:
If both strings start or end with non-wildcards, check that they match until one pattern hits a wildcard (otherwise they don't match). This reduces the problem to the case where at least one pattern starts with a wildcard and at least one pattern ends with a wildcard. If both patterns have wildcards (somewhere), then they have to match:
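- if p1 starts with a wildcard and p2 ends with a wildcard, use the p1 wildcard to eat up all of p2 up to its last wildcard, then use the p2 wildcard to eat up all of p1
- if p1 starts and ends with a wildcard, use its starting wildcard to eat up p2 up to its first wildcard, then use the p2 wildcard to eat up p1 to its last wildcard, then use the last p1 wildcard to eat up the rest of p2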
Otherwise, one string (p1) has no wildcards, and the other string (p2) has strings s1,s2,...punctuated with wildcards. So just search for the first occurrence of s1 in p1, then for the first subsequent occurrence of s2 (starting from the end of the match in p1), etc. If you find all of the strings, then the patterns match, otherwise they don't.
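The steps above can be sketched in Python roughly as follows. This is one reading of the description, assuming `*` is the only wildcard and may match an empty run; `globs_overlap_linear` is a hypothetical name. The case where neither pattern has a wildcard left after trimming is handled by a plain equality check, which the description does not spell out. The segment search relies on `str.find`, so the linear bound depends on that search being efficient.

```python
def globs_overlap_linear(p1: str, p2: str) -> bool:
    """Return True if some string matches both patterns ('*' is the only wildcard)."""
    # Trim the literal prefix the two patterns share, stopping at a wildcard.
    lo = 0
    while lo < len(p1) and lo < len(p2) and p1[lo] != "*" and p2[lo] != "*":
        if p1[lo] != p2[lo]:
            return False
        lo += 1
    p1, p2 = p1[lo:], p2[lo:]

    # Trim the shared literal suffix the same way.
    cut = 0
    while (cut < len(p1) and cut < len(p2)
           and p1[-1 - cut] != "*" and p2[-1 - cut] != "*"):
        if p1[-1 - cut] != p2[-1 - cut]:
            return False
        cut += 1
    if cut:  # guard: a slice ending at -0 would be empty
        p1, p2 = p1[:-cut], p2[:-cut]

    has1, has2 = "*" in p1, "*" in p2
    if has1 and has2:
        # After trimming, two patterns that both contain wildcards always overlap.
        return True
    if not has1 and not has2:
        # Two plain strings overlap only if they are identical.
        return p1 == p2

    # Exactly one side is a plain string: look for the other side's literal
    # segments in order, each search starting where the previous match ended.
    literal, pattern = (p1, p2) if has2 else (p2, p1)
    pos = 0
    for segment in pattern.split("*"):
        found = literal.find(segment, pos)
        if found < 0:
            return False
        pos = found + len(segment)
    return True
```

With the examples from the question, `globs_overlap_linear("Hello*World", "Hel*")` takes the both-have-wildcards branch, and `globs_overlap_linear("reports*.csv", "reportsdump.csv")` reduces to matching `*` against `dump`.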
As I understand it, you are trying to determine whether one regex is orthogonal to another?
If so, this is far from a trivial problem.
Here is more about the theory.
Here is a solution: a Java library.
Usage:
Here is a C++ implementation of the algorithm suggested by sepp2k, with slight modifications: