Need to perform wildcard (*, ?, etc.) search on a string using Regex

Posted 2024-11-27 19:39:05 · 490 words · 1 view · 0 comments

I need to perform Wildcard (*, ?, etc.) search on a string.
This is what I have done:

string input = "Message";
string pattern = "d*";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);

if (regex.IsMatch(input))
{
    MessageBox.Show("Found");
}
else
{
    MessageBox.Show("Not Found");
}

With the above code, the "Found" branch is hit, but it should not be!

The "Found" branch should be hit only if my pattern is "e*".

My understanding (and requirement) is that a d* search should find text containing "d" followed by any characters.

Should I change my patterns to "d.*" and "e.*"? Is there any wildcard support in .NET that does this internally when using the Regex class?
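
To see why the code above reports "Found", it can help to look at what d* actually matches. The following is a minimal sketch (the class and method names are hypothetical, chosen only for illustration): in a regex, * means "zero or more of the preceding element", so d* succeeds with an empty match at the start of any input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the original code.
public static class StarQuantifierDemo
{
    // Describes the first match of `pattern` in `input`, ignoring case.
    public static string DescribeFirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return m.Success ? $"'{m.Value}' at {m.Index}" : "no match";
    }
}
```

With this helper, DescribeFirstMatch("Message", "d*") reports an empty match at index 0, which is why IsMatch returns true, while the pattern d+ reports no match and e+ finds 'e' at index 1.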

Comments (11)

最美不过初阳 2024-12-04 19:39:05

From http://www.codeproject.com/KB/recipes/wildcardtoregex.aspx:

public static string WildcardToRegex(string pattern)
{
    return "^" + Regex.Escape(pattern)
                      .Replace(@"\*", ".*")
                      .Replace(@"\?", ".")
               + "$";
}

So something like foo*.xls? will get transformed to ^foo.*\.xls.$.
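
As a rough usage sketch, the conversion above can be wrapped in a small matching helper. The wrapper name IsWildcardMatch and the choice of RegexOptions.IgnoreCase are assumptions made for this example, not part of the CodeProject recipe.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardUsageDemo
{
    // Same conversion as in the answer: escape everything, then turn the
    // escaped wildcards back into their regex equivalents and anchor both ends.
    public static string WildcardToRegex(string pattern)
    {
        return "^" + Regex.Escape(pattern)
                          .Replace(@"\*", ".*")
                          .Replace(@"\?", ".")
                   + "$";
    }

    // Hypothetical convenience wrapper (name and options are assumptions).
    public static bool IsWildcardMatch(string input, string wildcard)
    {
        return Regex.IsMatch(input, WildcardToRegex(wildcard), RegexOptions.IgnoreCase);
    }
}
```

Under these assumptions, IsWildcardMatch("Message", "d*") is false and IsWildcardMatch("Message", "m*") is true, because the anchors force the whole input to match. Likewise "foobar.xlsx" matches foo*.xls? but "foobar.xls" does not, since ? is translated to exactly one character.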

咆哮 2024-12-04 19:39:05

You can do a simple wildcard match without RegEx using a Visual Basic function called LikeString.

using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;

if (Operators.LikeString("This is just a test", "*just*", CompareMethod.Text))
{
  Console.WriteLine("This matched!");
}

If you use CompareMethod.Text, the comparison is case-insensitive. For a case-sensitive comparison, use CompareMethod.Binary.

More info here: http://www.henrikbrinch.dk/Blog/2012/02/14/Wildcard-matching-in-C

MSDN: http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.compilerservices.operators.likestring%28v=vs.100%29.ASPX

生活了然无味 2024-12-04 19:39:05

The correct regular expression formulation of the glob expression d* is ^d, which means match anything that starts with d.

    string input = "Message";
    string pattern = @"^d";
    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);

(The @ quoting is not necessary in this case, but good practice since many regexes use backslash escapes that need to be left alone, and it also indicates to the reader that this string is special).

≈。彩虹 2024-12-04 19:39:05

Windows and *nix treat wildcards differently. Windows processes *, ? and . in a very complex way: the presence or position of one can change the meaning of another. *nix keeps it simple and does just one plain pattern match. Besides that, Windows matches ? against 0 or 1 characters, while Linux matches it against exactly 1 character.

I didn't find authoritative documentation on this matter; this is just my conclusion based on days of tests on Windows 8/XP (the command line, the dir command to be specific; the Directory.GetFiles method uses the same rules too) and Ubuntu Server 12.04.1 (the ls command). I made dozens of common and uncommon cases work, although there are many failing cases too.

The answer by Gabe works like *nix. If you also want a Windows-style one and are willing to accept the imperfection, here it is:

    /// <summary>
    /// <para>Tests if a file name matches the given wildcard pattern, uses the same rule as shell commands.</para>
    /// </summary>
    /// <param name="fileName">The file name to test, without folder.</param>
    /// <param name="pattern">A wildcard pattern which can use char * to match any amount of characters; or char ? to match one character.</param>
    /// <param name="unixStyle">If true, use the *nix style wildcard rules; otherwise use windows style rules.</param>
    /// <returns>true if the file name matches the pattern, false otherwise.</returns>
    public static bool MatchesWildcard(this string fileName, string pattern, bool unixStyle)
    {
        if (fileName == null)
            throw new ArgumentNullException("fileName");

        if (pattern == null)
            throw new ArgumentNullException("pattern");

        if (unixStyle)
            return WildcardMatchesUnixStyle(pattern, fileName);

        return WildcardMatchesWindowsStyle(fileName, pattern);
    }

    private static bool WildcardMatchesWindowsStyle(string fileName, string pattern)
    {
        var dotdot = pattern.IndexOf("..", StringComparison.Ordinal);
        if (dotdot >= 0)
        {
            for (var i = dotdot; i < pattern.Length; i++)
                if (pattern[i] != '.')
                    return false;
        }

        var normalized = Regex.Replace(pattern, @"\.+$", "");
        var endsWithDot = normalized.Length != pattern.Length;

        var endWeight = 0;
        if (endsWithDot)
        {
            var lastNonWildcard = normalized.Length - 1;
            for (; lastNonWildcard >= 0; lastNonWildcard--)
            {
                var c = normalized[lastNonWildcard];
                if (c == '*')
                    endWeight += short.MaxValue;
                else if (c == '?')
                    endWeight += 1;
                else
                    break;
            }

            if (endWeight > 0)
                normalized = normalized.Substring(0, lastNonWildcard + 1);
        }

        var endsWithWildcardDot = endWeight > 0;
        var endsWithDotWildcardDot = endsWithWildcardDot && normalized.EndsWith(".");
        if (endsWithDotWildcardDot)
            normalized = normalized.Substring(0, normalized.Length - 1);

        normalized = Regex.Replace(normalized, @"(?!^)(\.\*)+$", @".*");

        var escaped = Regex.Escape(normalized);
        string head, tail;

        if (endsWithDotWildcardDot)
        {
            head = "^" + escaped;
            tail = @"(\.[^.]{0," + endWeight + "})?$";
        }
        else if (endsWithWildcardDot)
        {
            head = "^" + escaped;
            tail = "[^.]{0," + endWeight + "}$";
        }
        else
        {
            head = "^" + escaped;
            tail = "$";
        }

        if (head.EndsWith(@"\.\*") && head.Length > 5)
        {
            head = head.Substring(0, head.Length - 4);
            tail = @"(\..*)?" + tail;
        }

        var regex = head.Replace(@"\*", ".*").Replace(@"\?", "[^.]?") + tail;
        return Regex.IsMatch(fileName, regex, RegexOptions.IgnoreCase);
    }

    private static bool WildcardMatchesUnixStyle(string pattern, string text)
    {
        var regex = "^" + Regex.Escape(pattern)
                               .Replace("\\*", ".*")
                               .Replace("\\?", ".")
                    + "$";

        return Regex.IsMatch(text, regex);
    }

There's a funny thing: even the Windows API function PathMatchSpec does not agree with FindFirstFile. Just try a1*. as the pattern: FindFirstFile says it matches a1, while PathMatchSpec says it does not.

漫雪独思 2024-12-04 19:39:05

d* means that it should match zero or more "d" characters. So any string is a valid match. Try d+ instead!

In order to have support for wildcard patterns I would replace the wildcards with the RegEx equivalents. Like * becomes .* and ? becomes .?. Then your expression above becomes d.*
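
The two suggestions above can be checked with a short sketch. The helper names are hypothetical, and the a? pattern is only an illustrative example. Note that an unanchored d+ or d.* behaves like a "contains d" test, and that translating ? to .? accepts zero or one character, whereas translating it to . requires exactly one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers for illustration only.
public static class QuantifierChoiceDemo
{
    // Unanchored d+ : true when the input contains at least one 'd' (any case).
    public static bool ContainsD(string input)
    {
        return Regex.IsMatch(input, "d+", RegexOptions.IgnoreCase);
    }

    // Wildcard a? with ? translated to ".?" (zero or one character).
    public static bool QuestionAsOptional(string input)
    {
        return Regex.IsMatch(input, "^a.?$");
    }

    // Wildcard a? with ? translated to "." (exactly one character).
    public static bool QuestionAsExactlyOne(string input)
    {
        return Regex.IsMatch(input, "^a.$");
    }
}
```

Here ContainsD("Message") is false and ContainsD("Sunday") is true. The input "a" is accepted by QuestionAsOptional but rejected by QuestionAsExactlyOne, so which translation is right depends on the wildcard semantics you want.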

贵在坚持 2024-12-04 19:39:05

You need to convert your wildcard expression to a regular expression. For example:

    private bool WildcardMatch(String s, String wildcard, bool case_sensitive)
    {
        // Replace the * with an .* and the ? with a dot. Put ^ at the
        // beginning and a $ at the end
        String pattern = "^" + Regex.Escape(wildcard).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";

        // Now, run the Regex as you already know
        Regex regex;
        if(case_sensitive)
            regex = new Regex(pattern);
        else
            regex = new Regex(pattern, RegexOptions.IgnoreCase);

        return(regex.IsMatch(s));
    } 
故事未完 2024-12-04 19:39:05

You must escape the special Regex symbols in the input wildcard pattern (for example, the pattern *.txt is equivalent to ^.*\.txt$).
So slashes, braces and many other special symbols must be replaced with @"\" + s, where s is the special Regex symbol.
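
Rather than escaping each symbol by hand, the built-in Regex.Escape method can do this step. The sketch below (class and method names are hypothetical) shows what escaping produces and why it matters: an unescaped dot matches any character, and an unescaped leading * is not even a valid regex.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers for illustration only.
public static class EscapeDemo
{
    // Matches `input` against `literal` treated as plain text, not as a regex.
    public static bool MatchesLiterally(string input, string literal)
    {
        return Regex.IsMatch(input, "^" + Regex.Escape(literal) + "$");
    }

    // Returns false when the Regex constructor rejects the pattern.
    public static bool IsValidRegex(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }
}
```

Regex.Escape("*.txt") yields \*\.txt, which is a valid pattern, whereas the raw string *.txt is rejected by the Regex constructor. Similarly, MatchesLiterally("aXtxt", "a.txt") is false, while the unescaped pattern ^a.txt$ would match "aXtxt".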

沩ん囻菔务 2024-12-04 19:39:05

You may want to use WildcardPattern from System.Management.Automation assembly. See my answer here.

撩起发的微风 2024-12-04 19:39:05

I think @Dmitri has a nice solution at
Matching strings with wildcard https://stackoverflow.com/a/30300521/1726296

Based on his solution, I have created two extension methods (credit goes to him).

They may be helpful.

public static String WildCardToRegular(this String value)
{
        return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$";
}

public static bool WildCardMatch(this String value,string pattern,bool ignoreCase = true)
{
        if (ignoreCase)
            return Regex.IsMatch(value, WildCardToRegular(pattern), RegexOptions.IgnoreCase);

        return Regex.IsMatch(value, WildCardToRegular(pattern));
}

Usage:

string pattern = "file.*";

var isMatched = "file.doc".WildCardMatch(pattern);

or

string xlsxFile = "file.xlsx";
var isMatched = xlsxFile.WildCardMatch(pattern);
冧九 2024-12-04 19:39:05

None of the code above is entirely correct.

That is because searching for zz*foo* or zz* will not give correct results.

Also, if you search for "abcd*" in Total Commander, it finds a file named abcd, so all the code above is wrong.

Here is the correct code.

public string WildcardToRegex(string pattern)
{             
    string result= Regex.Escape(pattern).
        Replace(@"\*", ".+?").
        Replace(@"\?", "."); 

    if (result.EndsWith(".+?"))
    {
        result = result.Remove(result.Length - 3, 3);
        result += ".*";
    }

    return result;
}
优雅的叶子 2024-12-04 19:39:05

The most accepted answer works fine for most cases and can be used in most scenarios:

"^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";

However, if you allow escaping in your input wildcard pattern, e.g. "find \*", meaning you want to search for the string "find *" with a literal asterisk, it won't work. The already escaped * will be escaped to "\\\\\\*" and after replacing we have "^value\\ with\\\\.*$", which is wrong.

The following code (which for sure can be optimized and rewritten) handles that special case:

  public static string WildcardToRegex(string wildcard)
    {
        var sb = new StringBuilder();
        for (var i = 0; i < wildcard.Length; i++)
        {
            // If wildcard has an escaped \* or \?, preserve it like it is in the Regex expression
            var character = wildcard[i];
            if (character == '\\' && i < wildcard.Length - 1)
            {
                if (wildcard[i + 1] == '*')
                {
                    sb.Append("\\*");
                    i++;
                    continue;
                }

                if (wildcard[i + 1] == '?')
                {
                    sb.Append("\\?");
                    i++;
                    continue;
                }
            }

            switch (character)
            {
                // If it's unescaped * or ?, change it to Regex equivalents. Add more wildcard characters (like []) if you need to support them.
                case '*':
                    sb.Append(".*");
                    break;
                case '?':
                    sb.Append('.');
                    break;
                default:
                    //// Escape all other symbols because wildcard could contain Regex special symbols like '.'
                    sb.Append(Regex.Escape(character.ToString()));
                    break;
            }
        }

        return $"^{sb}$";
    }

Solution for the problem just with Regex substitutions is proposed here https://stackoverflow.com/a/15275806/1105564
