Why is Group.Value always the last matched group string?

Posted on 2024-08-15 02:26:06

Recently, I found one C# Regex API really annoying.

I have the regular expression (([0-9]+)|([a-z]+))+ and I want to find all the matched strings. The code is below.

string regularExp = "(([0-9]+)|([a-z]+))+";
string str = "abc123xyz456defFOO";

Match match = Regex.Match(str, regularExp, RegexOptions.None);
int matchCount = 0;

while (match.Success)
{
    Console.WriteLine("Match" + (++matchCount));

    Console.WriteLine("Match group count = {0}", match.Groups.Count);
    for (int i = 0; i < match.Groups.Count; i++)
    {
        Group group = match.Groups[i];
        Console.WriteLine("Group" + i + "='" + group.Value + "'");
    }

    match = match.NextMatch();
    Console.WriteLine("go to next match");
    Console.WriteLine();
}

The output is:

Match1
Match group count = 4
Group0='abc123xyz456def'
Group1='def'
Group2='456'
Group3='def'
go to next match

It seems that each group.Value holds only the last string that the group matched ("def" and "456"). It took me some time to figure out that I should rely on group.Captures instead of group.Value.

string regularExp = "(([0-9]+)|([a-z]+))+";
string str = "abc123xyz456def";
//Console.WriteLine(str);

Match match = Regex.Match(str, regularExp, RegexOptions.None);
int matchCount = 0;

while (match.Success)
{
    Console.WriteLine("Match" + (++matchCount));

    Console.WriteLine("Match group count = {0}", match.Groups.Count);
    for (int i = 0; i < match.Groups.Count; i++)
    {
        Group group = match.Groups[i];
        Console.WriteLine("Group" + i + "='" + group.Value + "'");

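        // Captures lists every substring this group matched, in order;
        // group.Value above is only the last of them.
        CaptureCollection cc = group.Captures;
        for (int j = 0; j < cc.Count; j++)
        {
            Capture c = cc[j];
            Console.WriteLine("    Capture" + j + "='" + c + "', Position=" + c.Index);
        }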
    }

    match = match.NextMatch();
    Console.WriteLine("go to next match");
    Console.WriteLine();
}

This will output:

Match1
Match group count = 4
Group0='abc123xyz456def'
    Capture0='abc123xyz456def', Position=0
Group1='def'
    Capture0='abc', Position=0
    Capture1='123', Position=3
    Capture2='xyz', Position=6
    Capture3='456', Position=9
    Capture4='def', Position=12
Group2='456'
    Capture0='123', Position=3
    Capture1='456', Position=9
Group3='def'
    Capture0='abc', Position=0
    Capture1='xyz', Position=6
    Capture2='def', Position=12
go to next match

Now I am wondering why the API is designed like this. Why does Group.Value return only the last matched string? This design doesn't look good to me.


Comments (1)

调妓 2024-08-22 02:26:06

The primary reason is historical: regexes have always worked that way, going back to Perl and beyond. But it's not really bad design. Usually, if you want every match like that, you just leave off the outermost quantifier (+ in this case) and use the Matches() method instead of Match(). Every regex-enabled language provides a way to do that: in Perl or JavaScript you do the match in /g mode; in Ruby you use the scan method; in Java you call find() repeatedly until it returns false. Similarly, if you're doing a replace operation, you can plug the captured substrings back in as you go with placeholders ($1, $2 or \1, \2, depending on the language).
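
A minimal sketch of that approach, assuming the same input string as in the question: the outer + is dropped so that each run of digits or letters becomes its own match, and Regex.Matches returns them all. The variable names (pattern, input, tokens, bracketed) are illustrative only, the Regex.Replace call is just one possible use of the $1 placeholder, and the snippet assumes C# 9 or later for top-level statements (on older compilers the statements would go inside a Main method).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same alternation as in the question, but without the outer "+" quantifier,
// so every run of digits or lowercase letters is a separate match.
string pattern = "([0-9]+)|([a-z]+)";
string input = "abc123xyz456defFOO";

var tokens = new List<string>();
foreach (Match m in Regex.Matches(input, pattern))
{
    // Group 1 succeeds for digit runs, group 2 for letter runs.
    string kind = m.Groups[1].Success ? "digits" : "letters";
    Console.WriteLine("{0} at {1}: '{2}'", kind, m.Index, m.Value);
    tokens.Add(m.Value);
}

// Placeholder example: wrap every digit run in brackets using $1.
string bracketed = Regex.Replace(input, "([0-9]+)", "[$1]");
Console.WriteLine(bracketed);
```

Here each Match carries exactly one capture per group, so Value is all that is needed and the Captures collection never comes into play.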

On the other hand, I know of no other Perl 5-derived regex flavor that provides the ability to retrieve intermediate capture-group matches like .NET does with its CaptureCollections. And I'm not surprised: it's actually very seldom that you really need to capture all the matches in one go like that. And think of all the storage and/or processing power it can take to keep track of all those intermediate matches. It is a nice feature though.
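
For comparison, here is a hedged sketch of reading all the intermediate captures from a single Match through Group.Captures, using the pattern and input from the question. It also checks that Group.Value is simply the last entry of that collection. The variable names (repeated, pieces, last) are illustrative, and the snippet again assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

Match match = Regex.Match("abc123xyz456def", "(([0-9]+)|([a-z]+))+");

// Group 1 is the repeated group; Captures records every iteration in order.
Group repeated = match.Groups[1];
var pieces = new List<string>();
foreach (Capture c in repeated.Captures)
{
    pieces.Add(c.Value);
}
Console.WriteLine(string.Join(", ", pieces));

// Group.Value reports only the final capture of the group.
Capture last = repeated.Captures[repeated.Captures.Count - 1];
Console.WriteLine(repeated.Value == last.Value);
```

This is the one-pass alternative to calling Matches(): the outer quantifier stays in the pattern, and the individual pieces are recovered from the capture history instead of from separate matches.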
