Regex to parse functions of arbitrary depth

Posted on 2024-09-29 05:05:36

I'm parsing a simple language (Excel formulas) for the functions contained within. A function name must start with any letter, followed by any number of letters/numbers, and end with an open paren (no spaces in between). For example MyFunc(. The function can contain any arguments, including other functions, and must end with a close paren ). Of course, math within parens is allowed, e.g. =MyFunc((1+1)), and (1+1) shouldn't be detected as a function because it fails the function rule I've just described. My goal is to recognize the highest-level function calls in a formula, identify each function name, and extract the arguments. With the arguments, I can recursively look for other function calls.
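The function-start rule above is exactly the name part of the patterns that follow. A minimal C# sketch of that rule on its own; the sample strings are illustrative, not from the post, and the pattern is anchored with ^ here so that a string like "My Func(" cannot match on its tail alone:

```csharp
using System;
using System.Text.RegularExpressions;

// Function-start rule: a letter, then any letters/digits, then "(" with no space.
// ^ anchors the check to the start of each sample string.
var functionStart = new Regex(@"^[a-z][a-z0-9]*\(", RegexOptions.IgnoreCase);

foreach (var s in new[] { "MyFunc(", "Sum2(", "(1+1)", "My Func(", "2Func(" })
{
    Console.WriteLine($"{s,-10} {functionStart.IsMatch(s)}");
}
```

(1+1) and 2Func( are rejected because they don't start with a letter, and My Func( is rejected because of the space before the paren.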

Using this tutorial, I hacked up the following regexes. Neither of them does the trick; both fail on the test case pasted below.

This should work but completely fails:

(?<name>[a-z][a-z0-9]*\()(?<body>(?>[a-z][a-z0-9]*\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!)))\)

This works for many test cases, but fails for the test case below. I don't think it handles nested functions correctly; it just looks for open paren/close paren in the nesting:

(?<name>[a-z][a-z0-9]*\()(?<body>(?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!)))\)

Here's the test that breaks them all:

=Date(Year(A$5),Month(A$5),1)-(Weekday(Date(Year(A$5),Month(A$5),1))-1)+{0;1;2;3;4;5}*7+{1,2,3,4,5,6,7}-1

This should be matched as:

Date(ARGUMENTS1)
Weekday(ARGUMENTS2)
Where ARGUMENTS2 = Date(Year(A$5),Month(A$5),1)

Instead it matches:

ARGUMENTS2 = Date(Year(A$5),Month(A$5),1)-1)

I am using .NET Regex, which provides external memory in the form of balancing groups.
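For the last step of the plan (turning a captured argument list into individual arguments), a plain Split(',') is not enough, because nested calls, array constants such as {1,2,3}, and quoted strings can all contain commas. A minimal sketch, not from the original post, that uses a hand-written depth counter instead of a regex; SplitArguments is a hypothetical helper, and it assumes commas are the argument separator:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split a captured argument list on commas at nesting depth 0.
// "(" and "{" (Excel array constants such as {1,2,3}) raise the depth, ")" and "}" lower it,
// and characters inside double-quoted strings are skipped entirely.
List<string> SplitArguments(string body)
{
    var args = new List<string>();
    if (body.Length == 0) return args;          // e.g. the empty argument list of NOW()

    int depth = 0, start = 0;
    bool inString = false;
    for (int i = 0; i < body.Length; i++)
    {
        char c = body[i];
        if (c == '"') inString = !inString;     // an escaped "" toggles twice, so the state stays correct
        else if (inString) continue;
        else if (c == '(' || c == '{') depth++;
        else if (c == ')' || c == '}') depth--;
        else if (c == ',' && depth == 0)
        {
            args.Add(body.Substring(start, i - start));
            start = i + 1;
        }
    }
    args.Add(body.Substring(start));
    return args;
}

Console.WriteLine(string.Join(" | ", SplitArguments("Year(A$5),Month(A$5),1")));
// prints: Year(A$5) | Month(A$5) | 1
```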

Answer by 您的好友蓝忘机已上羡, 2024-10-06 05:05:36

This is well within the capabilities of .NET regexes. Here's a working demo:

using System;
using System.Text.RegularExpressions;

namespace Test
{
  class Test
  {
    public static void Main()
    {
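      // name: a letter, then letters/digits, then "(".
      // body: each "(" pushes an empty capture onto DEPTH and each ")" pops one;
      // (?(DEPTH)(?!)) then fails the match if any "(" is still unclosed.
      Regex r = new Regex(@"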
        (?<name>[a-z][a-z0-9]*\()
          (?<body>
            (?>
               \((?<DEPTH>)
             |
               \)(?<-DEPTH>)
             |
               [^()]+
            )*
            (?(DEPTH)(?!))
          )
        \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

      string formula = @"=Date(Year(A$5),Month(A$5),1)-(Weekday(Date(Year((A$5+1)),Month(A$5),1))-1)+{0;1;2;3;4;5}*7+{1,2,3,4,5,6,7}-1";

      foreach (Match m in r.Matches(formula))
      {
        Console.WriteLine("{0}\n", m.Value);
      }
    }
  }
}

output:

Date(Year(A$5),Month(A$5),1)

Weekday(Date(Year((A$5+1)),Month(A$5),1))

The main problem with your regex was that you included the function name as part of the recursive match, for example:

Name1(...Name2(...)...)

Any open paren that wasn't preceded by a name was not counted, because it was matched by the final alternative, |.?, and that threw off the balance with the close parens. That also meant that you couldn't match formulas like =MyFunc((1+1)), which you mentioned in the text but didn't include in the example. (I threw in an extra set of parens to demonstrate.)
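The question's plan of finding the top-level calls and then recursing into their arguments works directly with this pattern: run it on the formula, then run it again on each match's body group. A minimal sketch that reuses the demo's pattern unchanged; Walk is a hypothetical helper that records each call as depth:name in pre-order:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The pattern from the demo above, unchanged.
var call = new Regex(@"
    (?<name>[a-z][a-z0-9]*\()
      (?<body>
        (?>
           \((?<DEPTH>)
         |
           \)(?<-DEPTH>)
         |
           [^()]+
        )*
        (?(DEPTH)(?!))
      )
    \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

// Hypothetical helper: record each call as "depth:name", then search the
// captured body for calls one level deeper (pre-order traversal).
void Walk(string text, int depth, List<string> found)
{
    foreach (Match m in call.Matches(text))
    {
        string name = m.Groups["name"].Value.TrimEnd('('); // the name group includes the "("
        found.Add($"{depth}:{name}");
        Walk(m.Groups["body"].Value, depth + 1, found);
    }
}

var calls = new List<string>();
Walk("=Date(Year(A$5),Month(A$5),1)-(Weekday(Date(Year((A$5+1)),Month(A$5),1))-1)+{0;1;2;3;4;5}*7+{1,2,3,4,5,6,7}-1", 0, calls);
Console.WriteLine(string.Join(", ", calls));
// prints: 0:Date, 1:Year, 1:Month, 0:Weekday, 1:Date, 2:Year, 2:Month
```

Because the body group only ever contains balanced text, each recursive call sees a self-contained argument list, and (A$5+1) inside Year is correctly not reported as a function.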

EDIT: Here's a version that ignores parentheses inside double-quoted strings:

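  // Extra alternative: a double-quoted string is consumed whole, so any parens inside it
  // never touch DEPTH. Using * rather than + lets Excel's empty string literal match as well.
  Regex r = new Regex(@"
    (?<name>[a-z][a-z0-9]*\()
      (?<body>
        (?>
           \((?<DEPTH>)
         |
           \)(?<-DEPTH>)
         |
           ""[^""]*""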
         |
           [^()""]+
        )*
        (?(DEPTH)(?!))
      )
    \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
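A quick check of the quoted-string version, as a sketch: the formula below is made up for illustration and puts a ")" and a "(" inside string literals, plus an empty string. The string alternative is written as ""[^""]*"" (zero or more characters between the quotes) so the empty literal is accepted too:

```csharp
using System;
using System.Text.RegularExpressions;

// Balancing-group pattern with an extra alternative that swallows a whole
// double-quoted string, so parentheses inside quotes never touch DEPTH.
var r = new Regex(@"
    (?<name>[a-z][a-z0-9]*\()
      (?<body>
        (?>
           \((?<DEPTH>)
         |
           \)(?<-DEPTH>)
         |
           ""[^""]*""
         |
           [^()""]+
        )*
        (?(DEPTH)(?!))
      )
    \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

// Illustrative formula: =IF(A1=")",Len(B1),"")&Upper("x(")
string formula = "=IF(A1=\")\",Len(B1),\"\")&Upper(\"x(\")";

foreach (Match m in r.Matches(formula))
{
    Console.WriteLine(m.Value);
}
// prints:
// IF(A1=")",Len(B1),"")
// Upper("x(")
```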