Parsing expressions with an undefined number of arguments

Posted on 2024-07-15 09:59:56

I'm trying to parse a string in a self-made language into a sort of tree, e.g.:

# a * b1 b2 -> c * d1 d2 -> e # f1 f2 * g

should result in:

# a
  * b1 b2
    -> c
  * d1 d2
    -> e
# f1 f2
  * g

#, * and -> are symbols. a, b1, etc. are texts.
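One possible way to get from the flat string to the tree above, without going through RPN, is sketched below. It rests on an assumption that is only suggested by the expected output, not stated in the question: each symbol has a fixed nesting depth (# at depth 0, * at depth 1, -> at depth 2), and every text token belongs to the most recent symbol. The names LEVEL, parse and render are made up for this illustration.

```python
# Assumed nesting depth of each symbol, inferred from the example output.
LEVEL = {"#": 0, "*": 1, "->": 2}


def parse(text):
    """Build a list of top-level nodes from a whitespace-separated string."""
    root = {"symbol": None, "texts": [], "children": []}
    path = [root]  # path[i + 1] is the currently open node at depth i
    for tok in text.split():
        if tok in LEVEL:
            level = LEVEL[tok]
            if len(path) < level + 1:
                raise ValueError(f"{tok!r} appears without an enclosing symbol")
            del path[level + 1:]          # close deeper and same-depth nodes
            node = {"symbol": tok, "texts": [], "children": []}
            path[-1]["children"].append(node)
            path.append(node)
        else:
            if len(path) == 1:
                raise ValueError(f"text {tok!r} appears before any symbol")
            path[-1]["texts"].append(tok)  # any number of texts per symbol
    return root["children"]


def render(nodes, depth=0):
    """Return the tree as indented lines, two spaces per depth."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + " ".join([node["symbol"], *node["texts"]]))
        lines.extend(render(node["children"], depth + 1))
    return lines


if __name__ == "__main__":
    source = "# a * b1 b2 -> c * d1 d2 -> e # f1 f2 * g"
    print("\n".join(render(parse(source))))
```

Because the depth of a symbol decides where its node attaches, the number of text tokens after it never has to be known in advance. If the real language lets the same symbol nest at different depths, this sketch does not apply as written.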

At the moment the RPN method is the only way I know to evaluate expressions, so my current solution is as follows. If I allow only a single text token after each symbol, I can easily convert the expression into RPN first (with b = b1 b2; d = d1 d2; f = f1 f2) and parse it from there:

a b c -> * d e -> * # f g * #

However, merging several text tokens together with whatever else follows seems to be problematic. My idea was to introduce marker tokens (M), so that the RPN looks like:

a M b2 b1 M c -> * M d2 d1 M e -> * # f2 f1 M g * #

which is also parseable and seems to solve the problem.

That said:

  1. Does anyone have experience with something like this and can say whether it is a viable solution going forward?
  2. Are there better methods for parsing expressions whose operators have undefined arity?
  3. Can you point me to some good resources?

Note: Yes, I know this example closely resembles Lisp prefix notation, and maybe the way to go would be to add some brackets, but I don't have any experience with that. However, the source text must not contain any artificial brackets, and I'm also not sure what to do about potential infix mix-ins like # a * b -> [if value1 = value2] c -> d.

Thanks for any help.

EDIT: It seems that what I'm looking for are sources on postfix notation with a variable number of arguments.

Comments (1)

缪败 2024-07-22 09:59:56

I couldn't fully understand your question, but it seems what you want is a grammar definition and a parser generator. I suggest you take a look at ANTLR; it should be pretty straightforward to define a grammar with it for either your original syntax or the RPN.

Edit: (After exercising self-criticism, and making some effort to understand the question details.) Actually, the language grammar is unclear from your example. However, it seems to me that the advantages of the prefix/postfix notations (i.e. that you need neither parentheses nor a precedence-aware parser) stem from the fact that you know the number of arguments every time you encounter an operator, so you know exactly how many elements to read (for prefix notation) or to pop from the stack (for postfix notation). OTOH, I believe that having operators which can take a variable number of arguments makes prefix/postfix notations not simply difficult to parse but outright ambiguous. Take the following expression for example:

# a * b c d

Which of the following three is the canonical form?

  1. (a, *(b, c, d))

  2. (a, *(b, c), d)

  3. (a, *(b), c, d)

Without knowing more about the operators, it is impossible to tell. Of course you could define some sort of greediness for the operators, e.g. * is greedier than #, so it gobbles up all the arguments. But this would defeat the purpose of a prefix notation, because you simply wouldn't be able to write down the second of the three variants above, at least not without additional syntactic elements.
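To illustrate what such an additional syntactic element could look like, here is a minimal sketch of a postfix evaluator in which a marker token M (the idea from the question) opens each argument list and an operator pops everything back to the nearest marker. The token layout used here, with the marker placed before an operator's arguments, is an assumption made for this example and is not the exact scheme from the question. MARK, OPERATORS and eval_postfix are hypothetical names.

```python
MARK = "M"                    # assumed marker token; cannot be used as a text
OPERATORS = {"#", "*", "->"}


def eval_postfix(tokens):
    """Evaluate marker-delimited postfix tokens into a (symbol, args) tree."""
    stack = []
    for tok in tokens:
        if tok in OPERATORS:
            args = []
            # Pop arguments until the marker that opened this operator's list.
            while stack and stack[-1] != MARK:
                args.append(stack.pop())
            if not stack:
                raise ValueError(f"operator {tok!r} has no matching marker")
            stack.pop()       # discard the marker itself
            args.reverse()    # restore left-to-right order
            stack.append((tok, args))
        else:
            stack.append(tok)  # text tokens and markers are pushed as-is
    if len(stack) != 1 or stack[0] == MARK:
        raise ValueError("expression does not reduce to a single tree")
    return stack[0]


if __name__ == "__main__":
    # The three readings of "# a * b c d" become three different inputs.
    for source in ("M a M b c d * #", "M a M b c * d #", "M a M b * c d #"):
        print(source, "=>", eval_postfix(source.split()))
```

With the marker in place, each of the three readings has its own spelling, so the arity no longer has to be known in advance. The price is exactly the extra token that bracket-free prefix/postfix notation was supposed to avoid.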

Now that I think of it, it is probably not by sheer chance that none of the programming languages I know support operators with a variable number of arguments, only functions/procedures.
