如何将方法调用转换为后缀表示法?
我正在为类似 javascript 的语言编写一个编译器,只是为了好玩。又名我正在学习轮子,所以我为自己做了一个并试图找出所有内容,但现在我陷入了困境。
我知道在解析简单的中缀表达式时,调车场算法是一个很好的算法。我也能够弄清楚如何为前缀和后缀运算符扩展该算法,并且还能够解析简单的函数。
例如:2+3*a(3,5)+b(3,5)
变为 2 3
(
是一个保护令牌,被压入堆栈,它将存储返回地址等。()
是call 命令调用堆栈顶部的函数,弹出必要数量的参数并在返回时推回结果。)
如果函数名称只是一个标记,如果直接后跟,我可以简单地将其标记为函数符号一个括号。在这个过程中,如果我遇到一个函数符号,我会将它压入操作符堆栈,并在完成参数转换后将其弹出。
到目前为止,这是有效的。
但是,如果我添加具有成员函数的选项,即 .
运算符。事情变得更加棘手。例如,我想转换 abc(12)+def(34)
我无法将 c 和 f 标记为函数,因为 abc
和 def 是函数。如果我在这样的表达式上启动解析器,结果将是
ab 。
这显然是错误的。我希望它是
这看起来是正确的。 但当然,如果我添加一些括号,我会让事情变得更复杂:(abc)()
。或者我创建一个函数,返回一个我再次调用的函数:f(a,b)(c,d)
。
有没有一种简单的方法来处理这些棘手的情况?
I'm writing a compiler for a javascript like language for fun. aka I'm learning about the wheel so I make one for myself and trying to find out everything but now I got stuck.
I know that shunting yard algorithm is a nice one when parsing simple infix expressions. I was able to figure out how to extend this algorithm for prefix and postfix operators too and also able to parse simple functions.
For example: 2+3*a(3,5)+b(3,5)
turns into 2 3 <G> 3 5 a () * + <G> 3 5 b () +
(<G>
is a guard token that is pushed on the stack it will store the return address etc. ()
is the call command that calls the function on the top of the stack that pops out the necessary amount of arguments and pushes back the result on return.)
If the function name is just one token I can simply mark it as function symbol if directly followed by a parenthesis. During the process if I encounter a function symbol I push it on the operator stack and pop it out when I finished converting the parameters.
This is working so far.
But if I add the option to have member functions, the .
operator. The things get more tricky. For example I want to convert the a.b.c(12)+d.e.f(34)
I can't mark c and f to be functions because a.b.c
and d.e.f
are functions. If I start my parser on an expression like this the result will be a b . <G> 12 c () . d e . <G> 34 f () .
Which is obviously wrong. I want it to be <G> 12 a b . c . () <G> 34 d e . f. ()
Which appears correct.
But of curse I can make the things more complicated if I add some parentheses: (a.b.c)()
. Or I make a function that returns a function which I call again: f(a,b)(c,d)
.
Is there an easy way handle these tricky situations?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您的方法的一个问题是您将对象及其成员视为由
.
分隔的两个单独的标记。经典的调车场算法对 OOP 一无所知,并且依赖于单个令牌进行函数调用。因此,解决问题的第一种方法是使用一个令牌来调用对象成员 - 即整个abc
必须是单个令牌。您还可以参考自动解析器生成器来获取问题的另一种解决方案。它们允许将目标语言(JavaScript)的完整语法定义为一组正式规则并自动生成解析器。流行工具列表包括在不同编程语言上生成解析器的工具: ANTLR、Bison + Lex、Lemon + Ragel 。
--阿尔乔姆
A problem of your approach is that you treat object and its member as two separate tokens separated by
.
. Classical Shunting yard algorithm knows nothing about OOP and relies on single token for function call. So the first way to resolve you problem is to use one token for a call of an object member -- i.e. entirea.b.c
must be a single token.You may also refer to automatic parser generators for another solution of your problem. They allow to define complete grammar of your target language (JavaScript) as a set of formal rules and generate parser automatically. List of popular tools includes tools that generates parser on different programming languages: ANTLR, Bison + Lex, Lemon + Ragel.
--artem
(我看到这个问题仍然存在。我自己找到了解决方案。)
首先,我将
(...)
和[...]
表达式作为一个威胁令牌并在需要时(递归地)扩展它们。然后我检测函数调用和数组下标。如果带括号的标记之前没有中缀运算符,则这是函数调用或数组下标,因此我在那里插入特殊的调用函数或访问运算符。通过这种修改,它就像魅力一样发挥作用。(I saw this question is still alive. I found the solution for it myself.)
First I threat the
(...)
and[...]
expressions as one token and expand them (recursively) when needed. Then I detect the function calls and array subscripts. If there isn't an infix operator before a parenthesized token, then that's a function call or an array subscript, so I insert a special call-function or access operator there. With this modification it works like charm.