解析器实现比较:正确性确认和(可能)优化
我已经实现了两个表达式解析器,采用递归下降和运算符优先级。它们是在 Object Pascal 中实现的。这是递归下降:
function ParseFactor: PNode;
var
Temp: PNode;
begin
Result := ParsePrimary;
if t.Kind in [tkDoubleAsterisks] then begin
New(Temp);
Temp^.Kind := nkPower;
Temp^.LeftOperand := Result;
// power operator is right associative
Lexer.Get(t);
Temp^.RightOperand := ParseFactor();
Result := Temp;
end;
end;
function ParseTerm: PNode;
var
Temp: PNode;
begin
Result := ParseFactor;
while t.Kind in [tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan] do begin
New(Temp);
case t.Kind of
tkAmpersand : Temp^.Kind := nkAnd;
tkAsterisk : Temp^.Kind := nkMultiplication;
tkSlash : Temp^.Kind := nkDivision;
tkDoubleLessThan : Temp^.Kind := nkShiftLeft;
tkDoubleGreaterThan: Temp^.Kind := nkShiftRight;
end;
Temp^.LeftOperand := Result;
Lexer.Get(t);
Temp^.RightOperand := ParseFactor;
Result := Temp;
end;
end;
function ParseExpression: PNode;
var
Temp: PNode;
begin
Result := ParseTerm;
while t.Kind in [tkHorzBar,tkCaret,tkPlus,tkMinus] do begin
New(Temp);
case t.Kind of
tkHorzBar: Temp^.Kind := nkOr;
tkCaret : Temp^.Kind := nkXor;
tkPlus : Temp^.Kind := nkAddition;
tkMinus : Temp^.Kind := nkSubstraction;
end;
Temp^.LeftOperand := Result;
Lexer.Get(t);
Temp^.RightOperand := ParseTerm;
Result := Temp;
end;
end;
这是运算符优先级:
function GetTokenPrecedence(const Kind: TTokenKind): Integer; inline;
begin
case Kind of
tkHorzBar,tkCaret,tkPlus,tkMinus:
Result := 1;
tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan:
Result := 2;
tkDoubleAsterisks:
Result := 3;
else
Result := -1;
end;
end;
function IsRightAssociative(const Kind: TTokenKind): Boolean; inline;
begin
Result := Kind in [tkDoubleAsterisks];
end;
function ParseBinaryRHSExpression(LHS: PNode; const CurrentPrecedence: Integer): PNode;
var
Op: TTokenKind;
RHS: PNode;
begin
while GetTokenPrecedence(t.Kind) >= CurrentPrecedence do begin
Op := t.Kind;
Lexer.Get(t);
RHS := ParsePrimary;
while (GetTokenPrecedence(t.Kind) > GetTokenPrecedence(Op))
or (IsRightAssociative(t.Kind) and (GetTokenPrecedence(t.Kind) >= GetTokenPrecedence(Op)))
do begin
RHS := ParseBinaryRHSExpression(RHS,GetTokenPrecedence(t.Kind));
end;
New(Result);
case Op of
tkHorzBar : Result^.Kind := nkOr;
tkCaret : Result^.Kind := nkXor;
tkPlus : Result^.Kind := nkAddition;
tkMinus : Result^.Kind := nkSubstraction;
tkAmpersand : Result^.Kind := nkAnd;
tkAsterisk : Result^.Kind := nkMultiplication;
tkSlash : Result^.Kind := nkDivision;
tkDoubleLessThan : Result^.Kind := nkShiftLeft;
tkDoubleGreaterThan: Result^.Kind := nkShiftRight;
tkDoubleAsterisks : Result^.Kind := nkPower;
end;
Result^.LeftOperand := LHS;
Result^.RightOperand := RHS;
LHS := Result;
end;
Result := LHS;
end;
function ParseExpression: PNode;
begin
Result := ParsePrimary;
if Assigned(Result) then begin
Result := ParseBinaryRHSExpression(Result,0);
end;
end;
这是两者之间唯一的本质区别。一些简单的测试表明两者似乎都能正确解析。但是,我不太确定运算符优先级的实现,因为这是我第一次真正实现它。令我困扰的令人惊讶的事实是,它比递归下降版本运行得慢(多花 1.5 倍)!我的编译器类和我读过的所有文章都指出,由于函数调用较少,运算符优先级解析应该比递归下降更有效,这也是我所期望的,因为代码似乎表明了这一点。我已经内联了附加函数来获得运算符优先级和右关联性测试,但这似乎没有多大帮助。请告诉我我是否做错了什么。请随意要求澄清某些事情。
I've implemented two expression parsers, in recursive descent and operator precedence. They're implemented in Object Pascal. Here's the recursive descent:
function ParseFactor: PNode;
var
Temp: PNode;
begin
Result := ParsePrimary;
if t.Kind in [tkDoubleAsterisks] then begin
New(Temp);
Temp^.Kind := nkPower;
Temp^.LeftOperand := Result;
// power operator is right associative
Lexer.Get(t);
Temp^.RightOperand := ParseFactor();
Result := Temp;
end;
end;
function ParseTerm: PNode;
var
Temp: PNode;
begin
Result := ParseFactor;
while t.Kind in [tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan] do begin
New(Temp);
case t.Kind of
tkAmpersand : Temp^.Kind := nkAnd;
tkAsterisk : Temp^.Kind := nkMultiplication;
tkSlash : Temp^.Kind := nkDivision;
tkDoubleLessThan : Temp^.Kind := nkShiftLeft;
tkDoubleGreaterThan: Temp^.Kind := nkShiftRight;
end;
Temp^.LeftOperand := Result;
Lexer.Get(t);
Temp^.RightOperand := ParseFactor;
Result := Temp;
end;
end;
function ParseExpression: PNode;
var
Temp: PNode;
begin
Result := ParseTerm;
while t.Kind in [tkHorzBar,tkCaret,tkPlus,tkMinus] do begin
New(Temp);
case t.Kind of
tkHorzBar: Temp^.Kind := nkOr;
tkCaret : Temp^.Kind := nkXor;
tkPlus : Temp^.Kind := nkAddition;
tkMinus : Temp^.Kind := nkSubstraction;
end;
Temp^.LeftOperand := Result;
Lexer.Get(t);
Temp^.RightOperand := ParseTerm;
Result := Temp;
end;
end;
and here's the operator precedence:
function GetTokenPrecedence(const Kind: TTokenKind): Integer; inline;
begin
case Kind of
tkHorzBar,tkCaret,tkPlus,tkMinus:
Result := 1;
tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan:
Result := 2;
tkDoubleAsterisks:
Result := 3;
else
Result := -1;
end;
end;
function IsRightAssociative(const Kind: TTokenKind): Boolean; inline;
begin
Result := Kind in [tkDoubleAsterisks];
end;
function ParseBinaryRHSExpression(LHS: PNode; const CurrentPrecedence: Integer): PNode;
var
Op: TTokenKind;
RHS: PNode;
begin
while GetTokenPrecedence(t.Kind) >= CurrentPrecedence do begin
Op := t.Kind;
Lexer.Get(t);
RHS := ParsePrimary;
while (GetTokenPrecedence(t.Kind) > GetTokenPrecedence(Op))
or (IsRightAssociative(t.Kind) and (GetTokenPrecedence(t.Kind) >= GetTokenPrecedence(Op)))
do begin
RHS := ParseBinaryRHSExpression(RHS,GetTokenPrecedence(t.Kind));
end;
New(Result);
case Op of
tkHorzBar : Result^.Kind := nkOr;
tkCaret : Result^.Kind := nkXor;
tkPlus : Result^.Kind := nkAddition;
tkMinus : Result^.Kind := nkSubstraction;
tkAmpersand : Result^.Kind := nkAnd;
tkAsterisk : Result^.Kind := nkMultiplication;
tkSlash : Result^.Kind := nkDivision;
tkDoubleLessThan : Result^.Kind := nkShiftLeft;
tkDoubleGreaterThan: Result^.Kind := nkShiftRight;
tkDoubleAsterisks : Result^.Kind := nkPower;
end;
Result^.LeftOperand := LHS;
Result^.RightOperand := RHS;
LHS := Result;
end;
Result := LHS;
end;
function ParseExpression: PNode;
begin
Result := ParsePrimary;
if Assigned(Result) then begin
Result := ParseBinaryRHSExpression(Result,0);
end;
end;
Those are the only essential difference between the two. Some simple tests show that both seem to parse correctly. However, I'm not really sure about the operator precedence implementation because this is the first time I really implement it. And the surprising fact which bothers me, it runs slower than the recursive descent version (it takes 1.5 more times)! My compiler classes and all articles I read states that operator precedence parsing should be more efficient than recursive descent due to fewer function calls and that's what I'm expecting as well since the code seems to show that. I've inline-d additional functions to get the operator precedence and right-associativity testing but this doesn't seem to help much. Please tell me whether I did something wrong. Feel free to ask for clarity of certain things.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
与所有事情一样,权衡很重要。递归下降显式检查每个运算符标记,因此如果您有 N 个运算符,它实际上必须执行 N 个测试。运算符优先级只需要知道某物是运算符标记,并使用查找表。因此,它可以只使用几次查找,而不是 N 次测试。因此,如果有很多运算符,运算符优先级应该更快。
如果你的语法只有几个,那么即使仔细调整也不清楚它是否获胜。
从总体上看,解析器的速度可能并不重要。无论您正在构建使用解析器的任何工具,都会在解析器之外的其他地方花费更多的精力。
当机器很小时,运算符优先级是一个有趣的想法,因为可以在表中编码复杂的运算符层次结构。大多数情况下,它提供的权衡对于典型的工作站(甚至手持式)来说并不重要。对于简单的语法,我会坚持使用递归下降,并且对于其他任何事情,我都会坚持使用任何类型的解析器生成器。
As with all things, tradeoffs matter. Recursive descent checks for each operator token explicitly, so if you have N operators, it must essentially do N tests. Operator precedence only has to know that something is an operator token, and use a lookup table. So instead of N tests, it can use just a few lookups. So operator precedence should be faster if you have a lot of operators.
If your grammar has only a few, then it isn't clear its a win even if carefully tuned.
In the grand scheme of things, the speed of the parser probably doesn't matter. Whatever tool you are building that uses the parser will pretty much expend more effort someplace other than the parser.
Operator precedence was an interesting idea when machines were small because one could encode complex operator hierarchies in the table. Mostly the tradeoffs it offered don't matter for the typical workstration (or even hand-held). I'd stick to recursive descent for simple grammars, and to parser generators of whatever kind seems convenient for anything else.