涉及大括号的语法

发布于 2024-12-06 14:58:16 字数 376 浏览 3 评论 0原文

我正在尝试解决序言中的 DCG 语法并在一定程度上取得了成功,但我一直在评估涉及此类大括号的表达式。 expr( T, ['(', 5, +, 4, ')', *, 7], []),

expr(Z) --> num(Z).
expr(Z) --> num(X), [+], expr(Y), {Z is X+Y}.
expr(Z) --> num(X), [-], expr(Y), {Z is X-Y}.
expr(Z) --> num(X), [*], expr(Y), {Z is X*Y}.
num(D) --> [D], {number(D)}.

eval(L, V, []) :- expr(V, L, []).

I'm trying to solve DCG grammar in prolog and succeeded upto a point, i'm stuck in evaluating the expressions involving braces like these.
expr( T, [’(’, 5, +, 4, ’)’, *, 7], []),

expr(Z) --> num(Z).
expr(Z) --> num(X), [+], expr(Y), {Z is X+Y}.
expr(Z) --> num(X), [-], expr(Y), {Z is X-Y}.
expr(Z) --> num(X), [*], expr(Y), {Z is X*Y}.
num(D) --> [D], {number(D)}.

eval(L, V, []) :- expr(V, L, []).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

碍人泪离人颜 2024-12-13 14:58:16

Prolog 的 DCG 语法实现的解析器是递归下降 LL(something)(预测)语法。它从左到右遍历输入并生成最左推导。它们很容易制作,但语法必须符合一些限制:

它们不能是左递归的。可能/将会导致无限递归。这意味着在遵循递归路径之前必须从输入流中删除至少一个符号(令牌)。重构语法以消除左递归是一项相当机械的练习,尽管很乏味。请参阅任何不错的编译器书籍来了解如何做到这一点。

运算符优先级通常内置于语法本身的结构中。下面的 BNF 表示法显示了定义递归下降语法的一种方法,用于解析/评估简单算术表达式:

ArithmeticExpression     : AdditiveExpression
                         ;

AdditiveExpression       : MultiplicativeExpression
                         | MultiplicativeExpression '+' AdditiveExpression
                         | MultiplicativeExpression '-' AdditiveExpression
                         ;

MultiplicativeExpression : ExponentialExpression
                         | ExponentialExpression '*' MultiplicativeExpression
                         | ExponentialExpression '/' MultiplicativeExpression
                         | ExponentialExpression '%' MultiplicativeExpression
                         ;

ExponentialExpression    : UnaryExpression
                         | UnaryExpression '^' ExponentialExpression
                         ;

UnaryExpression          : '-' UnaryExpression
                         | AtomicExpression
                         ;

AtomicExpression         : '(' ArithmeticExpression ')'
                         | ['0'..'9']+
                         ;

每个运算符优先级级别的术语由下一个更高优先级的表达式构建。所以任意算术表达式只是一个单一的加法表达式。

每个加法表达式是 1 个或多个乘法表达式,由加法和减法运算符连接。

每个乘法表达式是 1 个或多个指数表达式,由乘法、除法和余数运算符连接。

每个指数表达式都是一个一元表达式,带有一个选项指数运算符,后跟另一个一元表达式。

每个一元表达式要么是原子表达式,要么是一元减号后跟另一个一元表达式。

每个原子表达式要么是括在括号中的任意算术表达式,要么是无符号整数标记。

将以上内容翻译成 Prolog 的 DCG 语法应该很简单。如何评价语法中每个子句所代表的术语应该是不言而喻的。

The parsers implemented by Prolog's DCG grammars are recursive-descent LL(something) (predictive) grammars. It walks the input from left to right and produces a leftmost derivation as it goes. They're easy to craft but the grammar's must conform to a few restrictions:

They cannot be left-recursive. Infinite recursion can/will result. This means that at least one symbol (token) has to be removed from the input stream prior to following a recursive path. Refactoring grammars to remove left-recursion is a fairly mechanical exercise, albeit tedious. See any decent compiler book on how to do that.

Operator precedence is typically built into the structure of the grammar itself. Here's BNF notation showing one way of defining a recursive descent grammar for the parsing/evaluation of simple arithmetic expressions:

ArithmeticExpression     : AdditiveExpression
                         ;

AdditiveExpression       : MultiplicativeExpression
                         | MultiplicativeExpression '+' AdditiveExpression
                         | MultiplicativeExpression '-' AdditiveExpression
                         ;

MultiplicativeExpression : ExponentialExpression
                         | ExponentialExpression '*' MultiplicativeExpression
                         | ExponentialExpression '/' MultiplicativeExpression
                         | ExponentialExpression '%' MultiplicativeExpression
                         ;

ExponentialExpression    : UnaryExpression
                         | UnaryExpression '^' ExponentialExpression
                         ;

UnaryExpression          : '-' UnaryExpression
                         | AtomicExpression
                         ;

AtomicExpression         : '(' ArithmeticExpression ')'
                         | ['0'..'9']+
                         ;

The term at each level of operator precedence is built from expressions of the next higher order of precedence. So an arbitrary arithmetic expression is just a single additive expression.

Each additive expression is 1 or more multiplicative expressions, joined by addition and subtraction operators.

Each multiplicative expression is 1 or more exponential expressions, joined by multiplication, division and remainder operators.

Each exponential expression is a unary expression with an option exponent operator followed by another unary expression.

Each unary expression is either an atomic expression, or a unary minus followed by another unary expression.

Each atomic expression is either an arbitrary arithmetic expression, enclosed in parentheses, or an unsigned integer token.

Translation of the above into Prolog's DCG syntax should be trivial. How to evaluate the term represented by each clause in the grammar should be self-evident.

美人骨 2024-12-13 14:58:16

它就是有效的。然而它并不比 yacc/bison 容易。

%?-eval('11*(7+5-2)^2*(11+8)').
eval(A) :- lex(A,L), evallist(L).

%?-evallist([11,*,'(',7,+,5,-,2,')',^,2,*,'(',11,+,8,')']).
evallist(L) :- e(R,L,[]),write(R),!.

e(N) --> t(N1), erest(N1,N).

erest(N1,N) --> [+], !, t(N2), {N3 is N1+N2}, erest(N3,N);
                [-], !, t(N2), {N3 is N1-N2}, erest(N3,N).
erest(N,N) --> [].

t(N) --> f(N1), trest(N1,N).

trest(N1,N) --> [*], !, f(N2), {N3 is N1*N2}, trest(N3,N);
                [/], !, f(N2), {N3 is N1/N2}, trest(N3,N).
trest(N,N) --> [].

f(N) --> n(N);
     n(N1), [^], f(N2), {N is N1**N2}.

n(N) --> ['('], !, e(N), [')'];
     [-], !, e(N1), {N is -N1};
     num(N). 

num(N) --> [N], {number(N)}.

lex(A,NL) :- 
  atom_chars(A,L), lex0(_,L,NL).

lex0(S,L,NL) :-
  L=[], (number(S), NL=[S], !; NL=[]), !;
  L=[E|R], (d(E,N), (number(S), !; S=0), S1 is S*10+N, lex0(S1, R, NL), !;
     lex0(_,R,NR), (number(S), NL=[S|[E|NR]], !;
        NL=[E|NR])).

d(I,N) :- 
  char_code(I,C), C > 47, C < 58, N is C - 48.

It just works. However it is not easier than yacc/bison.

%?-eval('11*(7+5-2)^2*(11+8)').
eval(A) :- lex(A,L), evallist(L).

%?-evallist([11,*,'(',7,+,5,-,2,')',^,2,*,'(',11,+,8,')']).
evallist(L) :- e(R,L,[]),write(R),!.

e(N) --> t(N1), erest(N1,N).

erest(N1,N) --> [+], !, t(N2), {N3 is N1+N2}, erest(N3,N);
                [-], !, t(N2), {N3 is N1-N2}, erest(N3,N).
erest(N,N) --> [].

t(N) --> f(N1), trest(N1,N).

trest(N1,N) --> [*], !, f(N2), {N3 is N1*N2}, trest(N3,N);
                [/], !, f(N2), {N3 is N1/N2}, trest(N3,N).
trest(N,N) --> [].

f(N) --> n(N);
     n(N1), [^], f(N2), {N is N1**N2}.

n(N) --> ['('], !, e(N), [')'];
     [-], !, e(N1), {N is -N1};
     num(N). 

num(N) --> [N], {number(N)}.

lex(A,NL) :- 
  atom_chars(A,L), lex0(_,L,NL).

lex0(S,L,NL) :-
  L=[], (number(S), NL=[S], !; NL=[]), !;
  L=[E|R], (d(E,N), (number(S), !; S=0), S1 is S*10+N, lex0(S1, R, NL), !;
     lex0(_,R,NR), (number(S), NL=[S|[E|NR]], !;
        NL=[E|NR])).

d(I,N) :- 
  char_code(I,C), C > 47, C < 58, N is C - 48.
葬花如无物 2024-12-13 14:58:16

这是我在 Prolog 历史上观察到的最奇怪的事情之一。
也就是说,错误的表达式语法已经存在很长时间了。
DEC10 Prolog 文档中已经发现了错误的语法
当我们查看规则时,就会发现不适合:

 expr(Z) --> num(X), "/", expr(Y), {Z is X/Y}.
 etc..

这使得除法运算符 xfy,但它应该是 yfx。因此与
上述规则表达式 10/2/5 被读作 10/(2/5) 并导致
结果是 25。但实际上这个例子应该读作 (10/2)/5
结果为 1。

问题在于正确的语法将是递归的。
DCG 确实存在左递归规则的问题。序言
解释器只会陷入左的无限循环
递归规则通过重复调用expr/3:

 expr(Z) --> expr(X), "/", num(Y), {Z is X/Y}
 etc..

所以解决方案是通过引入消除左递归
累加器和附加规则。我不知道这个是否
该方法通常有效,但在当前情况下肯定有效。
因此,正确且深度优先的可执行规则应为:

 expr(Y) --> num(X), expr_rest(X,Y).

 expr_rest(X,T) --> "/", !, num(Y), {Z is X/Y}, expr_rest(Z,T).
 etc..
 expr_rest(X,X).

上述语法是一个更具挑战性的 DCG。它不是
不再是纯粹的 Prolog,因为它使用了剪切(!)。但我们可以
消除切割,例如通过推回,沿线的东西
以下几行。推回又是一个复杂的过程
在 DCG 介绍中需要解释的问题,我们需要
在表达式末尾引入一个停止字符
它有效:

 etc..
 expr_rest(X,X), [C] --> [C], {not_operator(C)}.

或者我们既不能讨论剪切的长度,也不能讨论后推的长度
并接受它,在回溯时解析器会做额外的事情,
在目前的情况下,工作是不必要的。所以底线可能是,
虽然这个例子不正确,但是足够简单地解释DCG
不需要太多先进的 DCG 东西。

有趣的是,缺少括号的语法几乎不受
消除左递归。只需添加:

 num(X) --> "(", !, expr(X), ")".

哎呀,又被砍了!

PS:除了消除左递归之外,我们还可以
使用某种形式的表格。

This is one of the strangest things I observed in the history of Prolog.
Namely that a wrong expression syntax is shown around already for ages.
The wrong syntax is already found in the DEC10 Prolog documentation
and the misfit is seen when we look at a rule:

 expr(Z) --> num(X), "/", expr(Y), {Z is X/Y}.
 etc..

This makes the division operator xfy, but it should be yfx. Thus with
the above rule an expression 10/2/5 is read as 10/(2/5) and leads to
25 as the result. But in fact the example should be read as (10/2)/5
yielding 1 as the result.

The problem is that the correct syntax would be left recursive.
And DCG do have problems with left recursive rules. The Prolog
interpreter would just run into an infinite loop for a left
recursive rules by repeatedly call expr/3:

 expr(Z) --> expr(X), "/", num(Y), {Z is X/Y}
 etc..

So the solution is to eliminate the left recursion by introducing
an accumulator and additional rules. I don't know whether this
method works in general, but it works for sure in the present case.
So the correct and depth first executable rule would read:

 expr(Y) --> num(X), expr_rest(X,Y).

 expr_rest(X,T) --> "/", !, num(Y), {Z is X/Y}, expr_rest(Z,T).
 etc..
 expr_rest(X,X).

The above grammar is a little bit a more challenging DCG. It is not
anymore pure Prolog, since it uses the cut (!). But we could
eliminate the cut, for example by a push-back, something along
the following lines. The push-back is again a complicated
matter to explain in a DCG introduction, and we would need to
introduce an stop character at the end of an expression to make
it work:

 etc..
 expr_rest(X,X), [C] --> [C], {not_operator(C)}.

Or we could neither go into the lengths of the cut or the push-back
and live with it that on backtracking the parser will do additional,
in the present case unnecessary, work. So bottom line is probably,
although the example is not correct, it is simple enough to explain DCG
without needing to much advanced stuff of DCG.

Interestinglyg the missing parenthesis syntax is hardly affected by
the elimination of left recursion. Simply add:

 num(X) --> "(", !, expr(X), ")".

Oops, a cut again!

P.S.: Instead of eliminating the left recursion, we could also have
worked with some form of tabling.

未蓝澄海的烟 2024-12-13 14:58:16

添加此子句似乎有效: num(D) --> ['('], expr(D), [')'].

Adding this clause appears to work: num(D) --> ['('], expr(D), [')'].

痴情换悲伤 2024-12-13 14:58:16

感谢 @vladimir litovski 并基于 BNF 表示法(以及我的需要),我将其扩展为还包括逻辑表达式。这是我的代码(要查看完整的解释器,请查看我的git repo):

cond_expre(T) -->  and_expre(E1), or_rest(E1,T).     

or_rest(E1,T) -->  [punct('|'),punct('|')],!, and_expre(E2),   {V  = (\/,E1,E2)}, or_rest(V,T).
or_rest(T,T) --> [].

and_expre(T) --> equality_expre(E1), and_rest(E1,T).
and_rest(E1,T) --> [punct(&),punct(&)], !, equality_expre(E2), {V  = (/\,E1,E2)}, and_rest(V,T).
and_rest(T,T) --> [].

equality_expre(T) -->   relat_expre(E1), equality_rest(E1,T).   
equality_rest(E1,T) --> equality_op(Op) ,!,  relat_expre(E2), {  V=(Op,E1,E2)}, equality_rest(V,T).
equality_rest(T,T) --> [].

relat_expre(T) --> atomic_texpre(E1), relat_rest(E1,T). 
relat_rest(E1,T) -->  relat_op(Op) ,!, atomic_texpre(E2) , { V=(Op,E1,E2) },relat_rest(V,T).
relat_rest(T,T) --> [].

atomic_texpre(T) -->  arith_expre(T); [punct('(')], !, cond_expre(T), [punct(')')]      .
arith_expre(V) --> expre(V).



equality_op(==)         --> [punct(=),punct(=)]. 
equality_op(\=)        --> [punct(!),punct(=)]. 
relat_op(>=)        --> [punct(>),punct(=)]. 
relat_op(>)         --> [punct(>)]. 
relat_op('=<')        -->  [punct(<),punct(=)]. 
relat_op(<)         --> [punct(<)]. 



expre(N) --> multiplicative(N1), additive_rest(N1,N).

additive_rest(N1,N) --> [punct('+')], !, multiplicative(N2), {N3 = (+,N1,N2)}, additive_rest(N3,N);  
                        [punct('-')], !, multiplicative(N2), {N3 = (-,N1,N2)}, additive_rest(N3,N).
additive_rest(N,N) --> [].

multiplicative(N) --> atomic(N1), multiplicative_rest(N1,N).
multiplicative_rest(N1,N) --> [punct('*')], !, atomic(N2), {N3 = (*,N1,N2)}, multiplicative_rest(N3,N);
                                [punct('/')], !, atomic(N2), {N3 = (/,N1,N2)}, multiplicative_rest(N3,N);   
                                [punct('%')], !, atomic(N2), {N3 = (mod,N1,N2)}, multiplicative_rest(N3,N).
multiplicative_rest(N,N) --> [].

atomic(N) --> [punct('(')], !, expre(N), [punct(')')];  num(N). 
num(N) --> pl_constant(N).

pl_constant(num(N))     --> pl_integer(N), !.
pl_constant(id(X))       --> identifier(X), {call(id(X,_)) }. %Not sure if I remember what it does but I think, the right most call I wrote to assure that the variable is already registered in the cache so that a value can later be retrieved from it  

pl_integer(X)              --> [number(X)]. %the value on the right works together with a tokenizer library ->  :- use_module(library(tokenize)). It's basically a token labled as a number. Same with the next line.
identifier(X)              --> [word(X)].

Thanks to @vladimir lidovski and based on the BNF notation (and on my needs), I expanded it to also include logical expressions. Here is my code (To see the full interpreter checkout my git repo):

cond_expre(T) -->  and_expre(E1), or_rest(E1,T).     

or_rest(E1,T) -->  [punct('|'),punct('|')],!, and_expre(E2),   {V  = (\/,E1,E2)}, or_rest(V,T).
or_rest(T,T) --> [].

and_expre(T) --> equality_expre(E1), and_rest(E1,T).
and_rest(E1,T) --> [punct(&),punct(&)], !, equality_expre(E2), {V  = (/\,E1,E2)}, and_rest(V,T).
and_rest(T,T) --> [].

equality_expre(T) -->   relat_expre(E1), equality_rest(E1,T).   
equality_rest(E1,T) --> equality_op(Op) ,!,  relat_expre(E2), {  V=(Op,E1,E2)}, equality_rest(V,T).
equality_rest(T,T) --> [].

relat_expre(T) --> atomic_texpre(E1), relat_rest(E1,T). 
relat_rest(E1,T) -->  relat_op(Op) ,!, atomic_texpre(E2) , { V=(Op,E1,E2) },relat_rest(V,T).
relat_rest(T,T) --> [].

atomic_texpre(T) -->  arith_expre(T); [punct('(')], !, cond_expre(T), [punct(')')]      .
arith_expre(V) --> expre(V).



equality_op(==)         --> [punct(=),punct(=)]. 
equality_op(\=)        --> [punct(!),punct(=)]. 
relat_op(>=)        --> [punct(>),punct(=)]. 
relat_op(>)         --> [punct(>)]. 
relat_op('=<')        -->  [punct(<),punct(=)]. 
relat_op(<)         --> [punct(<)]. 



expre(N) --> multiplicative(N1), additive_rest(N1,N).

additive_rest(N1,N) --> [punct('+')], !, multiplicative(N2), {N3 = (+,N1,N2)}, additive_rest(N3,N);  
                        [punct('-')], !, multiplicative(N2), {N3 = (-,N1,N2)}, additive_rest(N3,N).
additive_rest(N,N) --> [].

multiplicative(N) --> atomic(N1), multiplicative_rest(N1,N).
multiplicative_rest(N1,N) --> [punct('*')], !, atomic(N2), {N3 = (*,N1,N2)}, multiplicative_rest(N3,N);
                                [punct('/')], !, atomic(N2), {N3 = (/,N1,N2)}, multiplicative_rest(N3,N);   
                                [punct('%')], !, atomic(N2), {N3 = (mod,N1,N2)}, multiplicative_rest(N3,N).
multiplicative_rest(N,N) --> [].

atomic(N) --> [punct('(')], !, expre(N), [punct(')')];  num(N). 
num(N) --> pl_constant(N).

pl_constant(num(N))     --> pl_integer(N), !.
pl_constant(id(X))       --> identifier(X), {call(id(X,_)) }. %Not sure if I remember what it does but I think, the right most call I wrote to assure that the variable is already registered in the cache so that a value can later be retrieved from it  

pl_integer(X)              --> [number(X)]. %the value on the right works together with a tokenizer library ->  :- use_module(library(tokenize)). It's basically a token labled as a number. Same with the next line.
identifier(X)              --> [word(X)].
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文