Which DCG rules should I add?

Posted 2024-09-30 21:33:46

I am using Prolog to build a simple XML parser, and I have the following XML file:

<program_state>

<ip> <line> 7 </line> <envt> p1:1 in main:1 </envt> </ip>

<contour>
   <name> main:1 </name> 
   <items> 
    <item> <var> x:int </var> <val> 2 </val> </item>
    <item> <var> y:int </var> <val> 2 </val> </item>
   </items> 
   <rpdl> system </rpdl>
   <nested>
     <contour>
       <name> p1:1 </name>
       <items>
         <item> <var> y:int </var> <val> 0 </val> </item>
         <item> <var> q:proc </var> <val> p2 in main:1 </val> </item>
       </items>
       <rpdl> <line> 21 </line> <envt> main:1 in root:1 </envt> </rpdl>
     </contour>
   </nested>
</contour>

</program_state>

In Prolog I use the following DCG rules:

xml([E]) --> element(E).
xml([E|L]) --> element(E), xml(L).

element(E) -->  begintag(N), elements(L), endtag(N), {E =.. [N|L]}.

elements(L) --> xml(L).
elements([E]) --> [E].

begintag(N) --> ['<', N, '>'].
endtag(N) -->   ['<', '/', N, '>'].

These rules can't handle content such as "p1:1 in main:1", "x:int", or "main:1". When I change those to "p1", "x", and "main", the parser works fine. What rules should I add so that the parser can handle these irregular tokens?

The parse tree should look something like this:

program_state(
    ip(line(7), envt(p1:1 in main:1)),
    contour(name(main:1),
        items(item(var(x:int),val(2)),
              item(var(y:int),val(2))),
        rpdl(system),
        nested(contour( name(p1:1),
                items(item(var(y:int),val(0)),
                      item(var(q:proc),val(p2 in main:1))),
                rpdl(line(21),envt(main:1 in root:1))
                  ))))

The following is what I get:

program_state(
     ip(line(7), envt(p1)), 
     contour(name(main), 
         items(item(var(x), val(2)), 
               item(var(y), val(2))), 
     rpdl(system), 
     nested(contour(name(p1), 
             items(item(var(y), val(0)), 
                   item(var(q), val(p2))), 
             rpdl(line(21), envt(main))
               )))).

Comments (1)

泛泛之交 2024-10-07 21:33:46

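I tokenized the XML, fed it to the parser, and it worked fine. Check your input: symbols containing a colon (:) need to be quoted; otherwise Prolog reads them as module-qualified terms of the form module_name:module_specific_symbol. Note also that multi-word content such as p1:1 in main:1 is passed as a single list token. Here's the input: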

?- listing(input).
input([<, program_state, >, <, ip, >, <, line, >, '7', <, /, line, >, <, envt, >, ['p1:1', in, 'main:1'], <, /, envt, >, <, /, ip, >, <, contour, >, <, name, >, 'main:1', <, /, name, >, <, items, >, <, item, >, <, var, >, 'x:int', <, /, var, >, <, val, >, '2', <, /, val, >, <, /, item, >, <, item, >, <, var, >, 'y:int', <, /, var, >, <, val, >, '2', <, /, val, >, <, /, item, >, <, /, items, >, <, rpdl, >, system, <, /, rpdl, >, <, nested, >, <, contour, >, <, name, >, 'p1:1', <, /, name, >, <, items, >, <, item, >, <, var, >, 'y:int', <, /, var, >, <, val, >, '0', <, /, val, >, <, /, item, >, <, item, >, <, var, >, 'q:proc', <, /, var, >, <, val, >, [p2, in, 'main:1'], <, /, val, >, <, /, item, >, <, /, items, >, <, rpdl, >, <, line, >, '21', <, /, line, >, <, envt, >, ['main:1', in, 'root:1'], <, /, envt, >, <, /, rpdl, >, <, /, contour, >, <, /, nested, >, <, /, contour, >, <, /, program_state, >]).

true.
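
To see the difference the quotes make, here is a minimal check. It assumes SWI-Prolog, or any system where : is declared as an infix operator; the two predicate names are made up for illustration.

```prolog
% Quoted: a single atom that happens to contain a colon.
quoted_is_atom :-
    X = 'p1:1',
    atom(X).

% Unquoted: the reader applies the infix operator (:)/2,
% so p1:1 is the compound term :(p1, 1), not an atom.
unquoted_is_compound :-
    Y = p1:1,
    \+ atom(Y),
    Y = Left:Right,
    Left == p1,
    Right == 1.
```

Both goals succeed, which is why the unquoted token does not match what the grammar expects as a plain symbol.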

A listing of how the parser is invoked:

?- listing(run).
run :-
    consult('input.db'),
    input(A),
    phrase(xml(B), A),
    write(B),
    nl.

true.
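
The run above reads a prepared token list from input.db. If the tokens are to be produced from the raw text instead, a small code-level DCG can do the splitting. This is only a sketch under a few assumptions: tokenize/2 and its helpers are hypothetical names, every token comes out as an atom (so a colon is never read as an operator), the characters <, > and / always count as separate tokens, and the result is a flat list, so multi-word content is not grouped into a sub-list the way the input shown earlier is.

```prolog
:- use_module(library(lists)).   % memberchk/2

% tokenize(+Text, -Tokens): split raw markup text into a flat list of atoms.
% Hypothetical helper, not part of the original answer.
tokenize(Text, Tokens) :-
    atom_codes(Text, Codes),
    phrase(tokens(Tokens), Codes).

% Skip blanks, emit punctuation as one-character atoms, emit words as atoms.
tokens(Ts)     --> [C], { space(C) }, !, tokens(Ts).
tokens([T|Ts]) --> [C], { punct(C) }, !, { atom_codes(T, [C]) }, tokens(Ts).
tokens([T|Ts]) --> word(Cs), !, { atom_codes(T, Cs) }, tokens(Ts).
tokens([])     --> [].

% A word is a maximal run of codes that are neither blank nor punctuation.
word([C|Cs]) --> [C], { \+ space(C), \+ punct(C) }, word_rest(Cs).

word_rest([C|Cs]) --> [C], { \+ space(C), \+ punct(C) }, !, word_rest(Cs).
word_rest([])     --> [].

% Space and all control characters count as blanks.
space(C) :- C =< 32.

% The three characters that delimit tags.
punct(C) :- atom_codes('<>/', Ps), memberchk(C, Ps).
```

For example, tokenize('<envt> p1:1 in main:1 </envt>', Ts) gives Ts = ['<', envt, '>', 'p1:1', in, 'main:1', '<', '/', envt, '>'].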

A listing of the parser run:

?- run.
% input.db compiled 0.00 sec, 2,768 bytes
[program_state(ip(line(7),envt([p1:1,in,main:1])),contour(name(main:1),items(item(var(x:int),val(2)),item(var(y:int),val(2))),rpdl(system),nested(contour(name(p1:1),items(item(var(y:int),val(0)),item(var(q:proc),val([p2,in,main:1]))),rpdl(line(21),envt([main:1,in,root:1]))))))]
true 
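
If you would rather keep the token list flat instead of wrapping multi-word content in a sub-list by hand, one option is to let the grammar itself collect the run of tokens between two tags. The following is a sketch under that assumption, not the only way to do it: text//1, text_rest//1 and text_term/2 are hypothetical names, and the remaining rules are the ones from the question.

```prolog
xml([E])   --> element(E).
xml([E|L]) --> element(E), xml(L).

element(E) --> begintag(N), elements(L), endtag(N), { E =.. [N|L] }.

% Element content is either nested elements or a run of text tokens.
elements(L)   --> xml(L).
elements([T]) --> text(Ws), { text_term(Ws, T) }.

% One or more tokens, stopping before the next '<'.
text([W|Ws]) --> [W], { W \== '<' }, text_rest(Ws).

text_rest([W|Ws]) --> [W], { W \== '<' }, text_rest(Ws).
text_rest([])     --> [].

% A single token stays as it is; several tokens are kept together as a list.
text_term([W], W) :- !.
text_term(Ws, Ws).

begintag(N) --> ['<', N, '>'].
endtag(N)   --> ['<', '/', N, '>'].
```

With these rules, phrase(xml(T), ['<', envt, '>', 'p1:1', in, 'main:1', '<', '/', envt, '>']) gives T = [envt(['p1:1', in, 'main:1'])], which has the same shape as the output of the run above. The tokens still have to be quoted atoms such as 'p1:1'.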