Using Parse::RecDescent
I have the following input:
@Book{press,
author = "Press, W. and Teutolsky, S. and Vetterling, W. and Flannery B.",
title = "Numerical {R}ecipes in {C}: The {A}rt of {S}cientific {C}omputing",
year = 2007,
publisher = "Cambridge University Press"
}
and I have to write a grammar for the Parse::RecDescent parser generator.
The output should be converted to an XML structure and should look like this:
<book>
<keyword>press</keyword>
<author>Press, W.+Teutolsky, S.+Vetterling, W.+Flannery B.</author>
<title>Numerical {R}ecipes in {C}: The {A}rt of {S}cientific {C}omputing</title>
<year>2007</year>
<publisher>Cambridge University Press</publisher>
</book>
Additional and repeated fields should be reported as errors (a proper message with the line number, and no further parsing). I tried to start with something like this:
use Parse::RecDescent;
open(my $in, "<", "parsing.txt") or die "Can't open parsing.txt: $!";
my $text;
while (<$in>) {
$text .= $_;
}
print $text;
my $grammar = q {
beginning: "\@Book\{" keyword fields "\}" { print "<book>\n",$item[2],$item[3],"</book>"; }
keyword: /[a-zA-Z]+/ "," { return " <keyword>".$item[1]."</keyword>\n"; }
fields: one "," two "," tree "," four { return $item[1].$item[3].$item[5].$item[7]; }
one: "author" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\"" { $item[4] =~ s/\sand\s/\+/g;
return " <author>",$item[4],"</author>\n"; }
two: "title" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\"" { $item[4] =~ s/\sand\s/\+/g;
return " <title>",$item[4],"</title>\n"; }
three: "year" "=" /[0-2][0-9][0-9][0-9]/ { return " <year>",$item[3],"</year>\n"; }
four: "publisher" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\""
{ $item[4] =~ s/\sand\s/\+/g;
return " <publisher>",$item[4],"</publisher>\n"; }
};
my $parser = new Parse::RecDescent($grammar) or die ("Bad grammar!");
defined $parser->beginning($text) or die ("Bad text!");
But I don't even know if this is the correct way to do it. Please help.
There's one more small problem. The tags in the input might not be in that particular order, but each tag can appear only once. Do I have to write subrules for all permutations of (author, title, year, publisher)? Because I came up with:
fields: field "," field "," field "," field
field: one | two | three | four
but it obviously doesn't prevent tags from being repeated.
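One possible way to avoid writing out every permutation, offered as a sketch under assumptions rather than a definitive answer: accept the fields in any order with a repeated subrule, record each field in a hash, and die from the action as soon as a name repeats or is unknown. The rule names entry, field and value, the variables %field and @order, and the helper to_xml are hypothetical names chosen for this illustration. The sample entries are inlined instead of being read from parsing.txt, and the line number relies on Parse::RecDescent's $thisline variable.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = q{
    # Start-up action: lexicals shared by every rule action below.
    { my %field; my @order = ('author', 'title', 'year', 'publisher'); }

    # The leading action resets the per-entry state, so the subrules are items 3 and 4.
    entry: { %field = (); 1 } "\@Book\{" keyword field(s /,/) "\}"
        { $return = "<book>\n" . $item[3]
                  . join('', map { $field{$_} } grep { exists $field{$_} } @order)
                  . "</book>\n"; }

    keyword: /[a-zA-Z]+/ ","
        { $return = "  <keyword>" . $item[1] . "</keyword>\n"; }

    # Known names may come in any order. A repeated or unknown name aborts the parse.
    field: /author|title|year|publisher/ "=" value
        { my ($name, $val) = ($item[1], $item[3]);
          die "repeated field '$name' at line $thisline\n" if exists $field{$name};
          $val =~ s/\s+and\s+/+/g if $name eq 'author';
          $field{$name} = "  <$name>$val</$name>\n";
          $return = $name; }
      | /\w+/
        { die "unexpected or malformed field '$item[1]' at line $thisline\n"; }

    value: /"[^"]*"/ { $return = substr($item[1], 1, -1); }
         | /\d+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";

# Returns (xml, undef) on success and (undef, message) on failure.
sub to_xml {
    my ($text) = @_;
    my $xml = eval { $parser->entry($text) };
    return (undef, $@) if $@;
    return (undef, "syntax error\n") unless defined $xml;
    return ($xml, undef);
}

# Fields deliberately out of order.
my $good = <<'END';
@Book{press,
title = "Numerical {R}ecipes in {C}: The {A}rt of {S}cientific {C}omputing",
author = "Press, W. and Teutolsky, S. and Vetterling, W. and Flannery B.",
publisher = "Cambridge University Press",
year = 2007
}
END

# The year field appears twice.
my $bad = <<'END';
@Book{press,
year = 2007,
title = "Numerical {R}ecipes",
year = 2008
}
END

for my $input ($good, $bad) {
    my ($xml, $err) = to_xml($input);
    print defined $xml ? $xml : "Error: $err";
}
```

In this sketch the XML is always emitted in the fixed order held in @order, whatever order the input uses. It does not require all four fields to be present; if that is needed, the entry action could compare the keys of %field against @order before building the result.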
Firstly, you have a typo: tree instead of three.
I ran your program with a few extra lines added to produce debug output. That output shows that it's getting stuck at subrule one, and that the text "press," is getting put back onto the input stream. This is because you're using return rather than $return =, as the Parse::RecDescent manual says you should.
Furthermore, once you are assigning to the $return variable, you can no longer return a list, and must concatenate the strings together manually.
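The answer's own final listing is not preserved here. As a minimal sketch of what the corrected program might look like, the version below applies only the three fixes described above to the question's grammar: three instead of tree, $return = instead of return, and explicit string concatenation. It is not the answerer's original code. The sample input is inlined so the script is self-contained (the question reads it from parsing.txt), the two-space indentation of the XML is an assumption, and the "and" substitution is applied to the author field only.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Sample entry from the question, inlined for a self-contained run.
my $text = <<'END';
@Book{press,
author = "Press, W. and Teutolsky, S. and Vetterling, W. and Flannery B.",
title = "Numerical {R}ecipes in {C}: The {A}rt of {S}cientific {C}omputing",
year = 2007,
publisher = "Cambridge University Press"
}
END

my $grammar = q{
    beginning: "\@Book\{" keyword fields "\}"
        { $return = "<book>\n" . $item[2] . $item[3] . "</book>\n"; }

    keyword: /[a-zA-Z]+/ ","
        { $return = "  <keyword>" . $item[1] . "</keyword>\n"; }

    fields: one "," two "," three "," four
        { $return = $item[1] . $item[3] . $item[5] . $item[7]; }

    one: "author" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\""
        { (my $authors = $item[4]) =~ s/\sand\s/+/g;
          $return = "  <author>" . $authors . "</author>\n"; }

    two: "title" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\""
        { $return = "  <title>" . $item[4] . "</title>\n"; }

    three: "year" "=" /[0-2][0-9][0-9][0-9]/
        { $return = "  <year>" . $item[3] . "</year>\n"; }

    four: "publisher" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\""
        { $return = "  <publisher>" . $item[4] . "</publisher>\n"; }
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";

# The start rule now returns the XML string instead of printing it.
my $xml = $parser->beginning($text);
defined $xml or die "Bad text!\n";
print $xml;
```

Returning the string from the start rule and printing it in the caller keeps the grammar free of side effects. This version still expects the four fields in the fixed order author, title, year, publisher.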