Converting a treebank of vertical trees to s-expressions
I have a collection of parse trees in an ASCII representation where indentation determines the structure (and closing brackets are implicit). I need to convert them to s-expressions, so that parentheses determine the structure instead. It's a little bit like Python's significant whitespace vs. braces. The input format is a vertical representation of trees, like so:
STA:fcl
=S:np
==DN:pron-dem("tia" <*> <Dem> <Du> <dem> DET P NOM) Tiaj
==H:n("akuzo" <act> <sd> P NOM) akuzoj
=fA:adv("certe") certe
=P:v-fin("dauxri" <va+TEMP> <mv> FUT VFIN) dauxros
.
This should become:
(STA:fcl (S:np (DN:pron-dem Tiaj) (H:n akuzoj)) (fA:adv certe) (P:v-fin dauxros) .)
I have code that almost does it, but not quite. There's always a missing paren somewhere; it's getting very frustrating. Should I use a proper parser, maybe a CFG? The current (messy) code is at http://github.com/andreasvc/eodop/blob/master/arbobanko.py
Comments (1)
Focusing only on the example you're giving in this Q, and the Q's title about converting vertical trees to S-expressions, something like...:
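One possible shape for this, as a minimal sketch rather than a definitive implementation: clean strips the parenthesised feature list, and reparse keeps a stack of open depths, closing parentheses whenever the count of leading = signs stops increasing. The function bodies, the regexes, the out parameter, and the rule that a line without a LABEL: prefix (such as the final .) is punctuation attached to the root are all assumptions of mine, chosen so that the question's sample yields the question's target line.

```python
import re
import sys

SAMPLE = '''\
STA:fcl
=S:np
==DN:pron-dem("tia" <*> <Dem> <Du> <dem> DET P NOM) Tiaj
==H:n("akuzo" <act> <sd> P NOM) akuzoj
=fA:adv("certe") certe
=P:v-fin("dauxri" <va+TEMP> <mv> FUT VFIN) dauxros
.
'''

LEVEL = re.compile(r'(=*)(.*)')     # leading '=' signs encode the depth
FEATURES = re.compile(r'\(.*?\)')   # the ("lemma" <tags> ...) part of a node


def clean(text):
    """Drop the parenthesised features and collapse runs of whitespace."""
    return ' '.join(FEATURES.sub(' ', text).split())


def reparse(lines, out=sys.stdout):
    """Write the s-expression for one vertical tree to `out` as we go."""
    open_levels = []                # depths of nodes still waiting for ')'
    for line in lines:
        if not line.strip():
            continue
        equals, rest = LEVEL.match(line).groups()
        depth, node = len(equals), clean(rest)
        # Assumption: a line without 'LABEL:' is punctuation and becomes a
        # bare token under the root, so it never closes depth 0.
        labeled = ':' in node
        floor = depth if labeled else max(depth, 1)
        # Close every open node that is not an ancestor of this line.
        while open_levels and open_levels[-1] >= floor:
            out.write(')')
            open_levels.pop()
        if labeled:
            out.write((' (' if open_levels else '(') + node)
            open_levels.append(depth)
        else:
            out.write(' ' + node)
    out.write(')' * len(open_levels) + '\n')


reparse(SAMPLE.splitlines())
# (STA:fcl (S:np (DN:pron-dem Tiaj) (H:n akuzoj)) (fA:adv certe) (P:v-fin dauxros) .)
```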
seems to work, and outputs
I realize you're trying to do much more "cleaning" than I'm doing here, but that can be concentrated in the clean function, leaving reparse to deal with the Q's title. If you don't want to print as you go, but rather return the result as a string, the changes are of course quite minor:
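A sketch of that variant, to be read as an illustration under stated assumptions rather than the exact code: the regexes and function bodies are hypothetical, and a line without a LABEL: prefix is assumed to be punctuation hanging off the root. The output fragments are collected in a list and joined once at the end, so reparse returns the s-expression instead of printing it.

```python
import re

LEVEL = re.compile(r'(=*)(.*)')     # leading '=' signs encode the depth
FEATURES = re.compile(r'\(.*?\)')   # the ("lemma" <tags> ...) part of a node


def clean(text):
    """Drop the parenthesised features and collapse runs of whitespace."""
    return ' '.join(FEATURES.sub(' ', text).split())


def reparse(lines):
    """Return the s-expression for one vertical tree as a string."""
    pieces = []                     # output fragments, joined at the end
    open_levels = []                # depths of nodes still waiting for ')'
    for line in lines:
        if not line.strip():
            continue
        equals, rest = LEVEL.match(line).groups()
        depth, node = len(equals), clean(rest)
        # Assumption: no 'LABEL:' prefix means a bare punctuation token
        # that stays inside the root node.
        labeled = ':' in node
        floor = depth if labeled else max(depth, 1)
        while open_levels and open_levels[-1] >= floor:
            pieces.append(')')
            open_levels.pop()
        if labeled:
            pieces.append((' (' if open_levels else '(') + node)
            open_levels.append(depth)
        else:
            pieces.append(' ' + node)
    pieces.append(')' * len(open_levels))
    return ''.join(pieces)


tree = ['STA:fcl', '=fA:adv("certe") certe', '.']
print(reparse(tree))  # (STA:fcl (fA:adv certe) .)
```

Returning a string also makes it easy to map reparse over every tree in the treebank and write the results out one per line.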