I'm contemplating implementing an XML translator using a compiler generator, based on the W3C's XML 1.1 spec, which includes a complete EBNF grammar.
More precisely, I plan to use Qi-YACC because I want to learn this tool. It will be my first foray into using any compiler-compiler.
The first kind of translation I'm planning to implement is very straightforward: XML to S-EXPRs. Afterwards, I plan to generalize my translator, but this is not the point of my question.
Do you anticipate any major pitfalls for this kind of project? I've read that translating XML using its EBNF is a bad idea, and I wonder why. And it's not as if the Qi language already has an XML parser, so I'm definitely not looking to reinvent the wheel here.
Comments (2)
I do not know why context is needed to parse XML.
But QiYacc can make use of context through global variables. It would be cleaner
if you could pass a state, S, through the parser, or something like that.
This is not in Qi but I plan to implement such a feature for Shen.
So it could be done.
/Stefan
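One place where context does show up in XML is the rule that an end tag must carry the same name as its start tag, which the EBNF alone does not express. Below is a minimal sketch of the state-passing idea, written in Python rather than Qi or Shen. It is not QiYacc's API: `State`, `parse_start_tag`, `parse_end_tag` and `well_nested` are hypothetical names, and the tag syntax is heavily simplified (no attributes, ASCII names only).

```python
import re
from typing import NamedTuple, Tuple

# Simplified ASCII-only name pattern (an assumption, not the spec's Name rule).
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


class State(NamedTuple):
    """Hypothetical parser state: names of the currently open elements."""
    open_tags: Tuple[str, ...] = ()


def parse_start_tag(text: str, pos: int, state: State):
    """Parse '<name>' and return (name, new_pos, new_state)."""
    if not text.startswith("<", pos):
        raise SyntaxError(f"expected '<' at offset {pos}")
    m = NAME.match(text, pos + 1)
    if m is None or not text.startswith(">", m.end()):
        raise SyntaxError(f"malformed start tag at offset {pos}")
    name = m.group()
    # The state is never mutated; a new one is returned instead.
    return name, m.end() + 1, State(state.open_tags + (name,))


def parse_end_tag(text: str, pos: int, state: State):
    """Parse '</name>' and check it against the innermost open element."""
    if not text.startswith("</", pos):
        raise SyntaxError(f"expected '</' at offset {pos}")
    m = NAME.match(text, pos + 2)
    if m is None or not text.startswith(">", m.end()):
        raise SyntaxError(f"malformed end tag at offset {pos}")
    name = m.group()
    if not state.open_tags or state.open_tags[-1] != name:
        raise SyntaxError(f"unexpected </{name}> at offset {pos}")
    return name, m.end() + 1, State(state.open_tags[:-1])


def well_nested(text: str) -> bool:
    """Thread the state through every tag; plain characters are skipped."""
    pos, state = 0, State()
    try:
        while pos < len(text):
            if text.startswith("</", pos):
                _, pos, state = parse_end_tag(text, pos, state)
            elif text.startswith("<", pos):
                _, pos, state = parse_start_tag(text, pos, state)
            else:
                pos += 1
    except SyntaxError:
        return False
    return not state.open_tags
```

Because every function takes the state as an argument and returns a new one, no global variable is needed, and backtracking would simply mean discarding the returned state.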
I know nothing of QiYACC; however, translating the EBNF of XML into a recursive descent (RD) parser is more or less straightforward. One just needs to keep in mind that there are places where small tweaks to the grammar can have a big performance impact on the parser. This is because the grammars are written with succinctness and clarity in mind, rather than to avoid chasing down rules that end up failing.
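To make the "small tweaks" point concrete: the spec's `element ::= EmptyElemTag | STag content ETag` has two alternatives that both begin with `'<' Name (S Attribute)*`, so a literal translation parses the tag once, fails at `/>`, and then parses it again. The sketch below, in Python and covering only a small subset of XML (elements, double-quoted attributes, plain character data; no entities, comments, CDATA or processing instructions), left-factors that rule. The class and function names and the S-expression layout are my own assumptions for illustration.

```python
import re

NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.-]*")   # ASCII subset of Name
SPACE = re.compile(r"\s*")                          # looser than the spec's S
ATTVALUE = re.compile(r'"([^<&"]*)"')               # double quotes only
CHARDATA = re.compile(r"[^<&]+")


class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def fail(self, what):
        raise SyntaxError(f"expected {what} at offset {self.pos}")

    def match(self, pattern, what):
        m = pattern.match(self.text, self.pos)
        if m is None:
            self.fail(what)
        self.pos = m.end()
        return m

    def peek(self, s):
        return self.text.startswith(s, self.pos)

    def literal(self, s):
        if not self.peek(s):
            self.fail(repr(s))
        self.pos += len(s)

    def element(self):
        # Left-factored: '<' Name Attribute* ( '/>' | '>' content ETag )
        self.literal("<")
        name = self.match(NAME, "a name").group()
        attrs = []
        while True:
            self.match(SPACE, "whitespace")
            if self.peek("/>") or self.peek(">"):
                break
            attrs.append(self.attribute())
        node = [name, ["@"] + attrs] if attrs else [name]
        if self.peek("/>"):            # empty-element tag: no second parse
            self.literal("/>")
            return node
        self.literal(">")
        node.extend(self.content())
        self.literal("</")
        if self.match(NAME, "a name").group() != name:
            self.fail(f"closing tag for <{name}>")
        self.match(SPACE, "whitespace")
        self.literal(">")
        return node

    def attribute(self):
        # Attribute ::= Name Eq AttValue
        name = self.match(NAME, "an attribute name").group()
        self.match(SPACE, "whitespace")
        self.literal("=")
        self.match(SPACE, "whitespace")
        return [name, self.match(ATTVALUE, "a quoted value").group(1)]

    def content(self):
        # Simplified: content ::= (element | CharData)*
        items = []
        while not self.peek("</"):
            if self.peek("<"):
                items.append(self.element())
            else:
                items.append(self.match(CHARDATA, "character data").group())
        return items


def to_sexpr(node):
    """Render nested lists as an S-expression; text becomes a quoted string."""
    if isinstance(node, str):
        return '"' + node.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return "(" + " ".join([node[0]] + [to_sexpr(n) for n in node[1:]]) + ")"


def xml_to_sexpr(text):
    parser = Parser(text)
    node = parser.element()
    parser.match(SPACE, "whitespace")
    if parser.pos != len(text):
        parser.fail("end of input")
    return to_sexpr(node)
```

For example, `xml_to_sexpr('<doc id="1"><item>hi</item><item/></doc>')` returns `(doc (@ (id "1")) (item "hi") (item))`. Each method corresponds to one grammar rule, which is what makes the translation from EBNF mostly mechanical.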
I did something like this once in C++ by writing the grammar of XML out as a set of types. You can see an article I wrote on it at Code Project. The same basic principles can be applied to any language.
I'd also suggest you look into PEG grammars. They extend EBNF by letting you introduce zero-width assertions (lookahead that matches without consuming input), and are a great way to augment an EBNF grammar for a parser.
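As an illustration of a zero-width assertion, take XML comments. The spec writes them with set difference, `'<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'`, while a PEG can say `'<!--' (!'--' .)* '-->'`, where `!'--'` succeeds only if the input does not continue with `--` and consumes nothing. The following is a rough Python sketch of that rule using hand-rolled combinators. The helper names are hypothetical and do not come from any PEG library, and `any_char` accepts any character rather than the spec's `Char` range.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """Match the literal string s."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def any_char(text: str, pos: int) -> Optional[int]:
    """PEG '.': match any single character."""
    return pos + 1 if pos < len(text) else None


def seq(*parsers: Parser) -> Parser:
    """Match each parser in order; fail if any of them fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            pos = result
        return pos
    return parse


def many(p: Parser) -> Parser:
    """PEG 'p*': greedy repetition, never fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:   # stop on failure or no progress
                return pos
            pos = nxt
    return parse


def not_ahead(p: Parser) -> Parser:
    """PEG '!p': succeed, consuming nothing, only if p fails here."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos if p(text, pos) is None else None
    return parse


# Comment <- '<!--' (!'--' .)* '-->'
comment = seq(lit("<!--"), many(seq(not_ahead(lit("--")), any_char)), lit("-->"))
```

Here `comment("<!-- hi -->", 0)` returns `11` (the whole string is consumed), while `comment("<!-- a -- b -->", 0)` returns `None`, because the loop stops at the inner `--` and the closing `-->` does not follow it.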