Compiling/parsing meaningful whitespace
Hi, I'm looking to make a pseudo-Markdown kind of language and a parser to turn it into XHTML.
I've never written a compiler. I've taken brief looks at ANTLR and am wondering whether ANTLR can handle parsing input with meaningful whitespace.
So say I have something like this:
some text
some other text
# bullet point
# nested bullet point
Depending on context and number of prefixing spaces, those lines would mean different things.
What is a good tool to use to write a parser for this?
Thanks,
Alex
Comments (2)
ANTLR can surely be used for this. However, if you're new to ANTLR or parser-generators in general, I don't think I can give a short explanation of how to do this exactly. I recommend you try some simple things with ANTLR and browse through The Definitive ANTLR Reference. It even has a paragraph about this type of problem which is similar to parsing Python code. See Chapter 4.3 Rules, paragraph Emitting More Than One Token per Lexer Rule for details.
My approach would be to make your lexer generate indent/outdent tokens. Store the current indentation level and match a pattern like `\n *`. Count the number of spaces and, if it differs from the current indentation level, emit an indent or outdent token. Similarly, count tabs at the start of a line. Adding a rule that raises an error on the pattern `\n[ \t]*` when the matched run contains both tabs and spaces should stop people from mixing them.
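
The indent/outdent idea above can be sketched without any parser generator. What follows is a minimal hand-written Python illustration, not ANTLR code and not necessarily how either answerer would implement it. The names `tokenize`, `INDENT`, `OUTDENT` and `LINE` are hypothetical, and the sketch assumes that blank lines are ignored and that a line may be indented with spaces or with tabs but never both.

```python
from typing import Iterator, List, Tuple

Token = Tuple[str, str]  # (kind, text); a hypothetical token shape


def tokenize(source: str) -> Iterator[Token]:
    """Yield INDENT, OUTDENT and LINE tokens for indentation-sensitive text."""
    levels: List[int] = [0]  # stack of currently open indentation widths
    for number, raw in enumerate(source.splitlines(), start=1):
        body = raw.lstrip(" \t")
        if not body:
            continue  # assumption: blank lines never change the indentation
        prefix = raw[: len(raw) - len(body)]
        # Reject a single line whose leading whitespace mixes tabs and spaces.
        if " " in prefix and "\t" in prefix:
            raise SyntaxError(f"line {number}: mixed tabs and spaces")
        width = len(prefix)
        if width > levels[-1]:
            # Deeper than the current level: open one new block.
            levels.append(width)
            yield ("INDENT", "")
        while width < levels[-1]:
            # Shallower: close blocks until we are back at a known level.
            levels.pop()
            yield ("OUTDENT", "")
        if width != levels[-1]:
            raise SyntaxError(f"line {number}: inconsistent outdent")
        yield ("LINE", body)
    # Close any blocks that are still open at end of input.
    while len(levels) > 1:
        levels.pop()
        yield ("OUTDENT", "")
```

Fed the sample from the question with an assumed two-space indentation step (the exact spacing is not recoverable from the post), as in `tokenize("some text\n  some other text\n  # bullet point\n    # nested bullet point\n")`, the generator yields a `LINE`, an `INDENT`, two `LINE`s, another `INDENT`, a `LINE`, and finally two `OUTDENT`s. A parser can then treat `INDENT` and `OUTDENT` like opening and closing brackets, so the grammar itself no longer needs to count whitespace.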