Filtering tokens from Scala parser combinators
How do I filter the sequence of tokens coming from my Lexer to my Parser when using Scala parser combinators?
Let me explain - suppose I have the fairly standard pattern of a Lexer (extending StdLexical) and a Parser (extending StdTokenParsers). The lexer turns a sequence of characters into a sequence of tokens, then the parser turns the sequence of tokens into an abstract syntax tree (of type Expr).
I've decided that I'd like the option of filtering out certain tokens, which could occur anywhere in the stream, so I would like a function that fits between the Lexer and Parser to remove these tokens. For example, I might want the lexer to tokenise comments, and then filter out these comments later.
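As a concrete illustration of that setup, a lexer might emit comments as tokens along these lines. This is a hypothetical sketch, not from the original question: the CommentLexical object and its Comment token class are invented names, and only // line comments are handled.

import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.input.CharArrayReader.EofCh

object CommentLexical extends StdLexical {
  // Hypothetical token class for comments, alongside StdLexical's usual tokens.
  case class Comment(chars: String) extends Token {
    override def toString = "comment " + chars
  }

  // StdLexical's whitespace parser also consumes comments; keep only real
  // whitespace so that comments reach the token rule below.
  override def whitespace = rep(whitespaceChar)

  // Emit a Comment token for `//...`; defer to StdLexical for everything else.
  override def token: Parser[Token] =
    '/' ~ '/' ~ rep(chrExcept('\n', EofCh)) ^^ {
      case _ ~ _ ~ chars => Comment(chars.mkString)
    } | super.token
}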
What is the best way of writing this filter? This could use the parser combinator idiom, but doesn't have to.
Sample current code:
import scala.collection.immutable.PagedSeq // scala.util.parsing.input.PagedSeq in newer versions of scala-parser-combinators
import scala.util.parsing.input.PagedSeqReader

val reader = new PagedSeqReader(PagedSeq.fromReader(in)) // `in` is the underlying java.io.Reader
val tokens = new MyParser.lexical.Scanner(reader)
val parse = MyParser.phrase(parser)(tokens)
I would like to be able to write something like this:
val reader = new PagedSeqReader(PagedSeq.fromReader(in)) // `in` is the underlying java.io.Reader
val tokens = new MyParser.lexical.Scanner(reader)
val parse = MyParser.phrase(parser)(filter(tokens))
2 Answers
I've done it now; here are the results. The key insight is that a Parser from the parser combinators uses a scala.util.parsing.input.Reader as input. So we need a class that wraps a Reader, and is itself a Reader, which filters out entries on some condition. I wrote the Reader so that on construction it skips all unwanted entries and stops at either the first good entry or the end. Every call is then delegated to the original reader, except for rest, which constructs another TokenFilter in turn.
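In code, a sketch of that TokenFilter might look like the following, assuming the standard scala.util.parsing.input.Reader interface; the generic signature and member names beyond TokenFilter itself are my assumptions:

import scala.util.parsing.input.{Position, Reader}

// A Reader[T] that wraps another Reader[T] and drops entries matching `exclude`.
class TokenFilter[T](parent: Reader[T], exclude: T => Boolean) extends Reader[T] {

  // On construction, skip past unwanted entries, stopping at either the
  // first good entry or the end of the input.
  private val start: Reader[T] = {
    var r = parent
    while (!r.atEnd && exclude(r.first)) r = r.rest
    r
  }

  // Delegate everything to the skipped-ahead reader...
  def first: T = start.first
  def pos: Position = start.pos
  def atEnd: Boolean = start.atEnd
  override def source: CharSequence = start.source
  override def offset: Int = start.offset

  // ...except for rest, which wraps the remainder in another TokenFilter.
  def rest: Reader[T] = new TokenFilter(start.rest, exclude)
}

With this in place, the filter(tokens) call from the question becomes new TokenFilter(tokens, isUnwanted) for whatever predicate isUnwanted marks the tokens to drop.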
Have you considered using RegexParsers to remove whitespace and comments?
EDIT
You can make a simple filter and use it in this manner to remove tokens that are "#".
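A minimal sketch of what that might look like, here expressed with the TokenFilter class from the answer above; the _.chars == "#" test and the variable names are assumptions:

// Build the token stream as before, then drop every token whose
// character content is "#" before handing the stream to the parser.
val reader = new PagedSeqReader(PagedSeq.fromReader(in))
val tokens = new MyParser.lexical.Scanner(reader)
val filtered = new TokenFilter[MyParser.lexical.Token](tokens, t => t.chars == "#")
val parse = MyParser.phrase(parser)(filtered)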