Filtering tokens from Scala parser combinators

Posted on 2024-09-10 10:20:02

How do I filter the sequence of tokens coming from my Lexer to my Parser when using Scala parser combinators?

Let me explain - suppose I have the fairly standard pattern of a lexer (extending StdLexical) and a parser (extending StdTokenParsers). The lexer turns a sequence of characters into a sequence of tokens, and the parser then turns the sequence of tokens into an abstract syntax tree (of type Expr).

There are some tokens, which could occur anywhere in the stream, that I would like the option of filtering out, so I want a function that fits between the lexer and the parser to remove these tokens. For example, I might want the lexer to tokenise comments, and then filter those comments out later.

What is the best way of writing this filter? This could use the parser combinator idiom, but doesn't have to.

Sample current code:

 // `in` is the underlying java.io.Reader to tokenise
 val reader = new PagedSeqReader(PagedSeq.fromReader(in))
 val tokens = new MyParser.lexical.Scanner(reader)
 val parse = MyParser.phrase(parser)(tokens)

I would like to be able to write something like this:

 // `in` is the underlying java.io.Reader, as before
 val reader = new PagedSeqReader(PagedSeq.fromReader(in))
 val tokens = new MyParser.lexical.Scanner(reader)
 val parse = MyParser.phrase(parser)(filter(tokens))
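
For reference, a minimal sketch of the StdLexical/StdTokenParsers pattern assumed above; MyParser, Expr and the tiny grammar are hypothetical stand-ins for the real code:

import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.combinator.syntactical.StdTokenParsers

// Hypothetical minimal setup; the real grammar would be larger.
object MyParser extends StdTokenParsers {
  type Tokens = StdLexical
  val lexical = new StdLexical
  lexical.delimiters += "+"

  sealed trait Expr
  case class Num(n: Int) extends Expr
  case class Add(l: Expr, r: Expr) extends Expr

  def num: Parser[Expr] = numericLit ^^ (s => Num(s.toInt))
  // Left-associative sums: "1 + 2 + 3" ==> Add(Add(Num(1), Num(2)), Num(3))
  def parser: Parser[Expr] =
    num ~ rep("+" ~> num) ^^ { case n ~ ns => ns.foldLeft(n)(Add) }
}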

Comments (2)

负佳期 2024-09-17 10:20:02

I've done it now, here are the results. The key insight is that a Parser from the parser combinator library uses a scala.util.parsing.input.Reader as input, so we need a class that wraps a Reader and is itself a Reader, one that filters out entries matching some condition.

I wrote the Reader so that on construction it skips all unwanted entries and stops at either the first good entry or the end. Every call is then delegated to the original reader, except for rest, which in turn constructs another Filter.

import scala.util.parsing.input._

class Filter[T](parent: Reader[T], exclude: T => Boolean) extends Reader[T] {
  // Skip excluded tokens eagerly, so `start` sits at the first token
  // we want to keep (or at the end of the input)
  private val start = nextOk(parent)

  @annotation.tailrec
  private def nextOk(r: Reader[T]): Reader[T] =
    if (r.atEnd || !exclude(r.first)) r else nextOk(r.rest)

  override def source = start.source
  override def offset: Int = start.offset
  override def first: T = start.first
  // Re-wrap the tail so filtering continues down the rest of the stream
  override def rest: Reader[T] = new Filter(start.rest, exclude)
  override def pos: Position = start.pos
  override def atEnd = start.atEnd
}
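
Wired into the pipeline from the question, it looks something like this; the predicate shown is only a hypothetical comment filter (it assumes the lexer emits comments as tokens whose chars start with "//"):

// `in` is the underlying java.io.Reader, as in the question
val reader = new PagedSeqReader(PagedSeq.fromReader(in))
val tokens = new MyParser.lexical.Scanner(reader)
// Drop every token the predicate matches before the parser sees it
val filtered = new Filter[MyParser.lexical.Token](tokens,
  t => t.chars.startsWith("//"))
val parse = MyParser.phrase(parser)(filtered)
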
治碍 2024-09-17 10:20:02

Have you considered using RegexParsers to remove whitespace and comments?
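
For instance, a sketch of that approach: RegexParsers silently skips whatever its whiteSpace regex matches between tokens, so extending that regex to also cover // line comments discards them before any production sees them (the tiny grammar below is hypothetical):

import scala.util.parsing.combinator.RegexParsers

object CommentSkipping extends RegexParsers {
  // Treat line comments as whitespace so they are skipped automatically
  override protected val whiteSpace = """(\s|//.*)+""".r
  def number: Parser[Int] = """\d+""".r ^^ (_.toInt)
  def numbers: Parser[List[Int]] = rep(number)
}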

EDIT

You can make a simple filter:

import scala.util.parsing.input._

object ReaderFilter {
  // Wraps `reader` in a Reader that only exposes tokens for which
  // `check` returns true; everything else is skipped on demand
  def filter[T](reader: Reader[T], check: T => Boolean): Reader[T] = {
    new Reader[T] {
      private var orig = reader
      def first = { trim(); orig.first }
      def atEnd = { trim(); orig.atEnd }
      def rest: Reader[T] = { trim(); ReaderFilter.filter(orig.rest, check) }
      // Trim here too, so the reported position is never that of a skipped token
      def pos = { trim(); orig.pos }
      // Advance past rejected tokens, lazily, the first time we are queried
      private def trim(): Unit = {
        while (!orig.atEnd && !check(orig.first))
          orig = orig.rest
      }
    }
  }
}

and use it in this manner (to remove tokens that are "#"):

val tokens = ReaderFilter.filter(new MyParser.lexical.Scanner(reader),
  { t: MyParser.lexical.Token => t.chars != "#" })