Advanced control of a recursive parser in Scala

Posted on 2024-09-01 09:46:51

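// Inside an object that extends RegexParsers
val uninterestingthings = ".".r
// A recursive definition needs an explicit type; lazy avoids initialization-order problems
lazy val parser: Parser[String] = "(?ui)(regexvalue)".r | (uninterestingthings ~> parser)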

This recursive parser will keep trying to match "(?ui)(regexvalue)".r until the end of the input. Is there a way in Scala to stop parsing once "uninterestingthings" has consumed some defined number of characters?

UPD: I have one poor solution:

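import scala.util.matching.Regex
import scala.util.parsing.combinator.{PackratParsers, RegexParsers}

object NonRecursiveParser extends RegexParsers with PackratParsers {
  var max = -1               // width of the current skip window; grows on every retry
  val maxInput2Consume = 25  // largest skip window that will be tried

  // Builds a new regex on every call: ".{0,0}", ".{0,1}", ... up to ".{0,25}"
  def uninteresting: Regex = {
    if (max < maxInput2Consume) {
      max += 1
      ("." + "{0," + max.toString + "}").r
    } else {
      throw new Exception("I am tired")
    }
  }

  lazy val value = "itt".r

  // Each failed attempt re-evaluates `parser`, which widens the window by one character
  def parser: Parser[Any] = (uninteresting ~> value) | parser

  def parseQuery(input: String) = {
    try {
      parse(parser, input)
    } catch {
      case e: Exception => // swallowed: the exception only signals that the limit was reached
    }
  }
}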

Disadvantages:
- not every member is a lazy val, so PackratParsers incurs some time penalty
- a regex is constructed on every call to "uninteresting", which costs time
- an exception is used for control flow, which is poor style and also costs time

Comments (2)

不念旧人 2024-09-08 09:46:51

The quick-n-dirty answer is to just limit the number of characters in your regex for uninterestingthings and make it not recursive:

val uninterestingthings = ".{0,60}".r  // 60-chars max
val parser = (uninterestingthings~>"(?ui)(regexvalue)".r)*
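To see why greediness matters for the snippet above, here is a minimal sketch that imitates the sequencing with plain `scala.util.matching.Regex` calls. It assumes, as a simplification, that `a ~> b` behaves like "match `a` at the start, then match `b` on whatever is left, without ever going back into `a`", and it ignores the whitespace skipping that RegexParsers performs. The names `GreedySkipSketch` and `sequential` are hypothetical.

```scala
object GreedySkipSketch {
  // Greedy skip, as in the snippet above: takes as many characters as it can (up to 60)
  val skip = ".{0,60}".r
  val value = "(?ui)regexvalue".r

  // Imitates `skip ~> value`: once skip has matched, its length is fixed,
  // and value must match at the start of the remaining input.
  def sequential(input: String): Option[String] =
    skip.findPrefixOf(input).flatMap { skipped =>
      value.findPrefixOf(input.substring(skipped.length))
    }
}
```

With a short input such as "xx regexvalue", the greedy skip swallows the whole string, so nothing is left for the value to match. The value is only found when it happens to start exactly where the 60-character window ends.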

Based on the comment about greediness eating the regexvalue, I propose a single regex:

val parser = ("(?:.{0,60}?)(?ui)(regexvalue)".r)*

But we seem to have ventured outside the realm of Scala parsers and into regex minutiae. I'd be interested in seeing other results.
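The single-regex idea can be tried out with the standard library alone. The sketch below assumes a reluctant bounded prefix followed by the case-insensitive value, anchored at the start of the input. `BoundedSkipSketch`, `maxSkip` and `findValue` are hypothetical names, and the limit of 60 simply mirrors the number used in this answer.

```scala
import scala.util.matching.Regex

object BoundedSkipSketch {
  // Hypothetical limit, mirroring the 60-character cap used above
  val maxSkip = 60

  // (?ui) = Unicode-aware, case-insensitive matching.
  // The reluctant quantifier {0,60}? tries the shortest prefix first,
  // so the skip cannot swallow the value itself.
  val bounded: Regex = s"(?ui).{0,$maxSkip}?(regexvalue)".r

  // Returns the captured value if it starts within maxSkip characters
  // of the beginning of the input, otherwise None.
  def findValue(input: String): Option[String] =
    bounded.findPrefixMatchOf(input).map(_.group(1))
}
```

Because the regex engine backtracks inside a single pattern, the prefix grows one character at a time until the value matches or the limit is reached, which is the behaviour the question asks for without a mutable counter or an exception. Note that `.` does not match line terminators unless the `(?s)` flag is added.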

把人绕傻吧 2024-09-08 09:46:51

Use a tokenizer to break things up first, using all of the regexps for interesting things that you already know. Use a single ".".r to match uninteresting things if they're significant to your grammar. (Or throw them away if they're not significant to the grammar.) Your interesting things now have known types, and they get identified by the tokenizer using a different algorithm than the parsing. Since all of the lookahead problems are solved by the tokenizer, the parser should be easy.
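A rough sketch of the tokenizer approach, using only the standard library, might look as follows. It is an illustration under assumptions rather than the answerer's implementation: the token types `Interesting` and `Other`, the functions `tokenize` and `firstValue`, and the `maxSkip` parameter are all hypothetical, and the "parser" is reduced to a single check on the token list.

```scala
import scala.util.matching.Regex

object TokenizerSketch {
  sealed trait Token
  final case class Interesting(text: String) extends Token
  final case class Other(ch: Char) extends Token

  // The "interesting" pattern from the question; every other character
  // becomes one Other token.
  val interesting: Regex = "(?ui)regexvalue".r

  def tokenize(input: String): List[Token] = {
    val out = List.newBuilder[Token]
    var pos = 0
    while (pos < input.length) {
      interesting.findPrefixOf(input.subSequence(pos, input.length)) match {
        case Some(m) =>
          out += Interesting(m)
          pos += m.length
        case None =>
          out += Other(input.charAt(pos))
          pos += 1
      }
    }
    out.result()
  }

  // Accepts the first Interesting token only if at most maxSkip tokens precede it
  def firstValue(tokens: List[Token], maxSkip: Int): Option[String] =
    tokens.take(maxSkip + 1).collectFirst { case Interesting(t) => t }
}
```

Once the input is a list of typed tokens, limiting how much uninteresting material may be skipped becomes a matter of counting tokens, so the grammar no longer needs lookahead tricks inside the regexes.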
