Advanced control of recursive parsers in Scala
val uninterestingthings = ".".r
// a recursive definition needs an explicit type and must be a def (or lazy val)
def parser: Parser[Any] = "(?ui)(regexvalue)".r | (uninterestingthings ~> parser)
This recursive parser will keep trying to match "(?ui)(regexvalue)".r until the end of input. Is there a way in Scala to stop parsing once "uninterestingthings" has consumed some defined number of characters?
UPD: I have one poor solution:
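import scala.util.matching.Regex
import scala.util.parsing.combinator.{PackratParsers, RegexParsers}

object NonRecursiveParser extends RegexParsers with PackratParsers {
  var max = -1               // how many characters the next "uninteresting" regex may consume
  val maxInput2Consume = 25  // upper bound on skipped characters

  // Builds a new regex, one character wider, on every call; throws once the limit is reached.
  def uninteresting: Regex = {
    if (max < maxInput2Consume) {
      max += 1
      ("." + "{0," + max.toString + "}").r
    } else {
      throw new Exception("I am tired")
    }
  }

  lazy val value = "itt".r

  // Each retry re-evaluates `parser`, which widens the skip by one character.
  def parser: Parser[Any] = (uninteresting ~> value) | parser

  def parseQuery(input: String) = {
    try {
      parse(parser, input)
    } catch {
      case e: Exception => // limit reached: swallow the exception and give up
    }
  }
}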
Disadvantages:
- not all members are lazy vals, so PackratParsers will incur some time penalty
- a regexp is constructed on every "uninteresting" method call: time penalty
- exceptions are used to control the program: poor code style and a time penalty
Comments (2)
The quick-n-dirty answer is to just limit the number of characters in your regex for uninterestingthings and make it not recursive:
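The snippet that followed is missing from this copy, so what follows is only a minimal sketch of the idea, not the answerer's code. It assumes the scala-parser-combinators module is on the classpath; the names BoundedSkipParser and maxSkip, the limit of 25, and the target "itt" (borrowed from the question) are illustrative. The reluctant quantifier plus lookahead is an added assumption, used so that the bounded skip stops right before the value instead of swallowing it.

```scala
import scala.util.parsing.combinator.RegexParsers

object BoundedSkipParser extends RegexParsers {
  // Count every character, whitespace included, against the limit.
  override val skipWhitespace = false

  // Hypothetical limit on how many uninteresting characters may be skipped.
  val maxSkip = 25

  // The value we are looking for ("itt" is taken from the question).
  val value: Parser[String] = "itt".r

  // Built once: skip at most maxSkip characters, reluctantly,
  // and stop right before the value (lookahead consumes nothing).
  val uninteresting: Parser[String] = s"(?s).{0,$maxSkip}?(?=itt)".r

  // No recursion: one bounded skip, then the value.
  val parser: Parser[String] = uninteresting ~> value

  def parseQuery(input: String): Option[String] =
    parse(parser, input) match {
      case Success(result, _) => Some(result)
      case _                  => None
    }
}
```

With this shape, BoundedSkipParser.parseQuery("xxxitt") yields a result, while an input with more than 25 characters in front of "itt" fails without any exception being thrown.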
Based on the comment about greediness eating the regexvalue, I propose a single regex:
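The proposed regex itself is missing from this copy. As a hedged illustration of what a single-regex approach could look like (not necessarily what the answerer wrote), the sketch below uses only scala.util.matching.Regex; SingleRegexSearch, maxSkip and findValue are hypothetical names, and the limit of 25 is taken from the question's own attempt.

```scala
import scala.util.matching.Regex

object SingleRegexSearch {
  // Hypothetical limit on how many leading characters may be skipped.
  val maxSkip = 25

  // (?uis): Unicode-aware case-insensitive matching, and '.' also matches newlines.
  // The reluctant {0,n}? skips as little as possible, so it cannot eat the value.
  val pattern: Regex = s"(?uis).{0,$maxSkip}?(regexvalue)".r

  // Returns the captured value if it starts within the first maxSkip characters.
  def findValue(input: String): Option[String] =
    pattern.findPrefixMatchOf(input).map(_.group(1))
}
```

Inside a RegexParsers object the same pattern could be used directly as a parser, but the parse result would then be the whole match, skipped prefix included, so the capture group would still need to be extracted separately.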
But we seem to have ventured outside the realm of Scala parsers into regex minutiae. I'd be interested in seeing other results.
Use a tokenizer to break things up first, using all of the regexps for the interesting things that you already know. Use a single ".".r to match uninteresting things if they're significant to your grammar. (Or throw them away if they're not significant to the grammar.) Your interesting things now have known types, and the tokenizer identifies them with a different algorithm than the one used for parsing. Since all of the lookahead problems are solved by the tokenizer, the parser should be easy.
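A rough sketch of the tokenizer idea, assuming a single interesting pattern ("regexvalue", from the question) and plain scala.util.matching.Regex rather than any particular lexer library. Token, Interesting, Uninteresting, Tokenizer and firstInteresting are hypothetical names introduced only for illustration.

```scala
sealed trait Token
final case class Interesting(text: String) extends Token
final case class Uninteresting(text: String) extends Token

object Tokenizer {
  // The interesting alternative is tried first at every position;
  // the single-character fallback plays the role of ".".r.
  private val tokenPattern = "(?is)(regexvalue)|(.)".r

  def tokenize(input: String): List[Token] =
    tokenPattern.findAllMatchIn(input).map { m =>
      if (m.group(1) != null) Interesting(m.group(1))
      else Uninteresting(m.group(2))
    }.toList

  // The "parser" over tokens is now trivial: allow at most maxSkip
  // uninteresting tokens before the first interesting one.
  def firstInteresting(tokens: List[Token], maxSkip: Int): Option[String] = {
    val (skipped, rest) = tokens.span(_.isInstanceOf[Uninteresting])
    rest match {
      case Interesting(text) :: _ if skipped.length <= maxSkip => Some(text)
      case _                                                   => None
    }
  }
}
```

Because the tokenizer has already decided what counts as interesting, the limit on skipped input becomes a simple count over tokens instead of a regex or a recursion depth.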