Nested parser in Happy / infinite loop?
I'm trying to write a parser for a simple markup language with Happy. Currently, I'm having some issues with infinite loops and nested elements.
My markup language basically consists of two elements, one for "normal" text and one for bold/emphasized text.
data Markup
  = MarkupText String
  | MarkupEmph [Markup]
For example, a text like Foo *bar* should get parsed as [MarkupText "Foo ", MarkupEmph [MarkupText "bar"]].
Lexing of that example works fine, but parsing it results in an infinite loop, and I can't see why. This is my current approach:
-- The main parser: Parsing a list of "Markup"
Markups :: { [Markup] }
  : Markups Markup        { $1 ++ [$2] }
  | Markup                { [$1] }

-- One single markup element
Markup :: { Markup }
  : '*' Markups1 '*'      { MarkupEmph $2 }
  | Markup1               { $1 }

-- The nested list inside *..*
Markups1 :: { [Markup] }
  : Markups1 Markup1      { $1 ++ [$2] }
  | Markup1               { [$1] }

-- Markup which is always available:
Markup1 :: { Markup }
  : String                { MarkupText $1 }
What's wrong with that approach? How could this be resolved?
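As a cross-check of what the grammar is meant to produce, independent of Happy, here is a minimal hand-written sketch with one function per nonterminal. The Token type is an assumption inferred from the constructor names TokenSymbol and TokenString; the lower-case function names (markup1, markups1, markup, markups) are hypothetical and exist only for this illustration. It is not what Happy generates, just a small reference to compare results against.

```haskell
data Markup
  = MarkupText String
  | MarkupEmph [Markup]
  deriving (Show, Eq)

-- Assumed token type, inferred from the constructor names in the question.
data Token
  = TokenSymbol String
  | TokenString String
  deriving (Show, Eq)

-- Markup1: a single plain-text element.
markup1 :: [Token] -> Maybe (Markup, [Token])
markup1 (TokenString s : rest) = Just (MarkupText s, rest)
markup1 _                      = Nothing

-- Markups1: one or more Markup1 elements (the list inside *..*).
markups1 :: [Token] -> Maybe ([Markup], [Token])
markups1 ts = do
  (m, rest) <- markup1 ts
  case markups1 rest of
    Just (ms, rest') -> Just (m : ms, rest')
    Nothing          -> Just ([m], rest)

-- Markup: either '*' Markups1 '*' or a plain Markup1.
markup :: [Token] -> Maybe (Markup, [Token])
markup (TokenSymbol "*" : ts) = do
  (ms, rest) <- markups1 ts
  case rest of
    TokenSymbol "*" : rest' -> Just (MarkupEmph ms, rest')
    _                       -> Nothing   -- missing closing '*'
markup ts = markup1 ts

-- Markups: one or more Markup elements, consuming the whole input.
markups :: [Token] -> Maybe [Markup]
markups ts = do
  (m, rest) <- markup ts
  if null rest then Just [m] else (m :) <$> markups rest
```

Applied to the tokens for Foo *bar*, that is [TokenString "Foo ", TokenSymbol "*", TokenString "bar", TokenSymbol "*"], markups returns Just [MarkupText "Foo ", MarkupEmph [MarkupText "bar"]], which matches the expected result stated in the question.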
Update: Sorry. Lexing wasn't working as expected. The infinite loop was inside the lexer. Sorry. :)
Update 2: On request, this is what I'm using as the lexer:
lexer :: String -> [Token]
lexer [] = []
lexer str@(c:cs)
  | c == '*'  = TokenSymbol "*" : lexer cs
  -- ...more rules...
  | otherwise = TokenString val : lexer rest
  where
    (val, rest) = span isValidChar str
    isValidChar = (/= '*')
The infinite recursion occurred because I had lexer str instead of lexer cs in that first rule for '*'. I didn't see it because my actual code was a bit more complex. :)
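To make the difference concrete, here is a small self-contained sketch of both variants. The Token type is an assumption inferred from the constructors used in the lexer, and buggyLexer is a hypothetical name for the broken version described in the update. Because Haskell lists are lazy, the broken version does not hang on its own; it yields an endless stream of '*' tokens, so whatever consumes the whole stream never finishes.

```haskell
-- Assumed token type, inferred from the constructors used in the question.
data Token
  = TokenSymbol String
  | TokenString String
  deriving (Show, Eq)

-- Fixed lexer: the '*' rule recurses on cs, so one character is consumed per step.
lexer :: String -> [Token]
lexer [] = []
lexer str@(c:cs)
  | c == '*'  = TokenSymbol "*" : lexer cs
  | otherwise = TokenString val : lexer rest
  where
    -- In the otherwise branch c is not '*', so val is non-empty and rest is shorter than str.
    (val, rest) = span isValidChar str
    isValidChar = (/= '*')

-- Broken variant (hypothetical name): the '*' rule recurses on str,
-- so the same '*' is seen again and again and the token list never ends.
buggyLexer :: String -> [Token]
buggyLexer [] = []
buggyLexer str@(c:_)
  | c == '*'  = TokenSymbol "*" : buggyLexer str
  | otherwise = TokenString val : buggyLexer rest
  where
    (val, rest) = span (/= '*') str
```

With these definitions, lexer "Foo *bar*" gives [TokenString "Foo ", TokenSymbol "*", TokenString "bar", TokenSymbol "*"], while take 3 (buggyLexer "*bar") gives three copies of TokenSymbol "*" and the full list has no end.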
Comments (1)
Just a warning: it has been a while since I've dealt with parser generators.
It would appear you need an LR(1) parser, which I'm not sure Happy is. I am positive that once I write this, someone will be able to correct me.
If your parser can't look ahead, it will be stuck on the left-recursive Markups1 rule forever.
It will look for a Markups1, which in turn looks for a Markups1. As best I can guess, it isn't performing a lookahead to Markup1 to see if it is a string.
Try rewriting the rule so that Markup1 comes first.
Essentially, you want it to find the string first, then try to look for another string; if it doesn't find one, it needs to end that statement.