Why does Parsec's "choice" combinator seem to get stuck on the first choice?
After looking at the CSV sample code in Real World Haskell, I've tried to build a little XML parser. But close tags fail with `unexpected "/"` errors. Can you tell me why my `closeTag` parser doesn't work (or is possibly never invoked)? Thanks!
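```haskell
import Text.ParserCombinators.Parsec

-- A file is a sequence of lines; each line is a sequence of tags or
-- words terminated by a newline.
xmlFile :: Parser [[String]]
xmlFile = manyTill line eof

line :: Parser [String]
line = manyTill tag eol

eol :: Parser Char
eol = char '\n'

word :: Parser String
word = many1 (noneOf "></")

-- Try each kind of tag in turn, falling back to plain text.
tag :: Parser String
tag = choice [openTag, closeTag, nullTag, word]

nullTag, closeTag, openTag :: Parser String
nullTag  = between (char '<') (string "/>") word
closeTag = between (string "</") (char '>') word
openTag  = between (char '<') (char '>') tagContent

attrval :: Parser String
attrval = between (char '"') (char '"') word

atts :: Parser [String]
atts = do
  char ' '
  sepBy attr (char ' ')

attr :: Parser String
attr = do
  word
  char '='
  attrval

tagContent :: Parser String
tagContent = do
  w <- word
  option [] atts
  return w

parseXML :: String -> Either ParseError [[String]]
parseXML input = parse xmlFile "(unknown)" input

main :: IO ()
main = do
  c <- getContents
  case parse xmlFile "(stdin)" c of
    Left e -> do
      putStrLn "Error parsing input:"
      print e
    Right r -> mapM_ print r
```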
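The failure can be reproduced without the rest of the grammar. The sketch below is a hypothetical, stripped-down version (the names `openish`, `closeish` and `both` are not from the question) in which two alternatives both begin with `<`, just like `openTag` and `closeTag`. It assumes the `parsec` package's `Text.ParserCombinators.Parsec` compatibility module, as the question does.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical stand-ins for openTag and closeTag: both start with '<'.
openish, closeish :: Parser String
openish  = char '<' >> many1 letter     -- accepts "<foo"
closeish = string "</" >> many1 letter  -- accepts "</foo"

both :: Parser String
both = choice [openish, closeish]

main :: IO ()
main = do
  -- Succeeds via the first alternative.
  print (parse both "(example)" "<foo")
  -- Fails: openish consumes '<' and then fails on '/'. Because input was
  -- consumed, choice does not go on to try closeish.
  print (parse both "(example)" "</foo")
```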
Parsec's strategy is essentially LL(1), which means that it "commits" to the current branch as soon as that branch consumes any input. Your `openTag` parser consumes the `<` with its `char '<'`, which means that when it then sees `/` instead of a tag name, the whole parse fails instead of trying the next choice. If `openTag` had failed without consuming any input, another choice would have been tried. Parsec does this for efficiency (the alternative is exponential time!) and for reasonable error messages.

You have two options. The preferred option, when it is reasonable to pull off, is to factor your grammar so that all choices are made without consuming input.
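A minimal sketch of such a factoring, under a few assumptions: a hypothetical `Tag` result type so the kinds of tag can be told apart, and a hypothetical `name` parser that, unlike the question's `word`, also stops at spaces, `=` and quotes so that attributes can be parsed separately. The `<` is consumed once, and each remaining decision is made on a character that a failing branch does not consume. (The same ordering problem also hides `nullTag` in the question: on `<foo/>`, `openTag` consumes `<foo` before failing on `/`.)

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical result type (not in the question) so callers can tell
-- the kinds of tag apart.
data Tag
  = Open String [(String, String)]   -- <name k="v">
  | Empty String [(String, String)]  -- <name k="v"/>
  | Close String                     -- </name>
  | Text String                      -- character data between tags
  deriving (Show, Eq)

-- Tag/attribute names. Unlike the question's `word`, this also stops at
-- spaces, '=' and quotes, so attributes can be parsed separately.
name :: Parser String
name = many1 (noneOf "<>/= \"\n")

-- Character data outside of tags.
text :: Parser String
text = many1 (noneOf "<>\n")

-- k="v"
attr :: Parser (String, String)
attr = do
  k <- name
  _ <- char '='
  v <- between (char '"') (char '"') (many (noneOf "\""))
  return (k, v)

-- The factored parser: '<' is consumed once, and every later choice is
-- decided by a character that the failing branch does not consume.
tag :: Parser Tag
tag = (char '<' >> afterLt) <|> fmap Text text
  where
    afterLt = closeRest <|> openOrEmpty

    -- "/name>": char '/' fails without consuming on anything but '/'.
    closeRest = do
      _ <- char '/'
      n <- name
      _ <- char '>'
      return (Close n)

    -- "name attrs>" or "name attrs/>"
    openOrEmpty = do
      n <- name
      spaces
      attrs <- many (attr <* spaces)
      (string "/>" >> return (Empty n attrs))
        <|> (char '>' >> return (Open n attrs))

main :: IO ()
main = mapM_ (print . parse tag "(example)")
  ["<foo>", "</foo>", "<foo/>", "<a href=\"x\">", "hello"]
```

Because this `tag` returns a `Tag` rather than a `String`, the question's `line` and `xmlFile` would need their type signatures adjusted to match.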
The other way, which locally changes Parsec's semantics (at the expense of the aforementioned error messages and efficiency, although it's not usually too bad because it's local), is to use the `try` combinator, which allows a parser to consume input and still fail "softly", so that another choice can be tried.

Sometimes using `try` is cleaner and easier than factoring as described above, which can obscure the "deep structure" of the language. It's a stylistic trade-off.
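A sketch of the `try` route, keeping the question's tag parsers but simplifying `openTag` to a bare `word` (an assumption made only to keep the example short; attributes are omitted). Each alternative that can fail after consuming input is wrapped in `try`, which rewinds the input on failure so `choice` can move on to the next one.

```haskell
import Text.ParserCombinators.Parsec

-- The question's word parser, unchanged.
word :: Parser String
word = many1 (noneOf "></")

-- The question's tag parsers; openTag is simplified to a bare word
-- (attributes omitted to keep the example short).
openTag, closeTag, nullTag :: Parser String
openTag  = between (char '<') (char '>') word
closeTag = between (string "</") (char '>') word
nullTag  = between (char '<') (string "/>") word

-- Each tag alternative is wrapped in try: if it fails after consuming
-- input, try rewinds the input so choice can move on to the next one.
tag :: Parser String
tag = choice [try openTag, try closeTag, try nullTag, word]

main :: IO ()
main = mapM_ (print . parse tag "(example)") ["<foo>", "</foo>", "<foo/>", "bar"]
```

`word` is the last alternative, so it does not need `try`: there is nothing after it to fall back to. As in the question's version, the result is just the tag name, so it does not record which kind of tag matched.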