Why does Parsec's "choice" combinator seem to get stuck on the first choice?

Posted 2024-11-01 19:23:51

After looking at the CSV sample code in Real World Haskell, I've tried to build a little XML parser. But close tags error out with 'unexpected "/"' errors. Can you tell me why my "closeTag" parser doesn't work (or possibly isn't ever invoked)? Thanks!

import Text.ParserCombinators.Parsec

xmlFile = manyTill line eof
line = manyTill tag eol
eol = char '\n'

word = many1 (noneOf "></")

tag = choice [openTag, closeTag, nullTag, word]

nullTag = between (char '<') (string "/>") word
closeTag = between (string "</") (char '>') word
openTag = between (char '<') (char '>')  tagContent
attrval = between (char '"') (char '"') word

atts = do {
        (char ' ')
        ; sepBy attr (char ' ')
}

attr = do {
                word
                ; char '='
                ; attrval
        }

tagContent = do {
                w <- word
                ; option []  atts
                ; return w
        }

parseXML :: String -> Either ParseError [[String]]
parseXML input = parse xmlFile "(unknown)" input

main =
    do c <- getContents
       case parse xmlFile "(stdin)" c of
            Left e -> do putStrLn "Error parsing input:"
                         print e
            Right r -> mapM_ print r

薄凉少年不暖心 2024-11-08 19:23:51

Parsec's strategy is essentially LL(1), which means that it "commits" to the current branch as soon as any input is consumed. Your openTag parser consumes the < with its char '<'. On a close tag such as </foo> it then finds / where it expects the tag name, so it fails after having consumed input, and the whole parse fails instead of trying the next choice (closeTag). If openTag had failed without consuming any input, the next choice would be tried. Parsec does this for efficiency (the alternative, unrestricted backtracking, can take exponential time!) and for reasonable error messages.
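
A minimal sketch of this commit rule, assuming parsec 3 and its Text.Parsec modules (the question's Text.ParserCombinators.Parsec is the same package's older compatibility interface). openT, closeT, noTry, withTry and the run helper are hypothetical, trimmed-down stand-ins for the question's openTag and closeTag. Both alternatives begin with '<', and only the version wrapped in try ever gets to attempt the second one.

```haskell
import Text.Parsec (char, eof, letter, many1, parse, string, try, (<|>))
import Text.Parsec.String (Parser)

-- Hypothetical helper: parse the whole input, keeping only a success.
run :: Parser a -> String -> Maybe a
run p s = either (const Nothing) Just (parse (p <* eof) "" s)

-- Simplified stand-ins for openTag / closeTag; both start with '<'.
openT, closeT :: Parser String
openT  = char '<' >> many1 letter <* char '>'
closeT = string "</" >> many1 letter <* char '>'

-- Without try: on "</a>", openT consumes '<', then fails on '/'.
-- Because input was consumed, <|> never tries closeT.
noTry :: Parser String
noTry = openT <|> closeT

-- With try: a failing openT is rewound to before '<', so closeT runs.
withTry :: Parser String
withTry = try openT <|> closeT

main :: IO ()
main = do
  print (run noTry "<a>")     -- Just "a"
  print (run noTry "</a>")    -- Nothing
  print (run withTry "</a>")  -- Just "a"
```

On "</a>", noTry is rejected with an unexpected "/" error, the same kind of error reported in the question.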

You have two options. The preferred option, when it is reasonable to pull off, is to factor your grammar so that all choices are made without consuming input, e.g.:

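tag = word <|> (char '<' >> tagbody)
    where
    -- after the shared '<', each alternative that does not apply
    -- fails without consuming input, so the next one is tried
    tagbody = closebody <|> openbody
    closebody = char '/' >> word >>= \name -> char '>' >> return name
    openbody = do
        content <- tagContent
        _ <- string "/>" <|> string ">"
        return content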

Modulo errors and style (my brain is a bit fried at the moment :-P).
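
A self-contained sketch of the factored approach, assuming parsec 3's Text.Parsec modules. The Tag result type, the name character set, the attribute handling and the run helper are hypothetical choices made for this sketch, not part of the question or the answer. Newlines are kept out of names and text so that the question's line-by-line manyTill structure still sees them. No try appears anywhere: after '<', an alternative that does not apply fails before consuming input.

```haskell
import Text.Parsec (between, char, eof, many, many1, manyTill, noneOf, parse, spaces, string, (<|>))
import Text.Parsec.String (Parser)

-- Hypothetical result type; the question's parser returns plain Strings.
data Tag = Open String | Close String | SelfClosing String | Text String
  deriving (Show, Eq)

-- Tag and attribute names stop at markup characters, quotes, spaces and newlines.
name :: Parser String
name = many1 (noneOf "<>/= \"\n")

-- key="value"; parsed here only so that it can be skipped.
attr :: Parser (String, String)
attr = do
  k <- name
  _ <- char '='
  v <- between (char '"') (char '"') (many (noneOf "\""))
  return (k, v)

-- Factored grammar: only one branch consumes '<'. After it, every
-- alternative that does not apply fails without consuming input,
-- so no try is needed.
tag :: Parser Tag
tag = (Text <$> many1 (noneOf "<\n")) <|> (char '<' >> tagBody)
  where
    tagBody = closeBody <|> openBody
    -- '/' right after '<' means a close tag.
    closeBody = Close <$> (char '/' >> name <* char '>')
    -- Otherwise: a name, optional attributes, then "/>" or ">".
    openBody = do
      n <- name
      spaces
      _ <- many (attr <* spaces)
      (SelfClosing n <$ string "/>") <|> (Open n <$ char '>')

line :: Parser [Tag]
line = manyTill tag (char '\n')

xmlFile :: Parser [[Tag]]
xmlFile = manyTill line eof

-- Hypothetical helper: keep only successful parses.
run :: String -> Maybe [[Tag]]
run = either (const Nothing) Just . parse xmlFile "(example)"

main :: IO ()
main = print (run "<a href=\"x\">\nhi there\n</a><br/>\n")
-- Just [[Open "a"],[Text "hi there"],[Close "a",SelfClosing "br"]]
```

Returning a Tag instead of a bare String is a design choice of the sketch. It lets callers tell open, close and self-closing tags apart, which the question's [[String]] result cannot.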

The other way, which locally changes Parsec's semantics (at the expense of the aforementioned error messages and efficiency -- but it's not usually too bad because it's local), is to use the try combinator, which allows a parser to consume input and still fail "softly" so another choice can be tried:

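openTag  = try $ between (char '<') (char '>') tagContent
closeTag = try $ between (string "</") (char '>') word
nullTag  = try $ between (char '<') (string "/>") word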

Sometimes using try is cleaner and easier than factoring like above, which can obscure the "deep structure" of the language. It's a stylistic trade-off.
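
For comparison, here is a sketch of the question's own parser with try added, again assuming parsec 3's Text.Parsec modules. Two changes are assumptions of this sketch. First, try wraps openTag and closeTag inside choice, because those are the alternatives that can consume '<' and then fail while a later alternative might still match. Second, word also excludes '\n' so that text cannot swallow the newline that eol is waiting for. run is a hypothetical helper.

```haskell
import Text.Parsec (between, char, choice, eof, many1, manyTill, noneOf, option, parse, sepBy, string, try)
import Text.Parsec.String (Parser)

xmlFile :: Parser [[String]]
xmlFile = manyTill line eof

line :: Parser [String]
line = manyTill tag eol

eol :: Parser Char
eol = char '\n'

-- Assumption: '\n' is excluded so text cannot swallow the line ending
-- that eol waits for (the question's word did not exclude it).
word :: Parser String
word = many1 (noneOf "></\n")

-- try rewinds a failed openTag/closeTag to before its '<', so the later
-- alternatives are still attempted. nullTag needs no try here: if it
-- fails after '<', the only remaining choice (word) could not match '<'.
tag :: Parser String
tag = choice [try openTag, try closeTag, nullTag, word]

nullTag, closeTag, openTag :: Parser String
nullTag  = between (char '<') (string "/>") word
closeTag = between (string "</") (char '>') word
openTag  = between (char '<') (char '>') tagContent

tagContent :: Parser String
tagContent = do
  w <- word
  _ <- option [] atts
  return w

atts :: Parser [String]
atts = char ' ' >> sepBy attr (char ' ')

attr :: Parser String
attr = word >> char '=' >> attrval

attrval :: Parser String
attrval = between (char '"') (char '"') word

-- Hypothetical helper: keep only successful parses.
run :: String -> Maybe [[String]]
run = either (const Nothing) Just . parse xmlFile "(example)"

main :: IO ()
main = print (run "<a>\nhello\n</a>\n<br/>\n")
-- Just [["a"],["hello"],["a"],["br"]]
```

word still accepts spaces, quotes and '=', so <a href="x"> comes back as the single string a href="x". The sketch only addresses the backtracking problem.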