Parsec - error "combinator 'many' is applied to a parser that accepts an empty string"
I'm trying to write a parser using Parsec that will parse literate Haskell files, such as the following:
The classic 'Hello, world' program.
\begin{code}
main = putStrLn "Hello, world"
\end{code}
More text.
I've written the following, sort-of-inspired by the examples in RWH:
import Text.ParserCombinators.Parsec

main
    = do contents <- readFile "hello.lhs"
         let results = parseLiterate contents
         print results

data Element
    = Text String
    | Haskell String
    deriving (Show)

parseLiterate :: String -> Either ParseError [Element]
parseLiterate input
    = parse literateFile "(unknown)" input

literateFile
    = many codeOrProse

codeOrProse
    = code <|> prose

code
    = do eol
         string "\\begin{code}"
         eol
         content <- many anyChar
         eol
         string "\\end{code}"
         eol
         return $ Haskell content

prose
    = do content <- many anyChar
         return $ Text content

eol
    =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"
Which I hoped would result in something along the lines of:
[Text "The classic 'Hello, world' program.", Haskell "main = putStrLn \"Hello, world\"", Text "More text."]
(allowing for whitespace etc).
This compiles fine, but when run, I get the error:
*** Exception: Text.ParserCombinators.Parsec.Prim.many: combinator 'many' is applied to a parser that accepts an empty string
Can anyone shed any light on this, and possibly help with a solution please?
Comments (3)
As sth pointed out, `many anyChar` is the problem, and not just in `prose` but also in `code`. The problem with `code` is that `content <- many anyChar` will consume everything: the newlines and the `\end{code}` tag.

So you need some way to tell the prose and the code apart. An easy (but maybe too naive) way to do so is to look for backslashes:
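A minimal sketch of this backslash-based split, offered as an illustration and not necessarily the answerer's exact code. The `Eq` instance, the explicit `eof`, the type signatures, and the use of `many1` in `prose` are assumptions added here so that the example is self-contained and the outer `many` always makes progress.

```haskell
import Text.ParserCombinators.Parsec

data Element
    = Text String
    | Haskell String
    deriving (Show, Eq)

parseLiterate :: String -> Either ParseError [Element]
parseLiterate = parse literateFile "(unknown)"

-- Parse elements until nothing more matches, then insist on end of input.
literateFile :: Parser [Element]
literateFile = do
    elements <- many codeOrProse
    eof
    return elements

codeOrProse :: Parser Element
codeOrProse = code <|> prose

-- A code block: everything between the two tags, read up to the next
-- backslash. Naive: Haskell code that itself contains a backslash
-- (for example a lambda) will make this parser fail.
code :: Parser Element
code = do
    _ <- string "\\begin{code}"
    content <- many (noneOf "\\")
    _ <- string "\\end{code}"
    return (Haskell content)

-- Prose: at least one character, stopping before the first backslash.
-- many1 cannot succeed on empty input, so wrapping it in many is safe.
prose :: Parser Element
prose = fmap Text (many1 (noneOf "\\"))
```

With this sketch the newlines around the tags end up inside the `Text` and `Haskell` strings, since nothing consumes them separately.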
Now, you don't completely have the desired result, because the `Haskell` part will also contain newlines, but you can filter these out quite easily (given a function `filterNewlines` you could say `content <- filterNewlines <$> (many $ noneOf "\\")`).

Edit

Okay, I think I found a solution (requires the newest Parsec version, because of `lookAhead`):

`untilP p` parses a line, then checks whether the beginning of the next line can be successfully parsed by `p`. If so, it returns the empty string, otherwise it goes on. The `lookAhead` is needed because otherwise the begin/end tags would be consumed and `code` couldn't recognize them.

I guess it could still be made more concise (i.e. not having to repeat `string "\\end{code}\n"` inside `code`).
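The `untilP` idea described above could be sketched roughly as follows. This is a reconstruction from the prose description, not the answerer's original code: the helper names `line`, `beginTag` and `endTag`, the `eof` branch in `untilP`, and the `try` around the tags are assumptions. It also assumes Parsec 3 (`Text.Parsec`), that every line (including the last) ends in `"\n"`, and that a code block contains at least one line.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Element
    = Text String
    | Haskell String
    deriving (Show, Eq)

beginTag, endTag :: Parser String
beginTag = string "\\begin{code}\n"
endTag   = string "\\end{code}\n"

-- One full line, returned together with its trailing newline.
line :: Parser String
line = do
    cs <- many (noneOf "\n")
    _  <- char '\n'
    return (cs ++ "\n")

-- Read a line, then peek at what follows: stop if p would match there
-- (without consuming it) or if the input is exhausted, else keep going.
untilP :: Parser a -> Parser String
untilP p = do
    l    <- line
    rest <- (lookAhead (try p) >> return "")
        <|> (eof >> return "")
        <|> untilP p
    return (l ++ rest)

code :: Parser Element
code = do
    _       <- try beginTag
    content <- untilP endTag
    _       <- endTag
    return (Haskell content)

prose :: Parser Element
prose = fmap Text (untilP beginTag)

literateFile :: Parser [Element]
literateFile = do
    elements <- many (code <|> prose)
    eof
    return elements
```

Because `line` must consume a newline to succeed, both `code` and `prose` fail without consuming input at end of file, so the outer `many` terminates instead of raising the "accepts an empty string" error.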
I haven't tested it, but:

`many anyChar` can match an empty string
`prose` can match an empty string
`codeOrProse` can match an empty string
`literateFile` can loop forever, matching infinitely many empty strings

Changing `prose` to match `many1` characters might fix this problem.

(I'm not very familiar with Parsec, but how will `prose` know how many characters it should match? It might consume the whole input, never giving the `code` parser a second chance to look for the start of a new code segment. Alternatively it might only match one character in each call, making the `many`/`many1` in it useless.)
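The chain of reasoning above can be reproduced in a few lines. The names `emptyOk`, `nonEmpty`, `broken` and `chunks` are made up for this illustration; only `many`, `many1` and `anyChar` come from Parsec.

```haskell
import Text.ParserCombinators.Parsec

-- Succeeds on empty input without consuming anything.
emptyOk :: Parser String
emptyOk = many anyChar

-- Running this parser triggers the runtime error from the question:
-- the first emptyOk swallows all input, the second one succeeds on the
-- empty remainder, and many refuses to loop on it.
broken :: Parser [String]
broken = many emptyOk

-- many1 must consume at least one character, so at end of input it
-- fails instead of succeeding emptily.
nonEmpty :: Parser String
nonEmpty = many1 anyChar

-- The outer many can now stop cleanly once the input is used up.
chunks :: Parser [String]
chunks = many nonEmpty
```

Note that `chunks` also shows the concern raised in the parenthetical: the first `nonEmpty` consumes the whole input, so the result is a single chunk and a competing parser such as `code` would never get another turn.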
For reference, here's another version I came up with (slightly expanded to handle other cases):