Haskell - Parsec: parsing <p> elements
I'm using Text.ParserCombinators.Parsec and Text.XHtml to parse input like this:
This is the first paragraph example\n with two lines\n \n And this is the second paragraph\n
My output should be:
<p>This is the first paragraph example\n
with two lines\n</p> <p>And this is the second paragraph\n</p>
I defined:
line = do
    t <- manyTill anyChar newline
    return t

paragraph = do
    t <- many1 line
    return (p << t)
But it returns:
<p>This is the first paragraph example\n
with two lines\n\nAnd this is the second paragraph\n</p>
What's wrong? Any ideas?
Thanks!
Comments (2)
According to the documentation for manyTill, it runs its first argument zero or more times, so two newlines in a row are still valid and your line parser will not fail on the blank line. You're probably looking for something like many1Till (like many1 versus many), but it doesn't seem to exist in the Parsec library, so you may need to roll your own, either written out in full or in a terser way. (Warning: I don't have GHC on this machine, so this is completely untested.)
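The idea of a many1Till combinator might be sketched roughly as follows. This is not the answer's original code and not part of Parsec: many1Till is a hypothetical helper name, and the sketch assumes parsec 3 (the Text.Parsec modules). One detail worth noting is that simply running p once and then manyTill p end would still accept a blank line here, because anyChar also matches the newline character. The sketch therefore checks notFollowedBy end first, so the parser fails without consuming input when the terminator comes immediately.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper (not provided by Parsec): like manyTill, but requires
-- at least one occurrence of p before end. notFollowedBy makes it fail
-- without consuming input when end matches right away, e.g. on a blank line.
many1Till :: Show end => Parser a -> Parser end -> Parser [a]
many1Till p end = do
  notFollowedBy end
  x  <- p
  xs <- manyTill p end
  return (x : xs)

-- A line must now contain at least one character before its newline.
-- The terminating newline is consumed but not returned.
line :: Parser String
line = many1Till anyChar newline
```

With this definition, many1 line should stop at the first blank line instead of running through it, since line fails there without consuming anything.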
The manyTill combinator matches zero or more occurrences of its first argument, according to the documentation, so line will happily accept a blank line. This means that many1 line will consume everything up to the final newline in the file, rather than stopping at a double newline as you seem to have intended.
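The behaviour described above, and one possible way around it, could look something like the sketch below. It assumes parsec 3 and that blank lines are completely empty (no stray spaces). The names originalLine, textLine, paragraphLines and paragraphs are made up for illustration, and the result is a plain list of lines per paragraph rather than Text.XHtml values.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The line parser from the question: manyTill accepts zero characters,
-- so a blank line parses successfully as the empty string.
originalLine :: Parser String
originalLine = manyTill anyChar newline

-- One non-empty line. many1 (noneOf "\n") fails without consuming input
-- on a blank line, which lets an enclosing many1 stop there.
textLine :: Parser String
textLine = many1 (noneOf "\n") <* newline

-- A paragraph is one or more consecutive non-empty lines.
paragraphLines :: Parser [String]
paragraphLines = many1 textLine

-- Paragraphs are separated, and optionally followed, by blank lines.
paragraphs :: Parser [[String]]
paragraphs = many newline *> (paragraphLines `sepEndBy` many1 newline) <* eof
```

In GHCi, parse (many1 originalLine) "" "a\n\nb\n" keeps the blank line as an empty string inside a single result list, whereas parse paragraphs "" "a\nb\n\nc\n" splits the input into two groups of lines. Each group could then be wrapped with p << as in the question.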