pyparsing - parsing XML comments
I need to parse a file containing XML comments. Specifically, it's a C# file that uses the MS /// documentation-comment convention.
From this I'd need to pull out foobar, or /// foobar would be acceptable, too. (Note: this still doesn't work if you put the XML all on one line...)
testStr = """
///<summary>
/// foobar
///</summary>
"""
Here is what I have:
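import pyparsing as pp

# suppressed end-of-line marker
_eol = pp.Literal("\n").suppress()
# opening tag, then the rest of that line
_cPoundOpenXmlComment = pp.Suppress('///<summary>') + pp.SkipTo(_eol)
# closing tag, then the rest of that line
_cPoundCloseXmlComment = pp.Suppress('///</summary>') + pp.SkipTo(_eol)
# a body line: everything up to end-of-line, as long as it is not the closing tag
_xmlCommentTxt = ~_cPoundCloseXmlComment + pp.SkipTo(_eol)
xmlComment = _cPoundOpenXmlComment + pp.OneOrMore(_xmlCommentTxt) + _cPoundCloseXmlComment
match = xmlComment.scanString(testStr)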
And to print the output:
for item, start, stop in match:
    for entry in item:
        print(entry)
But I haven't had much success getting the grammar to work across multiple lines.
(Note: I tested the above sample in Python 3.2; it runs, but (per my issue) does not print any values.)
Thanks!
Comments (3)
I think Literal('\n') is your problem. You don't want to build a Literal with whitespace characters: Literals by default skip over whitespace (including newlines) before trying to match, so the newline is consumed before the Literal ever gets to see it. Try using LineEnd() instead.
EDIT 1:
Just because you get an infinite loop with LineEnd doesn't mean that Literal('\n') is any better. Try adding .setDebug() on the end of your _eol definition, and you'll see that it never matches anything. Instead of trying to define the body of your comment as "one or more lines that are not a closing line, but get everything up to the end-of-line", what if you just do:
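A minimal sketch of that suggestion, assuming the comment body is simply everything between the opening and closing tags: SkipTo the closing tag instead of matching the body line by line, then strip the /// prefixes in ordinary Python. The names xml_comment and summary_lines are illustrative, not from the original answer.

```python
import pyparsing as pp

open_tag = pp.Suppress("///<summary>")
close_tag = pp.Suppress("///</summary>")

# Grab everything up to the closing tag in one piece, rather than line by line.
xml_comment = open_tag + pp.SkipTo(close_tag)("body") + close_tag


def summary_lines(text):
    """Return, for each ///<summary> block, its text lines without the /// prefix."""
    results = []
    for tokens, _start, _end in xml_comment.scanString(text):
        lines = []
        for line in tokens["body"].splitlines():
            line = line.strip()
            # Drop the leading /// of each doc-comment line, if present.
            if line.startswith("///"):
                line = line[3:].strip()
            if line:
                lines.append(line)
        results.append(lines)
    return results


test_str = """
///<summary>
/// foobar
///</summary>
"""

print(summary_lines(test_str))
```

Because the body is taken in one piece, this version also handles the single-line form ///<summary> foobar ///</summary>.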
(The reason you were getting an infinite loop with LineEnd() was that you were essentially doing OneOrMore(SkipTo(LineEnd())), but never consuming the LineEnd(), so the OneOrMore just kept matching and matching and matching, parsing and returning an empty string since the parsing position was at the end of line.)
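If you would rather keep the original line-by-line structure, the fix implied by that explanation is to make every line rule consume its LineEnd() after the SkipTo, so that each OneOrMore iteration advances. A sketch under that assumption (rest_of_line and body_line are illustrative names); LineEnd() leaves the newline out of the whitespace it skips, which is why it can match where Literal("\n") cannot.

```python
import pyparsing as pp

EOL = pp.LineEnd().suppress()

# Text up to the end of the line *and* the line ending itself, so that
# OneOrMore always moves forward instead of re-matching an empty string.
rest_of_line = pp.SkipTo(EOL) + EOL

open_line = pp.Suppress("///<summary>") + pp.Suppress(rest_of_line)
close_line = pp.Suppress("///</summary>") + pp.Suppress(rest_of_line)
# A body line is any whole line that is not the closing tag line.
body_line = ~close_line + rest_of_line

xml_comment = open_line + pp.OneOrMore(body_line) + close_line

test_str = """
///<summary>
/// foobar
///</summary>
"""

entries = [entry for tokens, _start, _stop in xml_comment.scanString(test_str)
           for entry in tokens]
for entry in entries:
    print(entry)
```

Run against the test string, this prints /// foobar. It still requires the opening and closing tags to sit on their own lines.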
How about using nestedExpr?
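A sketch of the nestedExpr route, assuming the tags are always written exactly as ///<summary> and ///</summary> so they can serve as the opener and closer strings. nestedExpr splits the body into whitespace-separated words, so line breaks are not preserved; summary_text is an illustrative helper name.

```python
import pyparsing as pp

# Use the tag lines themselves as the nesting delimiters (assumes exact spelling).
summary = pp.nestedExpr("///<summary>", "///</summary>")


def summary_text(text):
    """Return the text of each ///<summary> block as one space-joined string."""
    results = []
    for tokens, _start, _end in summary.scanString(text):
        # tokens[0] holds the words between the delimiters; strip "///" prefixes
        # that show up as (part of) those words, then drop empty leftovers.
        words = [w[3:] if w.startswith("///") else w for w in tokens[0]]
        results.append(" ".join(w for w in words if w))
    return results


test_str = """
///<summary>
/// foobar
///</summary>
"""

print(summary_text(test_str))
```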
You could use an xml parser to parse xml. It should be easy to extract relevant comment lines:
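A minimal sketch with the standard library's xml.etree.ElementTree, assuming every doc-comment line starts with /// and the comments contain well-formed XML (a stray & or < inside a comment would make ET.fromstring raise ParseError). The helper name summaries and the dummy <doc> wrapper element are illustrative choices.

```python
import xml.etree.ElementTree as ET


def summaries(source):
    """Return the text of every <summary> element found in /// doc comments."""
    # Keep only the /// lines and strip the slashes, leaving the XML fragments.
    xml_lines = [
        line.strip()[3:]
        for line in source.splitlines()
        if line.strip().startswith("///")
    ]
    # A file holds many doc comments, so wrap them all in one dummy root element.
    root = ET.fromstring("<doc>" + "\n".join(xml_lines) + "</doc>")
    # itertext() also collects text around child tags such as <see cref="..."/>;
    # split/join collapses the line breaks and indentation into single spaces.
    return [" ".join("".join(elem.itertext()).split()) for elem in root.iter("summary")]


test_str = """
///<summary>
/// foobar
///</summary>
"""

print(summaries(test_str))
```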