How do I make sure I always get a list of matches from a Python regular expression?
I'm trying to pull some information (no recursion necessary) from a jsp page (malformed xml) similar to this:
<td>
<html:button ...></html:button>
<html:submit ...></html:submit></td>
And a regex:
<html:(button|submit|cancel)[\s\S]*?</html:(button|submit|cancel)>
re.findall() is giving me a list of tuples, like so:
[('button','button'),('button','button')]
Which I understand from the documentation is correct, but I'm looking to get something more like:
["<html:button ...>","<html:button ...>"]
What is the appropriate way to get the outcome I expect?
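To make the behaviour concrete, here is a minimal sketch that reproduces the result described above. The attribute values in `sample` are placeholders, since the original markup elides them with `...`.

```python
import re

# Placeholder markup: the "property" attributes are invented for illustration.
sample = """<td>
<html:button property="a"></html:button>
<html:submit property="b"></html:submit></td>"""

pattern = r"<html:(button|submit|cancel)[\s\S]*?</html:(button|submit|cancel)>"

# The pattern has two capturing groups, so findall returns one tuple
# of group contents per match instead of the full matched text.
groups = re.findall(pattern, sample)
print(groups)
```

With this sample, each tuple holds the tag name captured by the opening and the closing alternation.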
Comments (2)
Aside from the fact that a regex probably isn't the right tool for this, you need to use parentheses to put the part you want into a group. If you want everything up to the closing </html:whatever> tag, wrap the whole pattern in parentheses so that it is the only capturing group.
If you just want the opening <html:button> tag, capture only that part of the pattern.
If you want only the content between the tags, such as foobar, capture that part instead.
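The three cases above might look like the following sketch. The answer's own patterns are not available here, so these are a reconstruction under assumptions: `sample`, its `property` attribute, and the helper name `tag` are all hypothetical.

```python
import re

# Hypothetical input; "foobar" and the attribute are placeholders.
sample = '<td><html:button property="x">foobar</html:button></td>'

# Non-capturing alternation, so it adds no extra group to the results.
tag = r"(?:button|submit|cancel)"

# 1. Everything up to and including the closing tag.
whole = re.findall(rf"(<html:{tag}[\s\S]*?</html:{tag}>)", sample)

# 2. Only the opening tag (assumes no '>' inside attribute values).
opening = re.findall(rf"(<html:{tag}[^>]*>)", sample)

# 3. Only the text between the opening and closing tags.
inner = re.findall(rf"<html:{tag}[^>]*>([\s\S]*?)</html:{tag}>", sample)

print(whole)
print(opening)
print(inner)
```

Because each pattern has exactly one capturing group, `re.findall` returns a flat list of strings in every case.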
Note that it is not, in general, possible to match opening tags with their closing tags using a regex (in the example above, <html:button> is opened and </html:submit> closes it). If you need to do that, use a proper parser.
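As one possible illustration of the parser route, the sketch below uses the standard library's `html.parser.HTMLParser`. This is an assumption rather than the answer's recommendation: it is a tolerant HTML tokenizer, not a JSP parser, and the class name `TagCollector` and the sample attributes are hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the raw text of the opening tags we care about."""

    # HTMLParser lowercases tag names before calling the handlers.
    WANTED = {"html:button", "html:submit", "html:cancel"}

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            # get_starttag_text() returns the source text of the
            # most recently opened start tag.
            self.found.append(self.get_starttag_text())


# Placeholder markup: the "property" attributes are invented.
sample = """<td>
<html:button property="a"></html:button>
<html:submit property="b"></html:submit></td>"""

collector = TagCollector()
collector.feed(sample)
collector.close()
print(collector.found)
```

The parser tokenizes the tags itself, so attribute values containing `>` do not confuse it the way they can confuse a `[^>]*` pattern.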
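Your (button|submit|cancel) groups are being captured, which is why findall returns tuples. Add ?: right after each opening parenthesis, so that the group begins with (?: and becomes non-capturing.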
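A minimal sketch of that fix applied to the pattern from the question follows. The sample attributes are placeholders, and the `re.finditer` variant at the end is an addition, not part of this answer: it returns the full match text even when the capturing groups are left in.

```python
import re

# Placeholder markup: the "property" attributes are invented.
sample = """<td>
<html:button property="a"></html:button>
<html:submit property="b"></html:submit></td>"""

# (?:...) groups without capturing, so findall returns whole matches.
pattern = r"<html:(?:button|submit|cancel)[\s\S]*?</html:(?:button|submit|cancel)>"
matches = re.findall(pattern, sample)
print(matches)

# Alternative: keep the original capturing groups and ask each match
# object for group(0), which is always the entire matched text.
original = r"<html:(button|submit|cancel)[\s\S]*?</html:(button|submit|cancel)>"
full = [m.group(0) for m in re.finditer(original, sample)]
print(full)
```

Both approaches yield a flat list of strings. The `finditer` form is convenient when the groups are still needed elsewhere.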