Extract content between a regex in Python?
Is there a simple way to pull content out from between the parts of a regex? Assume I have the following sample text:
SOME TEXT [SOME MORE TEXT] value="ssss" SOME MORE TEXT
My regex is:
compiledRegex = re.compile(r'\[.*\] value=("|\').*("|\')')
This obviously matches the entire [SOME MORE TEXT] value="ssss", but I only want ssss returned, since that is the part I'm looking for.
I could obviously write a parser function, but I feel Python must offer a simpler, more Pythonic way to do this.
This is what capturing groups are designed to do.
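A minimal sketch of this approach, assuming the question's sample text and a re.search call (the variable name matches is borrowed from the answer's own mention of matches.group(2)): the value is wrapped in a capturing group, and the quote alternatives are wrapped in non-capturing (?:...) groups, so the value ends up as group 1.

```python
import re

text = 'SOME TEXT [SOME MORE TEXT] value="ssss" SOME MORE TEXT'

# (.*) captures the value; (?:"|\') matches a quote without capturing it,
# so the value is the first (and only) capturing group.
pattern = re.compile(r'\[.*\] value=(?:"|\')(.*)(?:"|\')')

matches = pattern.search(text)
if matches:
    print(matches.group(0))  # [SOME MORE TEXT] value="ssss"  (whole match)
    print(matches.group(1))  # ssss  (only the captured value)
```

group(0) is always the whole match; numbered groups hold only what their parentheses captured.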
The ?: inside the old groups (the parentheses) means that the group is now a non-capturing group; that is, it won't be accessible as a group in the result. I converted them to keep the output simpler, but you can leave them as capturing groups if you prefer (you then have to use matches.group(2) instead, since the first quote would be the first captured group).
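For comparison, a sketch of the variant that keeps the quote groups capturing, under the same assumptions: the opening quote takes group 1, so the value moves to group 2.

```python
import re

text = 'SOME TEXT [SOME MORE TEXT] value="ssss" SOME MORE TEXT'

# The question's pattern plus a capturing group around the value.
# The quote groups still capture, so they take group numbers 1 and 3.
pattern = re.compile(r'\[.*\] value=("|\')(.*)("|\')')

matches = pattern.search(text)
if matches:
    print(matches.group(1))  # "
    print(matches.group(2))  # ssss
    print(matches.groups())  # ('"', 'ssss', '"')
```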
Your original regex is too greedy: r'.*\]' won't stop at the first ']', and the second '.*' won't stop at '"'. To stop at a character c, you could use [^c]* or '.*?'.
Example:
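A minimal sketch of the difference, using a hypothetical input that contains a second [...] value="..." pair after the one we want (only double quotes are handled, to keep the patterns short):

```python
import re

# Hypothetical input: a second bracket/value pair follows the one we want.
text = 'SOME TEXT [A] value="ssss" [B] value="tttt"'

# Greedy: .* grabs as much as it can, so \[.*\] ends at the ']' of [B]
# and (.*)" ends at the final '"', yielding the wrong value.
greedy = re.search(r'\[.*\] value="(.*)"', text)

# Non-greedy: .*? expands only as far as needed, stopping at the first
# ']' and the first '"' that let the rest of the pattern match.
lazy = re.search(r'\[.*?\] value="(.*?)"', text)

# Negated character classes: [^\]]* and [^"]* can never run past ']' or '"'.
negated = re.search(r'\[[^\]]*\] value="([^"]*)"', text)

print(greedy.group(1))   # tttt
print(lazy.group(1))     # ssss
print(negated.group(1))  # ssss
```

The negated-class form states the stopping character explicitly, while .*? is shorter and works when the delimiter is more than one character.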