Multiple grouping in a regular expression

Posted 2025-01-10 14:19:15 · 523 characters · 3 views · 0 comments

I have a string

s="<response>blabla  
   <head> blabla 
      <t> EXTRACT 1</t>  
      <t>EXTRACT 2</t>  
   </head>

   <body> blabla   
      <t>BODY 1</t>
      <t>BODY 2</t>
 </response>"

I need to extract the text between the <t> tags, but only if it is in the head part.
I tried

regex="(?:<t>([\w.,_]*)*)</t>"

re.findall(regex,s)

but it fetches the body part too. I understand that I need to tell it to stop at the closing head tag, but I couldn't come up with a way to do that.

PS: The string is on a single line; I split it here for readability. I want to do this using regex rather than an XML parser.


云仙小弟 2025-01-17 14:19:15

You can find the head section first:

import re

s = "<response>blabla  <head> blabla <t> EXTRACT 1</t>  <t>EXTRACT 2</t>  </head> <body> blabla  <t>BODY 1</t> <t>BODY 2</t> </response>"
# Capture everything between <head> and </head>
pattern_head = r"<head>(.*)</head>"
header = re.findall(pattern_head, s)
print(header)

This gives: [' blabla <t> EXTRACT 1</t>  <t>EXTRACT 2</t>  ']
Then extract what you want from the head:

# Non-greedy match, so each <t>...</t> pair is captured separately
pattern = r"<t>(.*?)</t>"
substring = re.findall(pattern, header[0])
print(substring)

>>> [' EXTRACT 1', 'EXTRACT 2']
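
The two steps can also be wrapped into one helper. The following is a minimal sketch, not part of the original answer: the function name `extract_head_texts` is hypothetical, the head pattern is made non-greedy so it stops at the first `</head>`, and `re.DOTALL` is added on the assumption that the input might contain newlines.

```python
import re

def extract_head_texts(s):
    """Return the text of every <t>...</t> element inside the first <head>...</head>."""
    # Non-greedy, so the match ends at the first </head>; DOTALL lets '.' span newlines.
    head = re.search(r"<head>(.*?)</head>", s, re.DOTALL)
    if head is None:
        # No head section at all: nothing to extract.
        return []
    # Search only inside the captured head content, so body <t> tags are never seen.
    return re.findall(r"<t>(.*?)</t>", head.group(1), re.DOTALL)

s = "<response>blabla  <head> blabla <t> EXTRACT 1</t>  <t>EXTRACT 2</t>  </head> <body> blabla  <t>BODY 1</t> <t>BODY 2</t> </response>"
print(extract_head_texts(s))  # [' EXTRACT 1', 'EXTRACT 2']
```

Call `.strip()` on each result if the leading space in ' EXTRACT 1' is unwanted.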


云之铃。 2025-01-17 14:19:15

I got this from @oriberu:

regex=(\w+)( ?=.*?)
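
For comparison, a single-pattern approach can use a lookahead so that a `<t>` element only matches when a closing `</head>` follows it with no new `<head>` in between. The sketch below illustrates that idea under stated assumptions and is not a reconstruction of the regex above; the name `HEAD_T` is hypothetical.

```python
import re

# Hypothetical single-pattern variant (an illustration, not the answerer's regex).
#   <t>([^<]*)</t>          capture the text of one <t> element
#   (?= ... </head>)        only if a closing </head> appears later ...
#   (?:(?!<head>).)*        ... with no opening <head> before it
HEAD_T = re.compile(r"<t>([^<]*)</t>(?=(?:(?!<head>).)*</head>)", re.DOTALL)

s = "<response>blabla  <head> blabla <t> EXTRACT 1</t>  <t>EXTRACT 2</t>  </head> <body> blabla  <t>BODY 1</t> <t>BODY 2</t> </response>"
print(HEAD_T.findall(s))  # [' EXTRACT 1', 'EXTRACT 2']
```

The body elements are skipped because no `</head>` follows them. This relies on the markup being regular; for arbitrary XML a parser is more robust.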


About the author: 活泼老夫
