Case-sensitive string matching and replacement in Python
I am quite new to Python and trying out some new things. I have two lists in a dictionary. Let's say:
List1:                       List2:
Anterior                     cord
cuneate nucleus              Medulla oblongata
nucleus                      Spinal cord
Intermediolateral nucleus    Spinal
sksdsj
british                      7
And I have some text lines like these:
<s id="5239778-2">The name refers collectively to the cuneate nucleus and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>
<s id="3691284-1">In the medulla oblongata, the arcuate nucleus is a group of neurons located on the anterior surface of the medullary pyramids.</s>
<s id="21120-99">Anterior horn cells, motoneurons located in the spinal.</s>
<s id="1053949-16">The Anterior cord syndrome results from injury to the anterior part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>
<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s>
I need to return the lines that contain strings from both list1 and list2. So I tried the following code:
result = ""
if list1 in line and list2 in line:
    i1 = re.sub('(?i)(\s+)(%s)(\s+)'%list1, '\\1<e1>\\2</e1>\\3', line)
    i2 = re.sub('(?i)(\s+)(%s)(\s+)'%list2, '\\1<e2>\\2</e2>\\3', i1)
    result = result + i2 + "\n"
    continue
But I am getting the following result:
<s id="5239778-2">The name refers collectively to the <e1>cuneate nucleus</e1> and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>
<s id="3691284-1">In the medulla oblongata, the arcuate <e1>nucleus</e1> is a group of neurons located on the anterior surface of the medullary pyramids.</s>
<s id="21120-99">Anterior horn cells, motoneurons located in the spinal.</s>
<s id="1053949-16">The <e1>Anterior</e1> <e2>cord</e2> syndrome results from injury to the <e1>anterior</e1> part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>
<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s>
Here, only result line 4 matches strings from both lists, which is what I want. But I don't want the lines that match only one string or none (e.g. result lines 1 and 3). Also, if a line matches strings from both lists, it should tag both of them (e.g. result line 2).
Any kind of help will be greatly appreciated.
Comments (2)
Basically, you want to put some words in <e1> tags and other words in <e2> tags. Is that right? If so, then something like this will do:
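The snippet that went with this answer is not shown here. As a rough sketch of the idea, and not necessarily the answerer's code, the two lists can be compiled into one case-insensitive regex with a named group per list, so that each match is wrapped in the tag named after its group. The helper names `build_pattern`, `tag_text` and `tag_line` are made up for illustration, and both lists are assumed to be non-empty lists of plain strings.

```python
import re


def build_pattern(list1, list2):
    """Compile one case-insensitive regex with a named group per list.

    Assumes both lists are non-empty and contain plain (non-regex) strings.
    """
    def alternation(terms):
        # Longest terms first, so "cuneate nucleus" wins over "nucleus".
        ordered = sorted((t for t in terms if t), key=len, reverse=True)
        return "|".join(re.escape(t) for t in ordered)

    return re.compile(
        r"\b(?:(?P<e1>" + alternation(list1) + r")|(?P<e2>" + alternation(list2) + r"))\b",
        re.IGNORECASE,
    )


def tag_text(text, pattern):
    """Wrap every match in <e1> or <e2>, keeping the original casing."""
    def wrap(match):
        tag = match.lastgroup  # "e1" or "e2", whichever group matched
        return "<%s>%s</%s>" % (tag, match.group(0), tag)

    return pattern.sub(wrap, text)


def tag_line(line, pattern):
    """Tag only the sentence body, leaving the <s id="..."> wrapper untouched."""
    m = re.match(r"(<s\b[^>]*>)(.*)(</s>)\s*$", line, re.DOTALL)
    if not m:
        return tag_text(line, pattern)
    return m.group(1) + tag_text(m.group(2), pattern) + m.group(3)


# Shortened sample data based on the question.
pattern = build_pattern(["Anterior", "cuneate nucleus", "nucleus"],
                        ["cord", "Spinal cord", "7"])
print(tag_line('<s id="69-7">The Anterior cord syndrome affects the spinal cord.</s>', pattern))
# <s id="69-7">The <e1>Anterior</e1> <e2>cord</e2> syndrome affects the <e2>spinal cord</e2>.</s>
```

Matching in a single pass means a term from list2 can never land inside an `<e1>` tag that was just inserted, and restricting the substitution to the sentence body keeps a short term such as `7` from matching inside the `id` attribute. The `\b` anchors replace the `(\s+)` groups from the question, so a term followed by a comma or full stop is still found.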
Output:
How about this:
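The code for this answer is not shown here. One possible reading of the question, sketched below under stated assumptions, is to keep a line only when its sentence body contains at least one term from list1 and at least one term from list2, compared case-insensitively on whole words. The function names are hypothetical, and the lists are treated as two independent sets of non-empty strings; if they are really meant to be paired row by row, the check would have to run per pair instead.

```python
import re


def compile_terms(terms):
    """Case-insensitive whole-word regex matching any term in `terms`.

    Assumes `terms` is a non-empty list of plain (non-regex) strings.
    """
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile(
        r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b",
        re.IGNORECASE,
    )


def sentence_body(line):
    """Return the text between <s ...> and </s>, or the whole line if unwrapped."""
    m = re.match(r"<s\b[^>]*>(.*)</s>\s*$", line, re.DOTALL)
    return m.group(1) if m else line


def lines_with_both(lines, list1, list2):
    """Yield only the lines whose body has a term from list1 AND one from list2."""
    p1, p2 = compile_terms(list1), compile_terms(list2)
    for line in lines:
        body = sentence_body(line)
        if p1.search(body) and p2.search(body):
            yield line


# Shortened sample data based on the question.
lines = [
    '<s id="21120-7">Anterior horn cells are motoneurons.</s>',
    '<s id="1053949-16">The Anterior cord syndrome is rare.</s>',
    '<s id="3">Nothing relevant here.</s>',
]
for kept in lines_with_both(lines, ["Anterior", "british"], ["cord", "7"]):
    print(kept)
# <s id="1053949-16">The Anterior cord syndrome is rare.</s>
```

The first sample line is dropped even though its `id` ends in `7`, because only the sentence body is searched. Filtering before tagging also avoids the situation in the question where a line with a single match still ends up in the result.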