Find all occurrences of a regex pattern in a string
I have an overly complicated regex that, as far as I know, is correct:
route = r"""[\s+|\(][iI](\.)?[vV](\.)?(\W|\s|$)?
|\s intravenously|\s intravenous
|[\s|\(][pP](\.)?[oO](\.)?(\W|\s|$)
|\s perorally|\s?(per)?oral(ly)?|\s intraduodenally
|[\s|\(]i(\.)?p(\.)?(\W|\s|$)?
|\s intraperitoneal(ly)?
|[\s|\(]i(\.)?c(\.)?v(\.)?(\W|\s|$)?
|\s intracerebroventricular(ly)?
|[\s|\(][iI](\.)?[gG](\.)?(\W|\s|$)?
|\s intragastric(ly)?
|[\s|\(]s(\.)?c(\.)?(\W|\s|$)?
|subcutaneous(ly)?(\s+injection)?
|[\s|\(][iI](\.)?[mM](\.)?(\W|\s|$)?
|\sintramuscular
"""
With `re.search` I manage to get one of the numerous patterns when it occurs in a string:
s = 'Pharmacokinetics parameters evaluated after single IV or IM'
m = re.search(re.compile(route, re.X), s)
m.group(0)
' IV '
I read elsewhere that `re.findall` should be used to find all the occurrences. In my dreams, this would return:
['IV', 'IM']
Unfortunately, the result is instead:
[('',
'',
' ',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
''),
('',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'')]
Comments (2)
Advice for the excerpt you show:
- Use `#` to add inline comments to the pattern.
- Stop delimiting substrings with things like `\s` or the ugly `[\s|\(]` (you don't need to escape a parenthesis in a character class, and `|` doesn't mean OR inside it) and `(\W|\s|$)?` (which is totally useless, since you make it optional). Forget that and use word boundaries `\b` (read about them to understand exactly in which cases they match).
- Use `re.findall` instead of `re.search`, since you expect several matches in a single string.
- Use non-capturing groups `(?: ... )` instead of capturing groups `( ... )`. (When a pattern contains capture groups, `re.findall` returns only the content of the capture groups and not the whole match.)
- For example, `(?:per)? oral (?:ly)? \b | p \.? o \b \.?` could be rewritten in this way: `oral (?:ly)? \b | p (?: eroral (?:ly)? \b | \.? o \.?)`, so that each branch begins with a distinct literal character.
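
As a rough illustration of the advice above, here is a minimal sketch. It first shows the capture-group effect on `re.findall` with a tiny pattern, then applies word boundaries, non-capturing groups, and verbose mode to a few of the routes. The `ROUTE` pattern, the variable names, and the choice of `re.I` are assumptions made for this example. It covers only part of the original alternation and is not the answerer's own pattern.

```python
import re

s = 'Pharmacokinetics parameters evaluated after single IV or IM'

# With a capturing group, findall returns the group's text, not the match.
capturing = re.findall(r"\bI(\.)?[VM]\b", s)
# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"\bI(?:\.)?[VM]\b", s)
print(capturing)      # ['', '']
print(non_capturing)  # ['IV', 'IM']

# Hypothetical verbose pattern for a subset of the routes (an assumption,
# not the pattern from the answer).
ROUTE = re.compile(
    r"""
    \b                                  # word boundary replaces the space or paren class
    (?:
        i \.? (?: v | p | g | m | c \.? v ) \b \.?   # i.v. i.p. i.g. i.m. i.c.v.
      | s \.? c \b \.?                               # s.c.
      | p \.? o \b \.?                               # p.o.
      | intravenous (?:ly)? \b
      | intramuscular (?:ly)? \b
      | (?:per)? oral (?:ly)? \b
    )
    """,
    re.X | re.I,
)

print(ROUTE.findall(s))  # ['IV', 'IM']
```

Because the pattern contains no capturing groups, `findall` returns the matched text directly, and the `\b` anchors keep the surrounding spaces out of each match.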
Use `(?:\.)?` to match zero or one period inside the abbreviation (the period must be escaped, since a bare `.` matches any character). Notice that this finds both ip and i.p. The pattern matching does not exclude patterns within patterns, for example the ip at the end of POip.
print("find combinations of words with or without periods")
output:
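
As a minimal sketch of the optional-period idea described above (the sample text and both patterns are illustrative assumptions, not the answerer's original listing), the following shows the embedded match inside `POip` and one possible way to avoid it with word boundaries:

```python
import re

# Hypothetical sample text containing both spellings and an embedded "ip".
text = "dosed ip, i.p. and POip"

# \.? matches a literal period zero or one time.
loose = re.compile(r"i\.?p\.?")
print(loose.findall(text))   # ['ip', 'i.p.', 'ip']  (last hit is inside "POip")

# Assumption: adding \b word boundaries is one way to skip embedded hits.
strict = re.compile(r"\bi\.?p\b\.?")
print(strict.findall(text))  # ['ip', 'i.p.']
```

The `\b` before `i` is what rejects `POip`, because there is no word boundary between `O` and `i`.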