Match and replace using regular expressions

Posted on 2025-01-29 19:39:40

There is a list of strings A, each of which somehow matches an entry in another list of strings B. I want to replace each string in A with the matching string from B using a regular expression. However, I am not getting the correct result.

The solution should be A == ["Yogesh","Numita","Hero","Yogesh"].

import re

A = ["yogeshgovindan","TNumita","Herohonda","Yogeshkumar"]
B=["Yogesh","Numita","Hero"]

for i in A:
    for j in B:
        replaced=re.sub('i','j',i)
        
print(replaced)

Comments (2)

守不住的情 2025-02-05 19:39:40

This one works for me:

lst=[]
for a in A:
    lst.append([b for b in B if b.lower() in a.lower()][0])

This returns the element of list B that is found in the current item of list A. It's necessary to compare lowercased words. The [0] is added to get a string instead of a list from the list comprehension.
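One caveat with the [0] index is that it raises IndexError when no element of B occurs in an item of A. A minimal sketch of a variant that returns None instead is shown here; the helper name first_match is hypothetical, and the extra non-matching name "hrithikroshan" is only there to exercise that case.

```python
A = ["yogeshgovindan", "TNumita", "Herohonda", "Yogeshkumar", "hrithikroshan"]
B = ["Yogesh", "Numita", "Hero"]

def first_match(name, terms):
    """Return the first term contained in name (case-insensitive), or None."""
    lowered = name.lower()
    # next() with a default avoids the IndexError that [0] raises on an empty list
    return next((term for term in terms if term.lower() in lowered), None)

lst = [first_match(a, B) for a in A]
print(lst)  # ['Yogesh', 'Numita', 'Hero', 'Yogesh', None]
```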

仙女山的月亮 2025-02-05 19:39:40

If looping over B, you don't need a regular expression; you can simply use membership testing.

A regex might result in better performance, as membership testing will scan each string in A for every string in B, resulting in O(len(A) * len(B)) performance.

As long as the individual terms don't contain any metacharacters and can appear in any context, the simplest way to form the regex is to join the entries of B with the alternation operation:

reTerms = re.compile('|'.join(B), re.I)

However, to be safe, the entries should first be escaped, in case any contains a metacharacter:

# map-based
reTerms = re.compile('|'.join(map(re.escape, B)), re.I)
# comprehension-based
reTerms = re.compile('|'.join([re.escape(b) for b in B]), re.I)
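To illustrate why escaping matters, here is a small sketch using a hypothetical term list in which one entry contains a dot. Without re.escape the dot matches any character, so the pattern can match text that does not contain the term at all.

```python
import re

# Hypothetical terms; "node.js" contains the metacharacter "."
terms = ["node.js", "Hero"]

unescaped = re.compile("|".join(terms), re.I)
escaped = re.compile("|".join(map(re.escape, terms)), re.I)

text = "nodexjs"
print(bool(unescaped.search(text)))  # True: "." matched the "x"
print(bool(escaped.search(text)))    # False: a literal "." is required

print(escaped.search("Node.JS rocks").group(0))  # Node.JS
```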

If there are any restrictions on the context the terms appear in, sub-patterns for the restrictions would need to be prepended and appended to the pattern. For example, if the terms must appear as full words:

# raw f-string, so that \b is a regex word boundary rather than a backspace character
reTerms = re.compile(rf"\b(?:{'|'.join(map(re.escape, B))})\b", re.I)
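A short sketch of how the whole-word restriction behaves may help here. It builds the alternation separately and uses a raw f-string for the boundaries, since "\b" in an ordinary string literal is a backspace character. The variable names alternation and whole_word are made up for this illustration.

```python
import re

B = ["Yogesh", "Numita", "Hero"]
alternation = "|".join(map(re.escape, B))

# rf"..." keeps \b as a regex word boundary
whole_word = re.compile(rf"\b(?:{alternation})\b", re.I)

# The term is embedded in a longer word, so there is no match
print(whole_word.search("yogeshgovindan"))            # None
# The term stands alone as a word, so it matches
print(whole_word.search("yogesh govindan").group(0))  # yogesh

# In a non-raw string literal, "\b" is the backspace character
print("\b" == "\x08")  # True
```

Note that with this restriction none of the original items of A would match, since every term there is part of a longer word.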

This regex can be applied to each item of A to get the matching text:

replaced = [reTerms.search(name).group(0) for name in A]
# result: ['yogesh', 'Numita', 'Hero', 'Yogesh']

Since the terms in the regex are straight string matches, the content will be correct, but the case may not. This could be corrected by a normalization step, passing the matched text through a dict:

normed = {term.lower():term for term in B}

replaced = [normed[reTerms.search(name).group(0).lower()] for name in A]
# result: ['Yogesh', 'Numita', 'Hero', 'Yogesh']

One issue remains: what if an item of A doesn't match? Then reTerms.search returns None, which doesn't have a group attribute. If None-propagating attribute access is ever added to Python (as proposed in PEP 505), this could be handled easily. The following is not valid syntax in current Python:

names = ["yogeshgovindan","TNumita","Herohonda","Yogeshkumar", "hrithikroshan"]
normed[None] = None
replaced = [normed[reTerms.search(name)?.group(0).lower()] for name in names]

In the absence of such a feature, there are various approaches, such as using a ternary expression and walrus assignment. In the sample below, a list is used as a stand-in to provide a default value for the match:

import re

names = ["yogeshgovindan","TNumita","Herohonda","Yogeshkumar", "hrithikroshan"]
terms = ["Yogesh","Numita","Hero"]
normed = {term.lower():term for term in terms}
normed[''] = None

reTerms = re.compile('|'.join(map(re.escape, terms)), re.I)

# index may need to be changed if `reTerms` includes any context
replaced = [normed[(reTerms.search(name) or [''])[0].lower()] for name in names]
# result: ['Yogesh', 'Numita', 'Hero', 'Yogesh', None]
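The ternary expression with walrus assignment mentioned above could look roughly like the following. This is a sketch that assumes Python 3.8 or later and reuses the same sample data; it avoids the extra '' entry in normed by returning None directly.

```python
import re

names = ["yogeshgovindan", "TNumita", "Herohonda", "Yogeshkumar", "hrithikroshan"]
terms = ["Yogesh", "Numita", "Hero"]
normed = {term.lower(): term for term in terms}

reTerms = re.compile("|".join(map(re.escape, terms)), re.I)

# The condition is evaluated first, so m is bound before m.group(0) is used
replaced = [
    normed[m.group(0).lower()] if (m := reTerms.search(name)) else None
    for name in names
]
print(replaced)  # ['Yogesh', 'Numita', 'Hero', 'Yogesh', None]
```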