Optimizing string overwriting in Python

Posted on 2024-10-02 02:21:45, word count 535, views 10, comments 0

I have this initial string.

'bananaappleorangestrawberryapplepear'

I also have a tuple of strings:

('apple', 'plepe', 'leoran', 'lemon')

I want a function that takes the initial string and the tuple of strings and returns this:

'bananaxxxxxxxxxgestrawberryxxxxxxxar'

I know how to do it imperatively: for every word, find its occurrences in the initial string, then loop character by character over the whole initial string and replace the characters that belong to a matched word.

But that is neither efficient nor pretty. I suspect there should be some way of doing this more elegantly, in a functional way, with itertools or something similar. If you know a Python library that can do this efficiently, please let me know.

UPDATE: Justin Peel pointed out a case I didn't describe in my initial question. If a word is 'aaa' and 'aaaaaa' is in the initial string, the output should look like 'xxxxxx'.

空气里的味道 2024-10-09 02:21:45

import re

words = ('apple', 'plepe', 'leoran', 'lemon')
s = 'bananaappleorangestrawberryapplepear'

# Indices of every character covered by some match.
x = set()

for w in words:
    # re.escape keeps words containing regex metacharacters literal.
    for m in re.finditer(re.escape(w), s):
        x.update(range(m.start(), m.end()))

result = ''.join('x' if i in x else c for i, c in enumerate(s))
print(result)

produces:

bananaxxxxxxxxxgestrawberryxxxxxxxar
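
A caveat about re.finditer, offered as a sketch and assuming the update in the question means that every occurrence should be masked, including occurrences that overlap themselves: finditer returns non-overlapping matches, so 'aaa' in 'aaaa' is found only once and the last character is left unmasked. Wrapping the word in a zero-width lookahead is one way around that. The name mask_overlapping and the fill parameter are made up for this illustration.

```python
import re

def mask_overlapping(s, words, fill='x'):
    """Mask every occurrence of every word, including self-overlapping ones."""
    covered = set()
    for w in words:
        # (?=...) consumes no characters, so the scan advances one position
        # at a time and reports every start index where w occurs.
        for m in re.finditer('(?=' + re.escape(w) + ')', s):
            covered.update(range(m.start(), m.start() + len(w)))
    return ''.join(fill if i in covered else c for i, c in enumerate(s))

print(mask_overlapping('aaaa', ('aaa',)))
print(mask_overlapping('bananaappleorangestrawberryapplepear',
                       ('apple', 'plepe', 'leoran', 'lemon')))
```

The lookahead match itself is empty, so the length of the masked span has to come from len(w) rather than from m.end().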

最丧也最甜 2024-10-09 02:21:45

Here's another answer. There might be a faster way to replace the letters with x's, but I don't think it is necessary because this is already pretty fast.

import re

def do_xs(s, pats):
    # Longest patterns first: alternation takes the first alternative that
    # matches, so a short prefix would otherwise shadow a longer pattern.
    ordered = sorted(pats, key=len, reverse=True)
    pat = re.compile('|'.join(re.escape(p) for p in ordered))

    sout = list(s)
    match = pat.search(s)
    while match:
        start, end = match.span()
        sout[start:end] = ['x'] * (end - start)
        # Resume one character past the match start so overlapping matches are found.
        match = pat.search(s, start + 1)
    return ''.join(sout)

txt = 'bananaappleorangestrawberryapplepear'
pats = ('apple', 'plepe', 'leoran', 'lemon')
print(do_xs(txt, pats))

Basically, I create a regex pattern that matches any of the input patterns. Then I keep restarting the search one character after the starting position of the most recent match. If one input pattern is a prefix of another, the alternation would stop at the shorter one, which is why the patterns are sorted longest-first before being joined.
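
To make the prefix concern concrete, here is a small sketch of how Python's re module treats alternation: alternatives are tried left to right and the first one that matches wins, even when a later alternative would match more text. The helper first_match and the sample patterns are assumptions for this illustration only.

```python
import re

def first_match(patterns, text):
    """Return the text matched by an alternation of literal patterns, or None."""
    combined = re.compile('|'.join(re.escape(p) for p in patterns))
    m = combined.search(text)
    return m.group() if m else None

pats = ('app', 'apple')

# 'app' is listed first, so it wins and 'le' is never covered.
shadowed = first_match(pats, 'apple')

# Sorting longest-first lets the longer pattern match at the same position.
longest = first_match(sorted(pats, key=len, reverse=True), 'apple')

print(shadowed, longest)
```

Because the search in the answer restarts one character after each match start, the longer pattern never gets a second chance at that position, so the ordering of the alternatives decides how much is masked.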

冰雪梦之恋 2024-10-09 02:21:45

Assuming we're restricted to working without importing anything, from the standard library or elsewhere:

s1 = 'bananaappleorangestrawberryapplepear'
t = ('apple', 'plepe', 'leoran', 'lemon')
s2 = s1

solution = 'bananaxxxxxxxxxgestrawberryxxxxxxxar'

for word in t:
    length = len(word)
    # Search the untouched s1 so earlier replacements cannot hide a match;
    # stepping by one from each hit also catches overlapping occurrences.
    index = s1.find(word)
    while index != -1:
        s2 = s2[:index] + 'x' * length + s2[index + length:]
        index = s1.find(word, index + 1)

print(s2 == solution)

你的笑 2024-10-09 02:21:45

>>> string_ = 'bananaappleorangestrawberryapplepear'
>>> words = ('apple', 'plepe', 'leoran', 'lemon')
>>> xes = [(string_.find(w), len(w)) for w in words]
>>> xes
[(6, 5), (29, 5), (9, 6), (-1, 5)]
>>> for index, len_ in xes:
...   if index == -1: continue
...   string_ = string_.replace(string_[index:index+len_], 'x'*len_)
...
>>> string_
'bananaxxxxxxxxxgestrawberryxxxxxxxar'
>>>

There are surely more efficient ways, but premature optimisation is the root of all evil.

冰雪之触 2024-10-09 02:21:45

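a = ('apple', 'plepe', 'leoran', 'lemon')
b = 'bananaappleorangestrawberryapplepear'

for fruit in a:
    if fruit in b:
        # Same number of x's as characters replaced, so the length is preserved.
        b = b.replace(fruit, 'x' * len(fruit))

The only thing left to do is determine how many x's to replace with; 'x' * len(fruit) keeps the string the same length.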
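
One thing to watch for with repeated replace calls, sketched below on the assumption that overlapping words should all be masked as in the question: once 'apple' has been replaced, 'plepe' and 'leoran' no longer occur in the modified string, so they are skipped. Searching the original string while writing into a separate copy avoids this. The names sequential_replace and mask_from_original are made up for this illustration.

```python
def sequential_replace(s, words):
    """Replace each word in turn; later words are searched in the modified string."""
    for w in words:
        s = s.replace(w, 'x' * len(w))
    return s

def mask_from_original(s, words):
    """Search the original string, but write the x's into a separate copy."""
    out = list(s)
    for w in words:
        i = s.find(w)
        while i != -1:
            out[i:i + len(w)] = 'x' * len(w)
            i = s.find(w, i + 1)  # step by one to catch overlapping hits
    return ''.join(out)

text = 'bananaappleorangestrawberryapplepear'
words = ('apple', 'plepe', 'leoran', 'lemon')

print(sequential_replace(text, words))  # bananaxxxxxorangestrawberryxxxxxpear
print(mask_from_original(text, words))  # bananaxxxxxxxxxgestrawberryxxxxxxxar
```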

—━☆沉默づ 2024-10-09 02:21:45

def mask_words(s, words):
    # mask[i] is True when character i belongs to at least one match.
    mask = [False] * len(s)
    for word in words:
        length = len(word)
        pos = 0
        while True:
            idx = s.find(word, pos)
            if idx == -1:
                break

            for i in range(idx, idx + length):
                mask[i] = True
            # Advance by one, not by length, so overlapping occurrences
            # of the same word are also masked.
            pos = idx + 1

    # Sanity check:
    assert len(mask) == len(s)

    result = []
    for masked, c in zip(mask, s):
        result.append('x' if masked else c)

    return "".join(result)
