Finding repetitions in a string by length

Posted 2025-01-09 00:24:33

I have a string of letters similar to the one shown below:

'ABTSOFDNSOHASAPMAPDSNFAKSGMOMAPEPTNSNTROMAPKSDFANSDHASOMAPDODDFG'

I am treating this as ciphertext, so I want to find the positions of repeated substrings in order to work out the length of the encryption key. (The example above is random, so no direct answer will come from it.)

For now, I want to write code that can find repetitions of length 3; for example, 'MAP' and 'HAS' are repeated. I want the code to find these repetitions itself, rather than me having to specify the substring it should look for.

Previously I have used:

text.find("MAP")

Using the answer below I have written:

substring = []
for i in range(len(Phrase)-4):
    substring.append(Phrase[i:i+4])
    
for index, value in freq.iteritems():
    if value > 1:
        for i in range(len(Phrase)-4):
            if index == Phrase[i:i+4]:
                print(index)

This prints each repeated substring once for every time it appears. Ideally I want just a list of each repeated substring together with the positions at which it appears.

Comments (2)

回心转意 2025-01-16 00:24:33

Here is what I did :)

import pandas as pd

# find frequency of each length 3 substring
Phrase    = "Maryhadalittlarymbada"
substring = []
# len(Phrase) - 2 start positions, so the final triple is included
for i in range(len(Phrase) - 2):
    substring.append(Phrase[i:i+3])
Frequency = pd.Series(substring).value_counts()

# find repetition's position in string
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for index, value in Frequency.items():
    if value > 1:
        positions = []
        for i in range(len(Phrase) - 2):
            if index == Phrase[i:i+3]:
                positions.append(i)
        print(index, ": ", positions)
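
For comparison, the same search can be sketched in a single pass without pandas, and for any substring length rather than only 3 (the attempt in the question used 4). This is a minimal sketch, not the answer's own method: `find_repeats` and its `n` parameter are names chosen here for illustration.

```python
from collections import defaultdict


def find_repeats(text, n=3):
    """Map every length-n substring that occurs more than once to its start positions.

    `find_repeats` and `n` are illustrative names, not part of the answer above.
    """
    positions = defaultdict(list)
    # There are len(text) - n + 1 windows, so the final window is included.
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    # Keep only substrings seen at more than one position.
    return {chunk: idx for chunk, idx in positions.items() if len(idx) > 1}


if __name__ == "__main__":
    for chunk, idx in find_repeats("Maryhadalittlarymbada").items():
        print(chunk, ":", idx)
```

Calling `find_repeats(text, 4)` on the ciphertext from the question would look for repeated four-letter substrings instead.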
╰ゝ天使的微笑 2025-01-16 00:24:33

Here is a solution using only the standard library.

import itertools, collections
text = 'ABTSOFDNSOHASAPMAPDSNFAKSGMOMAPEPTNSNTROMAPKSDFANSDHASOMAPDODDFG'

Make a function that will produce overlapping chunks of three, inspired by the pairwise function from itertools.

def three_at_a_time(text):
    '''Overlapping chunks of three.

    text : str
    returns generator
    '''
    a,b,c = itertools.tee(text,3)
    # advance the second and third iterators
    next(b)
    next(c)
    next(c)
    return (''.join(t) for t in zip(a,b,c))

Make a dictionary with the position(s) of each chunk.

triples = enumerate(three_at_a_time(text))
d = collections.defaultdict(list)
for i,triple in triples:
    d[triple].append(i)

Filter the dictionary for chunks that have more than one position.

# repeats = itertools.filterfalse(lambda item: len(item[1])==1,d.items())
repeats = [(k,v) for k,v in d.items() if len(v)>1]

Example:

>>> for chunk in repeats:
...     print(chunk) 
... 
('HAS', [10, 51])
('MAP', [15, 28, 40, 55])
('OMA', [27, 39, 54])
('APD', [16, 56])
>>>
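
Since the stated goal is the key length, one possible next step is to take the gaps between successive positions of each repeat and count how many gaps each candidate length divides (the idea behind Kasiski examination). The following is a minimal sketch under assumptions: it takes a `repeats` list shaped like the output above, and `candidate_key_lengths` and `max_len` are hypothetical names introduced here. The sample text is random, so the counts it produces carry no meaning for a real key.

```python
from collections import Counter


def candidate_key_lengths(repeats, max_len=20):
    """Count, for each candidate length k, how many gaps between repeats k divides.

    `repeats` is a list of (substring, positions) pairs; the function name
    and `max_len` are assumptions made for this sketch.
    """
    counts = Counter()
    for _, positions in repeats:
        # Gaps between successive occurrences of the same substring.
        for first, second in zip(positions, positions[1:]):
            gap = second - first
            for k in range(2, max_len + 1):
                if gap % k == 0:
                    counts[k] += 1
    return counts


if __name__ == "__main__":
    repeats = [
        ("HAS", [10, 51]),
        ("MAP", [15, 28, 40, 55]),
        ("OMA", [27, 39, 54]),
        ("APD", [16, 56]),
    ]
    # Candidate lengths ordered from most to least supported.
    for k, votes in candidate_key_lengths(repeats).most_common():
        print(k, votes)
```

Counting divisors is used here instead of a single greatest common divisor because one coincidental repeat is enough to pull the GCD of all gaps down to 1.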