Finding repetitions in a string by length

Posted 2025-01-09 00:24:33

I have a string of letters similar to that shown below:

'ABTSOFDNSOHASAPMAPDSNFAKSGMOMAPEPTNSNTROMAPKSDFANSDHASOMAPDODDFG'

I am treating this as ciphertext and therefore want to find the positions of repetitions in order to work out the length of the encryption key. (The example above is random, so no direct answers will come from it.)

For now, I want to write code that can find repetitions of length 3; for example, 'MAP' and 'HAS' are repeated. I want the code to find these repetitions itself, rather than me having to specify the substring it should look for.

Previously I have used:

text.find("MAP")
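For context, str.find returns only the index of the first match (or -1 if there is none), so on its own it cannot list every occurrence. A minimal sketch of collecting all positions of a known substring follows; the helper name find_all is hypothetical and not part of the original post.

```python
def find_all(text, sub):
    """Return every start index of sub in text, overlapping matches included."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # resume one character later so overlapping matches are not skipped
        start = text.find(sub, start + 1)
    return positions


text = 'ABTSOFDNSOHASAPMAPDSNFAKSGMOMAPEPTNSNTROMAPKSDFANSDHASOMAPDODDFG'
print(find_all(text, "MAP"))  # [15, 28, 40, 55]
```

This still requires naming the substring in advance, which is the limitation the question is about.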

Using the answer below I have written:

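import pandas as pd

# Phrase is the ciphertext string shown above
substring = []
for i in range(len(Phrase) - 3):  # len - 3 start positions cover every 4-character window
    substring.append(Phrase[i:i+4])

# freq was not defined in the original snippet; assumed here to be the pandas
# value_counts Series used in the answer this code was adapted from
freq = pd.Series(substring).value_counts()

for index, value in freq.items():  # Series.iteritems() was removed in pandas 2.0
    if value > 1:
        for i in range(len(Phrase) - 3):
            if index == Phrase[i:i+4]:
                print(index)  # prints the substring once per occurrence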

This prints each repeated substring once for every time it appears. Ideally, I want just a list of each repeated substring together with the positions where it appears.

回心转意 2025-01-16 00:24:33

Here is what I did :)

import pandas as pd

# find the frequency of each length-3 substring
Phrase    = "Maryhadalittlarymbada"
substring = []
for i in range(len(Phrase) - 2):  # len - 2 start positions, so the final triple is included
    substring.append(Phrase[i:i+3])
Frequency = pd.Series(substring).value_counts()

# find the positions of each repeated substring
for index, value in Frequency.items():  # Series.iteritems() was removed in pandas 2.0
    if value > 1:
        positions = []
        for i in range(len(Phrase) - 2):
            if index == Phrase[i:i+3]:
                positions.append(i)
        print(index, ": ", positions)
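
The same idea can be sketched without pandas and for any substring length, recording positions in a single pass instead of rescanning the string for each repeated substring. The function name find_repeats and its parameter n are illustrative assumptions, not part of the answer above.

```python
from collections import defaultdict


def find_repeats(text, n=3):
    """Map each length-n substring that occurs more than once to its start indices."""
    positions = defaultdict(list)
    # len(text) - n + 1 start positions cover every window of length n
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    # keep only the substrings seen at more than one position
    return {chunk: idxs for chunk, idxs in positions.items() if len(idxs) > 1}


print(find_repeats("Maryhadalittlarymbada"))  # {'ary': [1, 13], 'ada': [5, 18]}
```

Passing a different n, for example find_repeats(Phrase, 4), looks for longer repeats with no other changes.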

╰ゝ天使的微笑 2025-01-16 00:24:33

Here is a solution using only the standard library:

import itertools, collections
text = 'ABTSOFDNSOHASAPMAPDSNFAKSGMOMAPEPTNSNTROMAPKSDFANSDHASOMAPDODDFG'

Make a function that will produce overlapping chunks of three, inspired by the pairwise recipe from the itertools documentation.

def three_at_a_time(text):
    '''Overlapping chunks of three.

    text : str
    returns generator
    '''
    a,b,c = itertools.tee(text,3)
    # advance the second and third iterators
    next(b)
    next(c)
    next(c)
    return (''.join(t) for t in zip(a,b,c))

Make a dictionary with the position(s) of each chunk.

triples = enumerate(three_at_a_time(text))
d = collections.defaultdict(list)
for i,triple in triples:
    d[triple].append(i)

Filter the dictionary for chunks that have more than one position.

# repeats = itertools.filterfalse(lambda item: len(item[1])==1,d.items())
repeats = [(k,v) for k,v in d.items() if len(v)>1]

Example:

>>> for chunk in repeats:
...     print(chunk) 
... 
('HAS', [10, 51])
('MAP', [15, 28, 40, 55])
('OMA', [27, 39, 54])
('APD', [16, 56])
>>>
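
The question mentions using the repeat positions to estimate the key length. One common way to do that, which the original post does not name, is Kasiski examination: for a repeating-key cipher, the distances between occurrences of the same chunk tend to be multiples of the key length. The sketch below assumes that kind of cipher and uses a hypothetical helper, spacings; the repeats list is copied from the example output.

```python
from functools import reduce
from itertools import combinations
from math import gcd

# Repeats copied from the example output (chunk, start positions)
repeats = [
    ('HAS', [10, 51]),
    ('MAP', [15, 28, 40, 55]),
    ('OMA', [27, 39, 54]),
    ('APD', [16, 56]),
]


def spacings(positions):
    """Distances between every pair of occurrences of one chunk."""
    return [b - a for a, b in combinations(positions, 2)]


for chunk, positions in repeats:
    gaps = spacings(positions)
    # the GCD of the gaps is a candidate multiple of the key length
    print(chunk, gaps, reduce(gcd, gaps))
```

Because the sample string is random, the numbers it produces carry no meaning; on real ciphertext, factors shared by many of the gaps would be the candidates worth testing.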
