我的算法的运行时间复杂度 - 我如何计算它并进一步优化算法?

发布于 2024-10-14 18:45:17 字数 5033 浏览 10 评论 0原文

我设计了一个递归算法并用Python写下来。当我用不同的参数测量运行时间时,它似乎需要指数时间。此外;对于像50这样的小数字,需要半个多小时才能结束。(我没有等到它完成,但它似乎没有在合理的时间内完成,猜测它是指数级的)。

所以,我很好奇这个算法的运行时间复杂度。有人可以帮我推导出方程 T(n,m) 吗?还是要计算大哦?

算法如下:

# parameters:
# search string, the index where we left on the search string, source string, index where we left on the source string,
# and the indexes array, which keeps track of the indexes found for the characters
def find(search, searchIndex, source, sourceIndex, indexes):
    found = None
    if searchIndex < len(search): # if we haven't reached the end of the search string yet
        found = False
        while sourceIndex < len(source): # loop thru the source, from where we left off
            if search[searchIndex] == source[sourceIndex]: # if there is a character match
                # recursively look for the next character of search string 
                # to see if it can be found in the remaining part of the source string
                if find(search, searchIndex + 1, source, sourceIndex + 1, indexes):
                    # we have found it
                    found = True # set found = true
                    # if an index for the character in search string has never been found before.
                    # i.e if this is the first time we are finding a place for that current character
                    if indexes[searchIndex] is None:
                        indexes[searchIndex] = sourceIndex # set the index where a match is found
                    # otherwise, if an index has been set before but it's different from what
                    # we are trying to set right now. so that character can be at multiple places.
                    elif indexes[searchIndex] != sourceIndex: 
                        indexes[searchIndex] = -1 # then set it to -1.
            # increment sourceIndex at each iteration so as to look for the remaining part of the source string. 
            sourceIndex = sourceIndex + 1
    return found if found is not None else True

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    # so in this example where N=7, allcards=['B','R','R','B','R','B','R']
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]
    # indexes is initially None.
    indexes = [None] * len(colors)

    find(colors, 0, allcards, 0, indexes)
    return indexes    

if __name__ == "__main__":
    print theCards(7, list("BBB"))

我不知道是否必须理解问题和算法才能得出最坏情况的运行时间,但这是我试图解决的问题:

问题:

给定一个源字符串SRC和一个搜索字符串SEA,在SRC中查找子序列SEA,并返回在SRC中找到SEA的每个字符的索引。如果 SEA 中的某个字符可以出现在 SRC 中的多个位置,则为该字符位置返回 -1。

例如; 如果源字符串是 BRRBRBR (N=7) 并且搜索字符串是 BBB: 那么“BBB”中的第一个“B”可以出现在搜索字符串中的索引 0 处。第二个“B”可以位于搜索字符串的索引 3 处,最后一个“B”可以位于第 5 个位置。此外;字符“BBB”的位置不存在其他选择,因此算法返回[0,3,5]。

在另一种情况下,源字符串是 BRRBRB (N=6),搜索字符串是 RBR: 'RBR' 的第一个 'R' 可以位于位置 1 或 2。这样就只剩下位置 3 为 'B' 和位置 4 为最后一个 'R'。那么,第一个“R”可以出现在多个地方,它的位置是不明确的。另外两个字符 B 和 R 只有一个位置。所以算法返回[-1,4,5]。

算法无法完成并永远持续的情况是当源字符串为 ['B', 'R', 'R', 'B', 'R', 'B', 'R', 'B' 时, 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'R', 'B', ' B'、'B'、'R'、'B'、'B'、'B'、'B'、'B'、'R'、'B'、'R'、'B'、'B' , 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', ' B'、'R'、'B'、'B'、'B'、'B'、'B'、'R'、'B'、'B'、'B'、'B'、'B' ] (N=58) 搜索字符串为 RBRRBRBBBRRRBBRRBBBRRBBBRR。它应该返回 [-1, -1, -1, -1, -1, -1, -1, -1, 17, 18, 19, 23, -1, -1, -1, -1, -1 , -1, -1, -1, -1, -1, -1, -1, 47, 53 ],但不幸的是它没有 =(

优化:

我想停止搜索当“索引”列表完全充满-1时,但这只会影响最好的情况(或者可能是平均情况),但不会影响最坏的情况,我知道存在一种算法。 比优化

是,我真的很好奇运行时间的 T(n,m) 方程,其中 n 和 m 是源字符串和搜索字符串的长度。

更重要的 读到这里,非常感谢! =)

编辑 - IVIad 的解决方案已实施:

def find2(search, source):
    indexes = list()
    last = 0
    for ch in search:
        if last >= len(source):
            break
        while last < len(source) and source[last] != ch:
            last = last + 1
        indexes.append(last)
        last = last + 1
    return indexes

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]

    indexes = find2(colors, allcards) # find the indexes of the first occurrences of the characters
    colors.reverse() # now reverse both strings
    allcards.reverse()
    # and find the indexes of the first occurrences of the characters, again, but in reversed order
    indexesreversed = find2(colors, allcards)
    indexesreversed.reverse() # reverse back the resulting list of indexes 
    indexesreversed = [len(allcards) - i - 1 for i in indexesreversed] # fix the indices

    # return -1 if the indices are different when strings are reversed
    return [indexes[i] + 1 if indexes[i] == indexesreversed[i] else - 1 for i in range(0, len(indexes))]

if __name__ == "__main__":
    print theCards(495, list("RBRRBRBBRBRRBBRRBBBRRBBBRR"))

I designed a recursive algorithm and wrote it down in Python. When I measure the running time with different parameters, it seems to take exponential time. Furthermore; it takes more than half an hour to end with small numbers such as 50. (I didn't wait until it finishes, but it doesn't seem to finish in a reasonable amount of time, guess it's exponential).

So, I'm curious about the running time complexity of this algorithm. Can someone please help me derive the equation T(n,m)? Or to compute the big-oh?

The algorithm is below:

# parameters:
# search string, the index where we left on the search string, source string, index where we left on the source string,
# and the indexes array, which keeps track of the indexes found for the characters
def find(search, searchIndex, source, sourceIndex, indexes):
    found = None
    if searchIndex < len(search): # if we haven't reached the end of the search string yet
        found = False
        while sourceIndex < len(source): # loop thru the source, from where we left off
            if search[searchIndex] == source[sourceIndex]: # if there is a character match
                # recursively look for the next character of search string 
                # to see if it can be found in the remaining part of the source string
                if find(search, searchIndex + 1, source, sourceIndex + 1, indexes):
                    # we have found it
                    found = True # set found = true
                    # if an index for the character in search string has never been found before.
                    # i.e if this is the first time we are finding a place for that current character
                    if indexes[searchIndex] is None:
                        indexes[searchIndex] = sourceIndex # set the index where a match is found
                    # otherwise, if an index has been set before but it's different from what
                    # we are trying to set right now. so that character can be at multiple places.
                    elif indexes[searchIndex] != sourceIndex: 
                        indexes[searchIndex] = -1 # then set it to -1.
            # increment sourceIndex at each iteration so as to look for the remaining part of the source string. 
            sourceIndex = sourceIndex + 1
    return found if found is not None else True

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    # so in this example where N=7, allcards=['B','R','R','B','R','B','R']
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]
    # indexes is initially None.
    indexes = [None] * len(colors)

    find(colors, 0, allcards, 0, indexes)
    return indexes    

if __name__ == "__main__":
    print theCards(7, list("BBB"))

I don't know if one has to understand the problem and the algorithm in order to derive the worst-case running time, but here is the problem I attempted to solve:

The Problem:

Given a source string SRC and a search string SEA, find the subsequence SEA in SRC and return the indexes of where each character of SEA was found in SRC. If a character in SEA can be at multiple places in SRC, return -1 for that characters position.

For instance;
if the source string is BRRBRBR (N=7) and the search string is BBB:
then the first 'B' in 'BBB' can appear at index 0 in the search string. The second 'B' can be at index 3 of the search string and the last 'B' can be at the 5th position. Furthermore; there exists no other alternatives for the positions of the characters 'BBB', and thus the algorithm returns [0,3,5].

In another case, where the source string is BRRBRB (N=6) and the search string is RBR:
the first 'R' of 'RBR' can be at position 1 or 2. This leaves only position 3 for 'B' and position 4 for the last 'R'. Then, the first 'R' can be at multiple places, it's place is ambigious. The other two characters, B and R, have only one place. So the algorithm returns [-1,4,5].

The case where the algorithm doesn't finish and take forever is when the source string is ['B', 'R', 'R', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'B', 'B'] (N=58)
and the search string is RBRRBRBBRBRRBBRRBBBRRBBBRR. It should return [-1, -1, -1, -1, -1, -1, -1, -1, 17, 18, 19, 23, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 47, 53 ], but unfortunately it doesn't =(

Optimizations:

I thought of halting the search when the 'indexes' list was completely full of -1s. But that only affects the best-case (or maybe the average-case) but not the worst-case. How can one further optimize this algorithm. I know that there exists a polynomial solution to this problem.

More important than the optimizations, I'm really curious about the T(n,m) equation of the running time, where n and m are the lengths of the source and search strings.

If you were able to read until here, thank you very much! =)

EDIT - IVIad's solution implemented:

def find2(search, source):
    indexes = list()
    last = 0
    for ch in search:
        if last >= len(source):
            break
        while last < len(source) and source[last] != ch:
            last = last + 1
        indexes.append(last)
        last = last + 1
    return indexes

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]

    indexes = find2(colors, allcards) # find the indexes of the first occurrences of the characters
    colors.reverse() # now reverse both strings
    allcards.reverse()
    # and find the indexes of the first occurrences of the characters, again, but in reversed order
    indexesreversed = find2(colors, allcards)
    indexesreversed.reverse() # reverse back the resulting list of indexes 
    indexesreversed = [len(allcards) - i - 1 for i in indexesreversed] # fix the indices

    # return -1 if the indices are different when strings are reversed
    return [indexes[i] + 1 if indexes[i] == indexesreversed[i] else - 1 for i in range(0, len(indexes))]

if __name__ == "__main__":
    print theCards(495, list("RBRRBRBBRBRRBBRRBBBRRBBBRR"))

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

一影成城 2024-10-21 18:45:18

您可以在 O(n + m) 中完成此操作,其中 mSEA 中的字符数,n code> SRC 中的字符数:

last = 1
for i = 1 to m do
    while SRC[last] != SEA[i]
        ++last

    print last
    ++last (skip this match)

基本上,对于 SEA 中的每个字符,找到其在 SRC 中的位置,但仅扫描该位置之后您找到前一个角色的位置。

<块引用>

例如;如果源字符串是 BRRBRBR (N=7) 并且搜索字符串是 BBB

则:在 SRC 中查找 B:在 last = 1 处找到
打印1,设置last = 2

SRC中查找B:在last = 4处找到,打印4,设置last = 5< /代码>。

SRC中查找B:在last = 6处找到,打印6,设置last = 7< /代码>。完毕。


至于算法的复杂性,我无法提供非常正式的分析,但我会尝试解释我将如何进行分析。

假设 SRCSEA 以及它们之间的所有字符都相同。因此我们可以消除 while 循环中的条件。另请注意,您的 while 循环执行 n 次。

请注意,对于第一个字符,您将调用 find(1, 1), ... find(m, n)。但它们也会启动它们的 while 循环并进行它们自己的递归调用。每个 find(i, j) 都会在其 while 循环中进行 O(m) 次递归调用,对于 i = 1 到 n。但这些反过来又会进行更多的递归调用,从而导致某种“雪崩效应”,从而导致指数级的复杂性。

所以你有:

i = 1: calls find(2, 2), find(3, 3), ..., find(m, n)
       find(2, 2) calls find(3, 3), ..., find(m, n)
       find(3, 3) calls find(4, 4), ..., find(m, n)
       find(4, 4) calls find(5, 5), ..., find(m, n)
       ...
       total calls: O(m^m)
i = 2: same, but start from find(2, 3).
...
i = n: same

总复杂度看起来像O(n*m^m)。我希望这是有道理的,而且我没有犯任何错误。

You can do it in O(n + m), where m is the number of characters in SEA, and n the number of characters in SRC:

last = 1
for i = 1 to m do
    while SRC[last] != SEA[i]
        ++last

    print last
    ++last (skip this match)

Basically, for each character in SEA, find its position in SRC, but only scan after the position where you found the previous character.

For instance; if the source string is BRRBRBR (N=7) and the search string is BBB

Then: find B in SRC: found at last = 1
print 1, set last = 2.

Find B in SRC: found at last = 4, print 4, set last = 5.

Find B in SRC: found at last = 6, print 6, set last = 7. Done.


As for the complexity of your algorithm, I'm not able to provide a very formal analysis, but I'll try to explain how I'd go about it.

Assume that all characters are equal in both SRC and SEA and between them. Therefore we can eliminate the condition in your while loop. Also note that your while loop executes n times.

Note that for the first character you will call find(1, 1), ... find(m, n). But these will also start their while loops and make their own recursive calls. Each find(i, j) will make O(m) recursive calls that in its while loop, for i = 1 to n. But these in turn will make more recursive calls themselves, resulting in a sort of "avalanche effect" that causes exponential complexity.

So you have:

i = 1: calls find(2, 2), find(3, 3), ..., find(m, n)
       find(2, 2) calls find(3, 3), ..., find(m, n)
       find(3, 3) calls find(4, 4), ..., find(m, n)
       find(4, 4) calls find(5, 5), ..., find(m, n)
       ...
       total calls: O(m^m)
i = 2: same, but start from find(2, 3).
...
i = n: same

Total complexity thus looks like O(n*m^m). I hope this makes sense and I haven't made any mistakes.

百合的盛世恋 2024-10-21 18:45:18

这简直就是最长公共子序列问题。这可以通过动态编程来实现,以获得比指数时间少得多的时间。在您的情况下,当 LCS 返回 SEA 的长度时,您就知道序列 SEA 存在于 SRC 中,在修改算法时保存它们的索引是一件微不足道的事情。这是一个很好的解释的链接。 http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

This is simply the longest common subsequence problem. This can be implemented with dynamic programming to get a lot less than exponential time. In your case when LCS returns the length of SEA then you know that the sequence SEA exists in SRC, saving their indexes is a trivial thing when modifying the algorithm. Here's a link to a good explanation. http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

梦年海沫深 2024-10-21 18:45:18

乍一看,你是在搜索、递归和回溯吗?

我相信为源字符串创建一个 后缀数组 是一个好主意。
构造 后缀数组 的复杂度为 O(nlogn)。定位子字符串的时间复杂度为 O(logn)。

From a quick look, you're searching, recursing and backtracking?

I believe creating a suffix array for your source string would be a good idea.
Constructing the suffix array has O(nlogn) complexity. Locating a substring has O(logn) time complexity.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文