Anagram algorithm with minimum complexity
I recently was asked to design an algorithm that checks if two strings are anagrams of one another. My goal was to minimize space and time complexity, so I came up with this algorithm:
- Create an array of 26 elements, each initialized to zero.
- Traverse the first string and for each character, increment the array element corresponding to that character.
- Traverse the second string and for each character, decrement the array element corresponding to that character.
- Scan over the array. If all elements are 0, the two strings are anagrams.
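The steps above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the class name `CountArrayAnagram` is made up for the example, and it assumes both strings contain only the lowercase ASCII letters 'a' to 'z', which is what a 26-element array implies.

```java
final class CountArrayAnagram {
    // Assumption: s and t contain only lowercase ASCII letters 'a'..'z'.
    static boolean isAnagram(String s, String t) {
        int[] counts = new int[26]; // Java zero-initializes int arrays

        // First string: increment the slot for each character.
        for (int i = 0; i < s.length(); i++) {
            counts[s.charAt(i) - 'a']++;
        }
        // Second string: decrement the slot for each character.
        for (int i = 0; i < t.length(); i++) {
            counts[t.charAt(i) - 'a']--;
        }
        // Anagrams leave every slot at exactly zero.
        for (int c : counts) {
            if (c != 0) {
                return false;
            }
        }
        return true;
    }
}
```

The two passes over the strings cost time proportional to their combined length, and the final scan is a constant 26 steps, so the extra space is constant as well.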
However, the time complexity of this algorithm is O(n) and I cannot come up with an algorithm with lower complexity. Does anybody know of one?
Comments (5)
Your algorithm is asymptotically optimal. Any correct algorithm for this problem needs Ω(n) time, so it is not possible to do better than linear. To see this, suppose that an algorithm A exists that solves the problem in o(n) time (note that this is little-o of n). Then for any ε with 0 < ε < 1, there is some n₀ such that on every input of size at least n₀, the algorithm terminates in at most εn steps, where n is the input size. Set ε = 1/3 and consider inputs of size at least the n₀ that corresponds to this ε. Since the algorithm can look at no more than 1/3 of the characters in the two strings, there must be two different inputs, one that is a pair of anagrams and one that isn't, that agree on every character the algorithm looks at. The algorithm would then produce the same output in both cases, and so would be wrong on at least one of them. We've reached a contradiction, so no such algorithm can exist.
You could possibly improve average performance with early exits. While scanning the 2nd string, if count[char] is 0 before you decrement, you don't have an anagram and you can stop scanning.
Also, if the strings are shorter than 26 chars, then in the last step, check only the counts for the chars that occur in the first string, rather than scanning all 26 entries.
This doesn't change the big O, but it can change your average runtime to something less than the 2N+26 of the proposed solution, depending on your data.
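A rough sketch of both shortcuts, again assuming lowercase ASCII letters only (the class name `EarlyExitAnagram` is made up for illustration):

```java
final class EarlyExitAnagram {
    // Assumption: s and t contain only lowercase ASCII letters 'a'..'z'.
    static boolean isAnagram(String s, String t) {
        int[] counts = new int[26];
        for (int i = 0; i < s.length(); i++) {
            counts[s.charAt(i) - 'a']++;
        }
        for (int i = 0; i < t.length(); i++) {
            int idx = t.charAt(i) - 'a';
            // Early exit: t uses this letter more often than s does.
            if (counts[idx] == 0) {
                return false;
            }
            counts[idx]--;
        }
        // Thanks to the early exit no count is negative, and only letters
        // that occur in s can still be positive.
        if (s.length() < 26) {
            // Short input: checking the letters of s is cheaper than all 26 slots.
            for (int i = 0; i < s.length(); i++) {
                if (counts[s.charAt(i) - 'a'] != 0) {
                    return false;
                }
            }
        } else {
            for (int c : counts) {
                if (c != 0) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Checking only the letters of the first string is safe here because the early exit guarantees no count ever goes below zero, so any leftover nonzero count must belong to a letter that the first string contains.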
Let's take a question:
Given two strings s and t, write a function to determine if t is an anagram of s.
For example, s = "anagram", t = "nagaram", return true. s = "rat", t = "car", return false.
Method 1 (using HashMap):
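One way a HashMap-based check might look. This is a sketch only, not necessarily the code the poster had in mind, and the class name `HashMapAnagram` is made up for illustration. Unlike the 26-element array, a map does not assume a fixed alphabet. Note that it compares Java `char` values (UTF-16 code units), so characters outside the Basic Multilingual Plane would need extra care.

```java
import java.util.HashMap;
import java.util.Map;

final class HashMapAnagram {
    static boolean isAnagram(String s, String t) {
        // Strings of different lengths can never be anagrams.
        if (s.length() != t.length()) {
            return false;
        }
        // Count how often each character occurs in s.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // Consume those counts with the characters of t.
        for (char c : t.toCharArray()) {
            Integer remaining = counts.get(c);
            if (remaining == null) {
                return false; // t has a character s lacks, or has it too often
            }
            if (remaining == 1) {
                counts.remove(c);
            } else {
                counts.put(c, remaining - 1);
            }
        }
        // Every occurrence in s was matched by one in t.
        return counts.isEmpty();
    }
}
```

With the question's examples, `HashMapAnagram.isAnagram("anagram", "nagaram")` returns true and `HashMapAnagram.isAnagram("rat", "car")` returns false.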
Method 2:
Method 3:
Method 4:
To be sure the strings are anagrams, you need to examine both strings in full, so how could that be faster than O(n)?