如线性时间多数算法?的答案所示,可以以线性方式计算元素数组的多数时间和 log(n)
空间。
结果表明,每个看到该算法的人都认为这是一项很酷的技术。但这个想法是否可以推广到新算法?
该算法的隐藏功能似乎在于保留一个扮演复杂角色的计数器 - 例如“(到目前为止多数元素的计数)-(到目前为止第二多数元素的计数)”。是否有基于相同思想的其他算法?
As seen in the answers to Linear time majority algorithm?, it is possible to compute the majority of an array of elements in linear time and log(n)
space.
It was shown that everyone who sees this algorithm believes that it is a cool technique. But does the idea generalize to new algorithms?
It seems the hidden power of this algorithm is in keeping a counter that plays a complex role -- such as "(count of majority element so far) - (count of second majority so far)". Are there other algorithms based on the same idea?
发布评论
评论(3)
嗯,让我们首先开始了解该算法为何有效,以便“隔离”其中的想法。
该算法的要点是,如果您有一个多数元素,那么您可以将它的每次出现与“另一个”元素相匹配,然后您就有更多的“备用”元素。
因此,我们只有一个计数器来计算客人答案的“备用”出现次数。
如果它达到 0,那么从我们“选举”“当前”元素作为“当前”位置的客体主要元素开始,它就不是子序列的多数元素。
此外,由于我们的“来宾”元素与所考虑的子序列中出现的所有其他元素相匹配,因此所考虑的子序列中没有主要元素。
现在,因为:
是正确的答案通过矛盾可以看出,如果存在一个主要元素,那么当计数器永远不会为零时,我们就会得到整个序列的后缀。
现在:可以在新的 O(1) 大小 O(n) 时间算法中利用的想法是什么?
对我来说,每当您需要计算元素序列上的属性
P
时,您都可以应用此技术:seq[n, m]
扩展到Q(seq[n, m+1]) 不成立,则在 O(1) 时间内 >seq[n, m+1]
P(seq[n, j])
和P(seq[j, m])< 进行计算/code> if
Q(seq[n, j])
成立在我们的例子中,
P
是我们“选出的”主要元素的“备用”出现,而Q
是“P
为零”。如果您以这种方式看待事物,最长公共子序列利用了相同的想法(不知道它的“凉爽系数”;))
Umh, let's first start to understand why the algorithm works, in order to "isolate" the ideas there.
The point of the algorithm is that if you have a majority element, then you can match each occurrence of it with an "another" element, and then you have some more "spare".
So, we just have a counter which counts the number of "spare" occurrences of our guest answer.
If it reaches 0, then it isn't a majority element for the subsequence starting from when we have "elected" the "current" element as the guest major element to the "current" position.
Also, since our "guest" element matches every other element occurrence in the considered subsequence, there are no major elements in the considered subsequence.
Now, since:
it is obvious to see by contradiction that, if a major element exists, then we have a suffix of the whole sequence when the counter never gets to zero.
Now: what's the idea that can be exploited in new, O(1) size O(n) time algorithms?
To me, you can apply this technique whenever you have to compute a property
P
on a sequence of elements which:seq[n, m]
toseq[n, m+1]
in O(1) time ifQ(seq[n, m+1])
doesn't holdP(seq[n, m])
can be computed in O(1) time and space fromP(seq[n, j])
andP(seq[j, m])
ifQ(seq[n, j])
holdsIn our case,
P
is the "spare" occurrences of our "elected" major element andQ
is "P
is zero".If you see things in that way, longest common subsequence exploits the same idea (dunno about its "coolness factor" ;))
Jaydev Misra 和 David Gries 有一篇论文,名为查找重复元素 (ACM 页面)将其概括为重复次数超过 n/k 的元素次(k=2 是多数问题)。
当然,这可能与原始问题非常相似,您可能正在寻找“不同”的算法。
这是一个可能不同的示例。
我相信标准的解决方案是维护一个计数器。
旁注:
对于不能为常数空间等的答案,请向他们询问计算模型。例如,在 WORD RAM 模型中,假设整数/数组索引等为 O(1)。
很多人错误地混合和匹配模型。例如,他们很乐意让 n 个整数的输入数组为 O(n),数组索引为 O(1) 空间,但他们认为计数器是 Omega(log n) 等,这是无意义的。如果他们想考虑以位为单位的大小,那么输入本身就是 Omega(n log n) 等。
Jaydev Misra and David Gries have a paper called Finding Repeated Elements (ACM page) which generalizes it to an element repeating more than n/k times (k=2 is the majority problem).
Of course, this is probably very similar to the original problem, and you are probably looking for 'different' algorithms.
Here is an example which is possibly different.
I believe the standard solution is to maintain a counter.
Side note:
As to answers which claim cannot be constant space etc, ask them for the model of computation. In the WORD RAM model for instance, you assume the integers/array indices etc are O(1).
A lot of folks incorrectly mix and match models. For instance, they will happily have the input array of n integers be O(n), have an array index be O(1) space, but a counter they consider Omega(log n) etc, which is nonsense. If they want to consider the size in bits, then the input itself is Omega(n log n) etc.
对于想要了解该算法的作用及其原理的人:查看我的详细答案。
在这里我将描述该算法的自然扩展(或概括)。因此,在标准多数投票算法中,您必须找到一个在流中至少出现
n/2
次的元素,其中n
是流的大小. 您可以在O(n)
时间内完成此操作(使用一个很小的常量并在O(log(n))
空间内完成,最坏的情况并且极不可能通用算法允许您找到
k
个最频繁的项目,其中每次在原始流中至少出现n/(k+1)
次。< /strong> 请注意,如果 k=1,您最终会得到原来的问题,该问题的解决方案与原来的问题非常相似,只是您现在维护 k 个计数器和 k 个可能的元素,而不是一个计数器和一个可能的元素。逻辑以类似的方式进行。您迭代数组,如果该元素在可能的元素中,则增加它的计数器,如果其中一个计数器为零,则用新元素替换该计数器的元素,否则只需减少该计数器。价值观。
与原始多数投票算法一样,您需要保证拥有这 k 个多数元素,否则您必须再次遍历数组以验证之前找到的可能元素是否正确。这是我的 python 尝试(尚未进行彻底的测试)。
For people who want to understand what does this algorithm do and why does it works: look at my detailed answer.
Here I will describe a natural extension of this algorithm (or a generalization). So in a standard majority voting algorithm you have to find an element which appears at least
n/2
times in the stream, wheren
is the size of the stream. You can do this inO(n)
time (with a tiny constant and inO(log(n))
space, worse case and highly unlikely.The generalized algorithm allows you to find
k
most frequent items, where each time appeared at leastn/(k+1)
times in the original stream. Note that if k=1, you end up with your original problem.Solution to this problem is really similar to the original one, except instead of one counter and one possible element, you maintain k counters and k possible elements. Now the logic goes in a similar way. You iterate through the array and if the element is in the possible elements, you increase it's counter, if one of the counters is zero - substitute the element of this counter with new element. Otherwise just decrease the values.
As with original majority voting algorithm, you need to have a guarantee that you have these k majority elements, otherwise you have to do another pass over the array to verify that your previously found possible elements are correct. Here is my python attempt (have not done a thorough testing).