跟踪扩展数组的中位数
面试问题:
编辑如下 给你一个数组。您可以从中创建 2 个堆,一个是最小堆,另一个是最大堆。现在使用这 2 个提供的堆在 O(nlog n) 时间内找到数组的中位数。
更正问题 数字是随机生成的并存储到(扩展)数组中。您将如何跟踪中位数?
解决方案 这个问题可以使用 2 个堆来解决,并且中位数总是可以在 O(1) 时间内访问到。
Interview Question:
Edited Below
You are given an array. You make 2 heaps out of it, one minheap and the other max heap. Now find the median of the array using these 2 provided heaps in O(nlog n) time.
Corrected Question
Numbers are randomly generated and stored into an (expanding) array. How would you keep track of the median?
Solution
This problem can be solved using 2 heaps and the median can always be accessed in O(1) time.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
发布评论
评论(5)
java 中的一个有效实现,使用 2 个堆,O(n log n)。在任何时候,我都会保持两个堆的大小平衡(即,如果我们输入 n 个元素使得 n%2==1,它们最多相差 1)。获取中位数的时间复杂度为 O(1)。添加新元素的时间复杂度为 O(log n)。
public class MedianOfStream {
private int count;
private PriorityQueue<Integer> highs, lows;
public MedianOfStream() {
highs = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
@Override
public int compare(Integer arg0, Integer arg1) {
return arg0.compareTo(arg1);
}
});
lows = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
@Override
public int compare(Integer arg0, Integer arg1) {
return arg1.compareTo(arg0);
}
});
}
private int getMedian() {
if (count == 0)
return 0;
if (lows.size() == highs.size()) {
return (lows.peek() + highs.peek()) / 2;
} else if (lows.size() < highs.size()) {
return highs.peek();
}
return lows.peek();
}
private void swap(){
int h = highs.poll();
int l = lows.poll();
highs.add(l);
lows.add(h);
}
public int updateMedian(int n) {
count++;
if (count == 1)
lows.add(n);
else if (count==2) {
highs.add(n);
if(highs.peek()<lows.peek()) {
swap(); // O(log n)
}
}
else {
if (n > highs.peek()) {
lows.add(highs.poll()); // O(log n)
highs.add(n); // O(log n)
} else {
highs.add(lows.poll()); // O(log n)
lows.add(n); // O(log n)
}
if(highs.peek()<lows.peek()) {
swap(); // O(log n)
}
}
// if we added an even # of items,
// the heaps must be exactly the same size,
// otherwise we tolerate a 1-off difference
if (Math.abs(lows.size() - highs.size()) > (count % 2)) {
if (lows.size() < highs.size()) {
lows.add(highs.poll()); // O(log n)
} else {
highs.add(lows.poll()); // O(log n)
}
}
return getMedian(); // O(1)
}
}
这是它的 python 实现。我测试了使用 Random.org 并在 中值查找器 其中大多数似乎都有效。
import heapq
def medianFinder(arr):
minHeap = []
maxHeap = []
def handleMedianFinder(num: int):
if not len(maxHeap):
heapq.heappush(maxHeap, -num)
return -maxHeap[0]
elif not len(minHeap):
heapq.heappush(minHeap, num)
return (minHeap[0]-maxHeap[0])/2
if num < minHeap[0]:
if len(maxHeap) > len(minHeap):
oldMedian = -heapq.heappop(maxHeap)
heapq.heappush(minHeap, oldMedian)
heapq.heappush(maxHeap, -num)
return (minHeap[0]-maxHeap[0])/2 if (len(minHeap) + len(maxHeap))%2 == 0 else minHeap[0]
heapq.heappush(maxHeap, -num)
elif num > -maxHeap[0]:
if len(maxHeap) < len(minHeap):
oldMedian = heapq.heappop(minHeap)
heapq.heappush(maxHeap, -oldMedian)
heapq.heappush(minHeap, num)
return (minHeap[0]-maxHeap[0])/2 if (len(minHeap) + len(maxHeap))%2 == 0 else -maxHeap[0]
heapq.heappush(minHeap, num)
else: # between the medians (new medians)
if len(maxHeap) < len(minHeap):
oldMedian = heapq.heappop(minHeap)
heapq.heappush(maxHeap, -oldMedian)
heapq.heappush(minHeap, num)
else:
oldMedian = -heapq.heappop(maxHeap)
heapq.heappush(minHeap, oldMedian)
heapq.heappush(maxHeap, -num)
if len(minHeap) < len(maxHeap):
return minHeap[0]
elif len(maxHeap) < len(minHeap):
return -maxHeap[0]
return (minHeap[0]-maxHeap[0])/2
for num in arr:
# print('maxHeap => ', maxHeap)
# print('minHeap => ', minHeap)
print(handleMedianFinder(num))
arr1 = [11, 18, 16, 12, 14, 4, 15, 10, 19, 20]
medianFinder(arr1)
使用两个堆的 JavaScript 解决方案:
function addNewNumber(minHeap, maxHeap, randomNumber) {
if (maxHeap.size() === minHeap.size()) {
if (minHeap.peek() && randomNumber > minHeap.peek()) {
maxHeap.insert(minHeap.remove());
minHeap.insert(randomNumber);
} else {
maxHeap.insert(randomNumber);
}
} else {
if (randomNumber < maxHeap.peek()) {
minHeap.insert(maxHeap.remove());
maxHeap.insert(randomNumber);
} else {
minHeap.insert(randomNumber);
}
}
}
function getMedian(minHeap, maxHeap) {
if (!maxHeap.size()) {
return 0;
}
if (minHeap.size() === maxHeap.size()) {
return (minHeap.peek() + maxHeap.peek()) / 2;
} else {
return maxHeap.peek();
}
}
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
以下是如何使用这两个堆。请注意,我假设您不知道元素的数量,这就是为什么我们必须弹出,直到我们从最小堆中弹出的内容大于或等于从最大堆中弹出的内容为止。请注意,我们返回平均值,因为在像
{1, 2, 3, 4}
这样的集合的情况下,中位数实际上是2.5
(两个“中间”的平均值) “值)。我假设 double 作为值类型,但这显然可以是任何类型。这里:由于弹出的时间复杂度为
O(log n)
,并且我们必须弹出O(n / 2)
个值,所以这是O(n log n)
代码>.Here's how you use both heaps. Note that I'm assuming you don't know the number of elements, and this is why we must pop until we pop something from the min heap that is larger than or equal to what we pop from the max heap. Note that we return the average because in the case of a set like
{1, 2, 3, 4}
the median is actually2.5
(the average of the two "middle" values). I'm assumingdouble
as the value type, but this can obviously be anything. Here:Since popping is
O(log n)
and we have to popO(n / 2)
values, this isO(n log n)
.