两个大小为 n 的数据库中的第 n 个最小数,每个数据库都使用分治法
我们有两个大小为 n 的数据库,其中包含不重复的数字。所以,我们总共有 2n 个元素。可以通过一次查询一个数据库来访问它们。该查询是这样的,您给它 ak ,它会返回该数据库中第 k 个最小的条目。我们需要在 O(log n) 次查询中找到所有 2n 个元素中的第 n 个最小条目。
这个想法是使用分而治之的方法,但我需要一些帮助来思考这个问题。谢谢!
We have two databases of size n containing numbers without repeats. So, in total we have 2n elements. They can be accessed through a query to one database at a time. The query is such that you give it a k and it returns kth smallest entry in that database. We need to find nth smallest entry among all the 2n elements in O(log n) queries.
The idea is to use divide and conquer but I need some help thinking through this. Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
前不久在资格考试中看到这道题,闻起来有家庭作业的味道。因此,我只会提出两个建议:
研究二分搜索,并仔细注意循环不变量的精确性质。 Jon Bentley 的书 Programming Pearls 有很好的解释
尝试概括二分搜索的不变量。
画出各种特殊情况的图:
这是一个相当难的问题;如果你直接去证明正确性,你会更容易。
I saw this problem on a qualifying exam not long ago, and it smells like a homework problem. I will therefore make only two suggestions:
Study binary search, with careful attention to the precise nature of the loop invariants. Jon Bentley's book Programming Pearls has a good explanation
Try generalizing the invariants for binary search.
Draw a picture with various special cases:
This is a fairly hard problem; you'll have an easier time if you go straight for a proof of correctness.
这就是我的想法。由于这是一个教育问题,如果其中任何一点让您认为“啊哈,我没有想到这一点,我可以自己沿着这些思路进一步思考”,我建议您停止阅读。
1) 与 sdcwc 相同的观察:数据库可以被认为是一个排序数组,对于它是从 0 还是 1 开始可能有一点争议。我要求元素 0,我得到最小的。我要 12 个,我得到第 13 个最小的。等等。我发现这更容易想象。
2)我们知道我们正在寻找一个 O(log n) 算法。这意味着,粗略地说,我们正在寻找以下两件事之一:
要么我们从计算最小元素开始,然后计算第二小的元素、第四最小的元素、第八小的元素等,直到达到我们想要的大小。想要,每一步都在恒定的时间内。这对我来说听起来不太有希望。
或者我们从一个大小为 n 的问题开始,然后执行一些恒定时间操作,这使我们能够通过解决大小为 n/2 的问题来解决原始问题。显然我们可以在常数时间内解决n=1的问题,从而完成算法。这听起来似乎更合理一些。
实际上不一定每次都是n/2。它可以是 n/3,或者 999*n/1000,结果仍然是 O(log n)。但先寻找 n/2 也没有什么坏处。
3)我们如何减少这样的问题?好吧,如果我们可以从一个数组或另一个数组的开头处折扣/删除 m 个元素,因为它们小于第 k 个最小的元素,那么我们可以找到所得数组对中第 (km) 个最小的元素,它将是原始数组中第 k 个最小的数组。
4) 最后,突破性的观察是,如果数组 A 的第 m 个最小元素小于数组 B 的第 m 个最小元素,则 A 的第 m 个元素不可能是两个数组组合中的第 (2m) 个最小元素。它比这个小(或者具有相同的值:我不确定“无重复”是否意味着“每个数据库中没有重复”,或者“合并的数据库之间没有重复”),因为最多有 2*(m- 1) 两个数组中的元素相加后严格小于它。
除非我犯了错误,否则剩下的就是编码了。当 k 为奇数时,使用一个小的额外参数来解释 off-by-1,该解决方案实际上是 O(log k),即 O(log n),因为 k <= 2*n。
Here's how I thought this through. Since it's an educational problem, I suggest you stop reading if any of the points makes you think, "aha, I hadn't thought of that, I can think further along those lines by myself".
1) Same observation as sdcwc: with a possible slight quibble over whether it starts at 0 or 1, the database can be thought of as a sorted array. I ask for element 0, I get the smallest. I ask for 12, I get the 13th smallest. And so on. I find this easier to visualise.
2) We know we're looking for an O(log n) algorithm. That means that roughly speaking we're looking for one of two things:
Either we start out by calculating the smallest element, then calculate the 2nd smallest, 4th smallest, 8th, etc, until we're up to the size we want, each step in constant time. This doesn't sound promising to me.
Or we start out with a problem of size n then we perform some constant-time operation which allows us to solve the original problem by solving a problem of size n/2. Obviously we can solve the problem for n=1 in constant time, to complete the algorithm. This sounds a bit more plausible.
Actually it doesn't necessarily have to be n/2 each time. It could be n/3, or 999*n/1000, and the result will still be O(log n). But there's no harm in looking for n/2 first.
3) How are we going to reduce the problem like that? Well, if we can discount/remove m elements from the start of one array or the other as being smaller than the kth smallest, then we can find the (k-m)th smallest element of the resulting pair of arrays, and it will be the kth smallest of the original arrays.
4) Finally, the breakthrough observation is that if the mth smallest element of array A is smaller than the mth smallest element of array B, then that mth element of A cannot possibly be the (2m)th smallest element of the two arrays combined. It's smaller than that (or of equal value: I'm not sure whether "no repeats" means "no repeats in each database", or "no repeats between the databases combined"), since there are at most 2*(m-1) elements strictly smaller than it in both arrays combined.
Unless I've made an error, the rest is coding. With a small extra argument to account for the off-by-1 when k is odd, this solution is actually O(log k), which is O(log n) since k <= 2*n.
提示:
Tips: