面试中的动态规划算法
这个问题是面试时问我的,尴尬地暴露了我在动态规划方面的不足。如果有人能帮助我破解这个问题,我将不胜感激。另外,如果你能在设计解决方案时解释你的思维过程,这对我(和其他人)会非常有帮助,因为当我看到一个使用动态编程范式但很难实现的解决方案时,我似乎能够理解和我自己的一样。
话不多说,这就是我被问到的问题。
给定一个整数i
并设置X
的k
点x1
,x2
,。 .. xk
在实线上,从集合X
中选择i
个点,使得到中各点的距离之和最小使用动态规划从 X
到 i
中的一点。
This question was asked to me in an interview and it embarrassingly exposed my shortcomings on dynamic programming. I will appreciate if someone can help me crack this one. Also, it would be very helpful to me (and others) if you can explain your thinking process along the way as you devise the solution as i seem to be able to understand when i see a solution which uses dynamic programming paradigm but struggle to come up with my own.
Without further ado, here is the question i was asked.
Given an integer i
and set X
of k
points x1
, x2
, ... xk
on real line, select i
points from set X
so as to minimize the sum of the distance from every point in X
to a point in i
using Dynamic programming.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
对于大多数 DP 问题,我尝试找到一种归约关系。也就是说,我可以通过每一步来减少问题的大小(就像分而治之,但通常不会划分问题,它只是删除一小部分)。在这个问题中(像许多其他问题一样),我们可以做一个非常简单的观察:第一个点要么在
i
点集中,要么不在。一些符号:假设 X = {x1, x2, ..., xk},并表示约简集 X< sub>n = {xn, xn+1, ..., xk}。
因此观察结果是 x1 要么是
i
点之一,要么不是。我们将i
集查找函数称为 MSD(i
,Xk)(最小距离总和)。我们可以将剖面观察表示如下:MSD(
i
,Xk) = Either MSD(i-1
,Xk-1) U {x1} 或 MSD(i
,Xk-1)我们可以形式化“要么或”部分通过实现一种简单的方法来检查这两个选项中的哪一个实际上是是:我们遍历集合 X 并计算距离之和,并检查哪个实际上更小。此时我们注意到,该检查的运行时间为
ki
,因为我们将天真地运行每个k
点并获取距集合中点的最小距离大小为i
。我们对基本情况进行两个简单的观察:
MSD(
i
,Xi) = XiMSD(
0
,Xn) = {}第一个是在大小为
i< 的集合中查找
i
点时/code> 显然我们只是拿整套。第二个是,当在集合中查找没有点时,我们返回空集。这归纳地确保了 MSD 返回大小为
i
的集合(对于i=0
的情况是正确的,并且根据我们上面对 MSD 的定义归纳为真)。就是这样。这将找到合适的集合。
运行时复杂度的上限为
O(ik * step)
,其中step是我们上面的O(ik)
检查。这是因为 MSD 将在0-i
和 X1 - Xk 范围内的参数上运行,总共>ik
可能的参数。这使得我们的运行时间为 O((ik)2)。
以下部分是基于我对OP问题的理解。我不确定 X 中每个点与 i 大小子集的距离是否是子集中每个点与每个其他点的距离之和,或者是X 中的每个点来自子集本身。
即 X 中 x 的西格玛(x 与子集中每个点的距离之和)或 X 中 x 的西格玛(x 与子集的距离,即从 x 到子集中任何点的最小距离)
我假设后者。
我们可以通过优化上面的
O(ik)
检查来减少运行时间。我们注意到元素实际上是排序的(尽管在当前表示法中是相反的顺序),因为当我们添加它们时,我们总是从右侧进行添加。假设他们一开始就被分类,他们将不再接受 MSD 常规检查。如果它们一开始就没有排序,我们可以对它们进行排序,无论如何,这只会花费O(klogk)
。排序后,检查每个点与集合中某个点的距离将是 k * logi ,因为对于每个点我们都会进行二分搜索。总运行时间为
O(ik * klogi + klogk)
= O(k2 * ilogi)。
最后,我们可以将其表示为 O(k3logk)。不是最快的解决方案,而是解决方案。
我确信还有更多的优化,但这就是我的 2c。
With most DP problems I try and find a kind of reduce-and-conquer relation. That is, a relation whereby I can cut away from the problem size with each step (like divide and conquer, but usually doesn't divide the problem, it just removes a small part). In this problem (like many others) we can make a very simple observation: Either the first point is in the set of
i
points, or it isn't.Some notation: Let's say X = {x1, x2, ..., xk}, and denote the reduced set Xn = {xn, xn+1, ..., xk}.
So the observation is that either x1 is one of the
i
points, or it isn't. Let's call ouri
-set finding function MSD(i
,Xk) (minimum sum of distances). We can express that cut-away observation as follows:MSD(
i
,Xk) = Either MSD(i-1
,Xk-1) U {x1} or MSD(i
,Xk-1)We can formalise the "either or" part by realising a simple way of checking which of those two options it actually is: We run through the set X and calculate the sum of the distances, and check which is actually the smaller. We note at this point, that that check has a running time of
ki
since we will naively run through each of thek
points and grab the minimum distance from points in the set of sizei
.We make two simple observations regarding base cases:
MSD(
i
,Xi) = XiMSD(
0
,Xn) = {}The first is that when looking for
i
points in a set of sizei
we obviously just take the whole set.The second is that when looking for no points in a set, we return the empty set. This inductively ensures that MSD returns sets of size
i
(it's true for the case wherei=0
and by induction is true according to our definition of MSD above).That's it. That will find the appropriate set.
Runtime complexity is upper bounded by
O(ik * step)
where step is ourO(ik)
check from above. This is because MSD will be run on parameters that range from0-i
and X1 - Xk, which is a total ofik
possible arguments.That leaves us with a runtime of O((ik)2).
The following part is based on my understanding of the OP's question. I'm not sure if the distance of every point in X from the i-sized subset is the sum of the distances of every point from every other point in the subset, or the sum of the distances of every point in X from the subset itself.
I.e. sigma of x in X of (sum of distances of x from every point in the subset) OR sigma of x in X of (distance of x from the subset which is the minimum distance from x to any point in the subset)
I assume the latter.
We can reduce the runtime by optimising the
O(ik)
check from above. We notice that the elements are actually sorted (albeit in reverse order in this current notation), since when we add them on we do so always from the right hand side. Assuming they're sorted to begin with, they will be once out of the MSD routine. If they weren't sorted to begin with we can sort them, which will only costO(klogk)
anyway.Once sorted, checking the distance of each point from a point in the set will be
k * logi
since for each point we do a binary search. This yields a total running time ofO(ik * klogi + klogk)
= O(k2 * ilogi).
Finally, we can express that as O(k3logk). Not the fastest solution, but a solution.
I'm sure there are even more optimisations, but that's my 2c.