包络算法优化——放置圆的最佳位置
我必须以最佳方式解决以下问题。
输入数据为:
- 平面中的 N 个点,以 (x, y) 对整数坐标给出
- 。 同一平面中的 M 个点,以表示圆心的 (x, y) 对整数坐标给出。 所有这些圆的边缘都有 (0, 0)。
我需要找到一种方法来隔离具有以下属性的多个圆:对于任何“好”圆,前 N 个点中没有点位于所选圆中或所选圆的边缘。
点和圆的数量约为 100,000 个。 检查每个点的每个圆的明显解决方案的复杂度为 O(N * M),对于 100,000 个圆和 100,000 个点,在具有 64 位 SSE3 单精度代码的 Core 2 Duo 上大约需要 15 秒。 我与之竞争的参考实现在相同的数据下只需要大约 0.1 秒。 我知道参考实现是 O(Nlog N + Mlog M)。
我想通过以下方式优化我的算法。 制作点数据的 2 个副本,并分别根据 x 坐标和 y 坐标对副本进行排序。 然后仅使用位于由 [(xc - r, yc - r); 定义的正方形中的点; (xc + r, yc + r)],其中 (xc, yc) 是“当前”圆的中心,半径为 r。 我可以使用二分搜索找到该区间内的点,因为现在我处理排序数据。 这种方法的复杂度应该是 O(Nlog N + Mlog^2 N),实际上它更快,但仍然比参考慢得多。
我或多或少知道参考实现是如何工作的,但有一些步骤我不明白。 我将尝试解释到目前为止我所知道的内容:
坐标为 (Xc, Yc) 的圆的半径为:
- Rc = sqrt(Xc * Xc + Yc * Yc) (1)
这是因为 (0, 0) 在圆的边缘。
对于位于圆之外的点 P(x, y),必须满足以下不等式:
- sqrt((Xc - x)^2 + (Yc - y)^2) > Rc (2)
现在,如果我们将 (1) 中的 Rc 代入 (2),然后进行一些简单计算后,将不等式平方,我们得到:
- Yc < 1/2y * (x^2 + y^2) - Xc * x/y (3.1) 对于 y > 0Yc
- >0 1/2y * (x^2 + y^2) - Xc * x/y (3.2) 对于 y < 对于从输入数据中选择的任何 (x, y),对于给定圆 C(Xc, Yc), 0
(3.1) 和 (3.2) 必须成立。
为了简单起见,我们做一些符号:
- A(x, y) = 1/2y * (x^2 + y^2) (4.1)
- B(x, y) = -x/y (4.2)
- E(Xc) = 1/2y * (x^2 + y^2) - Xc * x/y = A(x, y) + Xc * B(x, y) (4.3)
我们可以看到,对于给定的圆 C(Xc ,Yc),我们可以将(3)写为:
- Yc < 对于 y > 的所有点,MIN(E(Xc)) (5.1) 0Yc
- >0 对于 y < 的所有点,MAX(E(Xc)) (5.2) 0
我们可以看到 E(Xc) 是关于 Xc 的线性函数,具有 2 个参数 - A(x, y) 和 B(x, y)。 这意味着基本上 E(Xc) 在欧几里得空间中表示具有 2 个参数的一系列线。
现在我不明白的部分来了。 他们说,由于上一段所述的属性,我们可以使用包络算法在 O(1) 摊销时间内而不是 O(N) 时间内计算 MIN() 和 MAX()。 我不知道信封算法如何工作。
关于如何实现信封算法有什么提示吗?
提前致谢!
编辑:
问题不在于数学意义上的信封是什么——我已经知道了。 问题是如何在比 O(n) 更好的时间内确定包络线,显然可以在摊销 O(1) 内完成。
我有计算包络线所需的函数系列,并且有一个包含所有可能参数的数组。 如何以最优方式解决最大化问题?
再次感谢!
I have to solve the following problem in an optimal way.
Input data is:
- N points in a plane given as a (x, y) pair of integer coordinates
- M points in the same plane given as a (x, y) pair of integer coordinates representing the center of a circle. All this circles have (0, 0) on their edge.
I need to find a way of isolating a number of circles that have the property than for any "good" circle no point from the first N points lays in the chosen circle or at the edge of the chosen circle.
The number of points and circles is in the order of 100,000. The obvious solution of checking every circle with every point has the complexity O(N * M) and with 100,000 circles and 100,000 points it takes about 15 seconds on a Core 2 Duo with 64 bit SSE3 single precision code. The reference implementation I compete against takes only about 0.1 seconds with the same data. I know the reference implementation is O(Nlog N + Mlog M).
I have thought of optimizing my algorithm in the following way. Make 2 copies of the point data and sort the copies in respect to x coordinate, respectively the y coordinate. Then use only points that lay in the square defined by [(xc - r, yc - r); (xc + r, yc + r)], where (xc, yc) is the center of the "current" circle, with radius r. I can find points in that interval using binary search because now I work with sorted data. The complexity of this approach should be O(Nlog N + Mlog^2 N), and indeed it is faster but still significantly slower than the reference.
I know more or less how the reference implementation works, but there are some steps that I don't understand. I will try to explain what I know so far:
The radius of a circle with coordinates (Xc, Yc) is:
- Rc = sqrt(Xc * Xc + Yc * Yc) (1)
That's because (0, 0) is on the edge of a circle.
For a point P(x, y) to be outside of a circle, the following inegality must be true:
- sqrt((Xc - x)^2 + (Yc - y)^2) > Rc (2)
Now if we substitute Rc from (1) into (2), then square the inegality after we make some simple calculations we get:
- Yc < 1/2y * (x^2 + y^2) - Xc * x/y (3.1) for y > 0
- Yc > 1/2y * (x^2 + y^2) - Xc * x/y (3.2) for y < 0
(3.1) and (3.2) must be true for a given circle C(Xc, Yc) for any (x, y) chosen from the input data.
For simplicity, let's make a few notations:
- A(x, y) = 1/2y * (x^2 + y^2) (4.1)
- B(x, y) = -x/y (4.2)
- E(Xc) = 1/2y * (x^2 + y^2) - Xc * x/y = A(x, y) + Xc * B(x, y) (4.3)
We can see that for a given circle C(Xc, Yc), we can write (3) as:
- Yc < MIN(E(Xc)) (5.1) for all points with y > 0
- Yc > MAX(E(Xc)) (5.2) for all points with y < 0
We can see that E(Xc) is a linear function in respect to Xc with 2 paramaters -- A(x, y) and B(x, y). That means that basically E(Xc) represents in the Euclidean space a family of lines with 2 parameters.
Now here comes the part I don't understand. They say that because of the property stated in the above paragraph we can calculate MIN() and MAX() in O(1) amortized time instead of O(N) time using an Envelope algorithm. I don't know how the Envelope algorithm might work.
Any hints on how to implement the Envelope algorithm?
Thanks in advance!
Edit:
The question is not about what an envelope in the mathematical sense is -- I already know that. The question is how to determine the envelope in better time then O(n), apparently it could be done in amortized O(1).
I have the family of functions I need to calculate the envelope, and I have an array of all posible parameters. How do I solve the maximization problem in an optimal way?
Thanks again!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
这是有关信封的维基百科条目。 这是关于优化中的包络定理的教程。
Here is the Wikipedia entry on envelopes. Here is a tutorial about the envelope theorem in optimization.
我没有数学背景,但我会分三步解决这个问题:
丢弃 N 中的大多数点。这是棘手的部分。 从原点看时,每对点都会“阴影”其“后面”的区域。 该区域由从点向外出发、指向原点的两条光束以及点之间的圆交点界定。 在极坐标中,计算可能要简单得多。 从随机的一对开始,然后一次查看一个新点:如果它被遮挡,则将其扔掉;如果它被遮挡,则将其丢弃。 如果没有,请查明它是否遮挡了集合中已有的任何点,然后重建包络曲线集。 重建包络曲线部分的测试应该花费几乎恒定的时间,因为阴影集不太可能增长超过某个小数字。 最坏的情况似乎是 O(NlogN)。 我无法想象任何解决方案会比 O(N) 更好,因为在任何情况下你都必须查看每个点。
丢弃M中的大多数点。这很容易:如果M中的点距原点的距离超过到包络集最远点的距离的一半,则可以将其丢弃。 这需要 O(M)
通过实际检查过滤 M 中的剩余点。 这取决于 N 和 M 的分布需要多长时间,但我认为如果两个数字都很大且分布相似,则几乎 O(1)。
总的来说,O(N log(N) + M) 似乎是可能的。 但不能保证;)
I don't have the mathematical background, but I would approach this problem in three steps:
Throw away most points in N. This is the tricky part. Every pair of points "shadows" an area "behind" it when seen from the origin. This area is delimited by two beams starting from the points outward, pointing to the origin, and a circle intersection between the points. The calculation of this might be much simpler in polar coordinates. Start off with a random pair, then look at one new point at a time: if it is shadowed, throw it away; if not, find out if it shadows any point already in your set, then rebuild the enveloping set of curves. The tests for rebuilding the enveloping curve part should take almost constant time because the shadow set is unlikely to grow beyond a certain small number. The worst case for this seems to be O(NlogN). I cannot imagine any solution can be better than O(N) because you have to look at each point in any case.
Throw away most points in M. This is rather easy: if the point from M is farther from the origin than half the distance to the farthest point from the enveloping set, then it can be thrown out. This takes O(M)
Filter the remaining points in M through the actual check. It depends on the distribution of N and M how long this takes, but I think it is almost O(1) if both numbers are big and the distributions similar.
Overall, it seems to be possible in O(N log(N) + M). No guarantees, though ;)
我认为你可以用 Voronoi 图来做到这一点:
总体复杂度应该是O (N log N+M log N)
I think you can do it with Voronoi diagram:
Overall complexity should be O(N log N+M log N)
考虑计算的其他一些方面。
例如,您显然比较了很多距离。 每个都调用 SQRT。 为什么不比较“距离的平方”呢? SQRT 是一种代价高昂的计算。
Consider some other aspects of your computations.
For instance, you apparently compare a lot of distances. Each takes a call to SQRT. Why not compare the "squares of the distances" instead. SQRT is a costly computation.
切勿取开方。 将所有距离保留为它们的平方。 这样做,您也可以比较它们,但时间要少 2-3 倍。
Never take sqrt. Leave all distances as their squares. Doing so, you can compare them as well, but the time is 2-3-fold less.