Finding the closest point to each point (nearest neighbor)
I am writing a method that takes an array of points as input and finds, for each point in the array, the closest point to it other than itself. I am currently doing this by brute force (checking every point against every other point). My current implementation doesn't sort the array, but I can sort it by p.x values with the CompareByX method. I am checking the running time of the algorithm, and it gets very time-consuming for large values of n. I am not very knowledgeable on this subject and know very little about different types of data structures, so any simple help would be great!
My current code is:
import java.util.*;
import java.lang.*;
import java.io.*;

class My2dPoint {
    double x;
    double y;

    public My2dPoint(double x1, double y1) {
        x=x1;
        y=y1;
    }
}

class CompareByX implements Comparator<My2dPoint> {
    public int compare(My2dPoint p1, My2dPoint p2) {
        if (p1.x < p2.x) return -1;
        if (p1.x == p2.x) return 0;
        return 1;
    }
}

/* An object of the above comparator class is used by java.util.Arrays.sort() in main to sort an array of points by x-coordinates */

class Auxiliaries {
    public static double distSquared(My2dPoint p1, My2dPoint p2) {
        double result;
        result = (p1.x-p2.x)*(p1.x-p2.x) + (p1.y-p2.y)*(p1.y-p2.y);
        return result;
    }
}

public class HW3 {
    public static void main (String argv []) throws IOException {
        int range = 1000000; // Range of x and y coordinates in points
        System.out.println("Enter the number of points");
        InputStreamReader reader1 = new InputStreamReader(System.in);
        BufferedReader buffer1 = new BufferedReader(reader1);
        String npoints = buffer1.readLine();
        int numpoints = Integer.parseInt(npoints);
        // numpoints is now the number of points we wish to generate
        My2dPoint inputpoints [] = new My2dPoint [numpoints];
        // array to hold points
        int closest [] = new int [numpoints];
        // array to record soln; closest[i] is index of point closest to i'th
        int px, py;
        double dx, dy, dist;
        int i,j;
        double currbest;
        int closestPointIndex;
        long tStart, tEnd;
        for (i = 0; i < numpoints; i++) {
            px = (int) ( range * Math.random());
            dx = (double) px;
            py = (int) (range * Math.random());
            dy = (double) py;
            inputpoints[i] = new My2dPoint(dx, dy);
        }
        // array inputpoints has now been filled
        tStart = System.currentTimeMillis();
        // find closest [0]
        closest[0] = 1;
        currbest = Auxiliaries.distSquared(inputpoints[0],inputpoints[1]);
        for (j = 2; j < numpoints; j++) {
            dist = Auxiliaries.distSquared(inputpoints[0],inputpoints[j]);
            if (dist < currbest) {
                closest[0] = j;
                currbest = dist;
            }
        }
        // now find closest[i] for every other i
        for (i = 1; i < numpoints; i++) {
            closest[i] = 0;
            currbest = Auxiliaries.distSquared(inputpoints[i],inputpoints[0]);
            for (j = 1; j < i; j++) {
                dist = Auxiliaries.distSquared(inputpoints[i],inputpoints[j]);
                if (dist < currbest) {
                    closest[i] = j;
                    currbest = dist;
                }
            }
            for (j = i+1; j < numpoints; j++) {
                dist = Auxiliaries.distSquared(inputpoints[i],inputpoints[j]);
                if (dist < currbest) {
                    closest[i] = j;
                    currbest = dist;
                }
            }
        }
        tEnd = System.currentTimeMillis();
        System.out.println("Time taken in Milliseconds: " + (tEnd - tStart));
    }
}
Comments (6)
Brute force for nearest neighbour search is only feasible for a small number of points.
You might want to look into kd-Trees or spatial data structures generally.
Here is a demo for kd-Tree.
This is what Wikipedia says.
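As a rough illustration of the kd-tree suggestion, here is a minimal 2-d tree sketch in Java. It assumes the points are given as {x, y} pairs in a double[][]; the class name KdTree2D and its methods are hypothetical names chosen for this example, not part of the question's code or of any library. It re-sorts at every level of the build for simplicity, so it is not the fastest possible construction.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper: a balanced 2-d tree stored implicitly in a sorted index array.
class KdTree2D {
    private final double[][] pts; // pts[i] = {x, y}
    private final int[] idx;      // point indices; the median of each range is that subtree's root
    private int best;             // index of the best candidate during a query
    private double bestD2;        // squared distance to that candidate

    KdTree2D(double[][] pts) {
        this.pts = pts;
        Integer[] order = new Integer[pts.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        build(order, 0, order.length, 0);
        idx = new int[order.length];
        for (int i = 0; i < order.length; i++) idx[i] = order[i];
    }

    static double distSquared(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }

    // Sort [lo, hi) on the current axis so the median lands at mid, then recurse on both halves.
    private void build(Integer[] order, int lo, int hi, int axis) {
        if (hi - lo <= 1) return;
        Arrays.sort(order, lo, hi, Comparator.comparingDouble((Integer i) -> pts[i][axis]));
        int mid = (lo + hi) >>> 1;
        build(order, lo, mid, 1 - axis);
        build(order, mid + 1, hi, 1 - axis);
    }

    // Index of the point nearest to point q, excluding q itself; -1 if there is no other point.
    int nearestOther(int q) {
        best = -1;
        bestD2 = Double.POSITIVE_INFINITY;
        search(0, idx.length, 0, q);
        return best;
    }

    private void search(int lo, int hi, int axis, int q) {
        if (lo >= hi) return;
        int mid = (lo + hi) >>> 1;
        int p = idx[mid];
        if (p != q) {
            double d2 = distSquared(pts[p], pts[q]);
            if (d2 < bestD2) { bestD2 = d2; best = p; }
        }
        double diff = pts[q][axis] - pts[p][axis];
        // Search the side containing q first; visit the other side only if the
        // splitting line is closer than the best distance found so far.
        if (diff < 0) {
            search(lo, mid, 1 - axis, q);
            if (diff * diff < bestD2) search(mid + 1, hi, 1 - axis, q);
        } else {
            search(mid + 1, hi, 1 - axis, q);
            if (diff * diff < bestD2) search(lo, mid, 1 - axis, q);
        }
    }
}
```

To fill an array like the question's closest[], one would build the tree once and call nearestOther(i) for each i. The query state is kept in fields, so this sketch is not safe to query from several threads at once.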
I would definitely sort by x first. Then I would use the x distance between points as a quick reject test: once you have the distance d to one neighbor, any closer neighbor must lie within d of the point in x. This avoids all the distSquared computations for points outside that x range. Every time you find a closer neighbor, you also tighten the range of x that you need to search.
Also, if P2 is the closest neighbor to P1, then I would use P1 as the initial guess for the closest neighbor to P2.
EDIT: On second thought, I'd sort by whichever dimension has the largest range.
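The quick-reject idea can be sketched roughly as follows. This is only an illustration under some assumptions: coordinates are passed as two parallel arrays, the sort is always by x (not by the dimension with the largest range), and the initial-guess trick is left out. SortedSweep and closestPoints are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;

class SortedSweep {
    // Returns closest[i] = index of the point nearest to point i (other than i), or -1 if there is none.
    static int[] closestPoints(double[] xs, double[] ys) {
        int n = xs.length;
        // Sort indices by x so the original indices are preserved in the result.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> xs[i]));

        int[] closest = new int[n];
        Arrays.fill(closest, -1);
        for (int r = 0; r < n; r++) {
            int i = order[r];
            double best = Double.POSITIVE_INFINITY; // best squared distance so far
            // Walk left in x order.
            for (int s = r - 1; s >= 0; s--) {
                int j = order[s];
                double dx = xs[i] - xs[j];
                if (dx * dx >= best) break; // quick reject: everything further left is at least this far in x
                double dy = ys[i] - ys[j];
                double d2 = dx * dx + dy * dy;
                if (d2 < best) { best = d2; closest[i] = j; }
            }
            // Walk right in x order, reusing the bound found on the left.
            for (int s = r + 1; s < n; s++) {
                int j = order[s];
                double dx = xs[j] - xs[i];
                if (dx * dx >= best) break;
                double dy = ys[i] - ys[j];
                double d2 = dx * dx + dy * dy;
                if (d2 < best) { best = d2; closest[i] = j; }
            }
        }
        return closest;
    }
}
```

The worst case is still quadratic (for example, when all points share the same x), but for points spread out in x each point usually only needs to examine a small window of the sorted array.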
There are some fairly standard ways of improving this kind of search, and how complicated you want to get depends on how many points you are searching.
A fairly common easy one is to sort the points by X or Y. For each point you then look for near points, going both forwards and backwards in the array. Remember how far away the nearest one you have found is, and when the difference in X (or Y) is greater than that you know there can't be any nearer point left to find.
You can also partition your space using a tree. Wikipedia has a page that gives some possible algorithms. Sometimes the cost to set them up is larger than what you save. That's the sort of thing you have to decide based on how many points you are searching.
Either use a kd-tree, or use a good library for nearest neighbor search. Weka includes one.
Another possibility, simpler than creating a kd-tree, is using a neighborhood matrix.
First place all your points into a 2D square matrix. Then you can run a full or partial spatial sort, so that the points become ordered inside the matrix.
Points with small Y move toward the top rows of the matrix, and likewise, points with large Y go to the bottom rows. The same happens along X: points with small X coordinates move to the columns on the left, and, symmetrically, points with large X values go to the columns on the right.
Once the spatial sort is done (there are many ways to achieve this, with both serial and parallel algorithms), you can look up the nearest points of a given point P by visiting only the cells adjacent to the one where P is stored in the neighborhood matrix.
You can read more details for this idea in the following paper (you will find PDF copies of it online): Supermassive Crowd Simulation on GPU based on Emergent Behavior.
The sorting step gives you interesting choices. You can use just the even-odd transposition sort described in the paper, which is very simple to implement (maybe even in CUDA). If you run just one pass of this, it will give you a partial sort, which can already be useful if your matrix is near-sorted. That is, if your points move slowly, it will save you a lot of computation.
If you need a full sort, you can run such even-odd transposition passes several times (as described in the following Wikipedia page):
http://en.wikipedia.org/wiki/Odd%E2%80%93even_sort
If the changes are small, one or two even-odd passes will suffice to get the array sorted again.
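For reference, an odd-even transposition pass on a plain one-dimensional array might look like the sketch below. It is only the generic sort from the Wikipedia page, not the paper's matrix version; presumably the neighborhood matrix would apply the same compare-and-swap along rows (by X) and along columns (by Y), but that part is an assumption and is not shown. OddEvenSort, pass and sort are hypothetical names.

```java
class OddEvenSort {
    // One pass = an even phase (pairs 0-1, 2-3, ...) followed by an odd phase (pairs 1-2, 3-4, ...).
    // Returns true if any pair was swapped.
    static boolean pass(double[] a) {
        boolean swapped = false;
        for (int start = 0; start <= 1; start++) {
            for (int i = start; i + 1 < a.length; i += 2) {
                if (a[i] > a[i + 1]) {
                    double t = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = t;
                    swapped = true;
                }
            }
        }
        return swapped;
    }

    // Full sort: repeat passes until a pass makes no swap.
    static void sort(double[] a) {
        while (pass(a)) {
            // keep going until the array is sorted
        }
    }
}
```

Within each phase the compared pairs do not overlap, which is what makes the phases easy to run in parallel.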
If your points are relatively close together, you can sort by distance from some point (I think it can be any point, but it may have to be a point for which all the points are in the same quadrant if that point is treated as the origin).
Let's say the point of interest is point A, at distance D from the reference point.
Among the points within some relatively small number n of index positions of A in the sorted list, pick the one closest to A (a larger n probably gives a better initial guess, but takes longer). If that point is at straight-line distance g from A, you know that the closest point can be at most g from A. This way you only have to consider points in the list whose distance from the reference point is between D-g and D+g.
Drawing out a chart might help to understand it. If anybody cares I'll add a diagram.
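The D-g to D+g window follows from the triangle inequality. A short derivation, writing O for the reference point used for sorting and P for any candidate point (both symbols are introduced here for illustration):

```latex
\[
\bigl|\, \lVert P - O \rVert - \lVert A - O \rVert \,\bigr| \;\le\; \lVert P - A \rVert
\]
\[
\lVert P - A \rVert \le g
\;\Longrightarrow\;
D - g \;\le\; \lVert P - O \rVert \;\le\; D + g,
\qquad D = \lVert A - O \rVert .
\]
```

The first line holds for any choice of O, so the bound itself does not depend on where the reference point sits, although a poorly placed reference point may leave many points inside the window.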