Oracle中点数据表无索引的最近邻查询的pl/sql代码
我正在尝试构建一个程序来获取具有选定 ID 的点的 k 个最近邻点。 我需要在不使用任何空间定位器功能(例如 sdo_geometry 或 nn)的情况下执行此操作。
基本上我在oracle中有一个表,其中包含ID、Data_X、Data_Y。假设我的表中有 10 个条目,并且我需要距离虚构点 target_x、target_y 最近的 3 个点。
我们需要计算表中每个点与我给定的虚构点的欧几里德距离。我只是不知道 pl/sql 中的算法可以返回最近的邻居 id。
i am trying to build a procedure to obtain k nearest neighbor points to a point with a selected ID.
I need to do this without using any spatial locator features like sdo_geometry or nn.
basically i have a table in oracle with ID, Data_X, Data_Y. let's say i have 10 entries in my table, and i need the 3 nearest points to a fictional point target_x, target_y.
we would need to calculate the euclidean distance of each point in the table with my given fictional point. I just dont know an algorithm in pl/sql that would return me the nearest neighbor ids.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
南的回答是一个很好的起点,如果它能完成工作,我会使用它。不幸的是,它很可能需要全表扫描(或者可能需要全索引扫描,如果您有覆盖索引)。
如果性能成为问题,您可能可以考虑对数据创建穷人的空间索引。它不会像“创建索引”那么简单,但它可能会起作用。
正确的方法是创建自定义索引,但这只是重新发明 sdo_geometry 轮子,这是您说希望避免的路径。
一个简单但粗略的方法(免责声明:这只是我的一个想法,未经测试)可能是创建一个基于函数的索引,将 2D 空间中的所有点分组为正方形块。您基本上创建一个索引来将每个 (x,y) 对映射到块列表上。每个块都有定义的宽度和高度,要进行搜索,您首先要计算出需要搜索哪个块网格,然后仅查询该网格中的点列表。
索引示例如下:
您用什么值替换 100 将取决于您的点所取值的范围。你会想要将平面划分为大量的网格块,以便索引具有合理的选择性;但又不会太大,以至于典型的查询必须搜索太多块才能找到候选者。
您可以通过使用这样的查询来使用上面的索引:
然后您可以设置 :threshold 来基本上从查询中消除大量点块。我认为如果函数索引的值(即 100)和阈值设置正确,您将看到查询使用基于函数的索引来获取一小组候选者,而不是计算中每个点的距离桌子。
缺点是如果 :threshold 太低,查询可能不返回任何行。另一方面,这可能是一个有用的功能,具体取决于您的需求。
nang's answer is a great starting point, and if it does the job, I'd use it. Unfortunately it will most probably require a full table scan (or perhaps a full index scan, if you have a covering index).
If performance becomes a problem, you could probably have a look at making a poor-man's spatial index over the data. It won't be as simple as "create index" but it just might work.
The proper method would be to create a custom index, but that would just be reinventing the sdo_geometry wheel, a path you said you wish to avoid.
A simple-but-rough method (disclaimer: this is just an idea off the top of my head, not tested) might be to create a function-based index that groups all the points in 2D space into square blocks. You basically create an index to map each (x,y) pair onto a list of blocks. Each block would have a defined width and height, and to do a search you would first work out which grid of blocks need to be searched, then query across just the list of points in that grid.
An example index would be something like:
What value you substitute for 100 will depend on the range of values your points take. You will want to divide the plane into a large number of grid blocks, so that the index is reasonably selective; but not so large that a typical query will have to search too many blocks to find candidates.
You could use the index above by using a query like this:
You can then set :threshold to basically eliminate a large set of blocks of points from the query. I reckon if the values for the functional index (i.e. 100) and the threshold are set correctly, you'll see the query use the function-based index to get a small set of candidates, instead of calculating the distance for every single point in the table.
The downside is that if :threshold is too low, the query might return no rows. On the other hand, that might be a useful feature, depending on your needs.
计算每个点与所选点之间的距离(毕达哥拉斯),并按距离排序。
伪sql:
前3行是最近的3个点。
Calculate the distance (Pythagoras) between each point and the selected point and order by the distance.
Pseudo sql:
The first 3 rows are the the nearest 3 points.