Oracle中点数据表无索引的最近邻查询的pl/sql代码

发布于 2024-10-29 04:08:47 字数 258 浏览 1 评论 0原文

我正在尝试构建一个程序来获取具有选定 ID 的点的 k 个最近邻点。 我需要在不使用任何空间定位器功能(例如 sdo_geometry 或 nn)的情况下执行此操作。

基本上我在oracle中有一个表,其中包含ID、Data_X、Data_Y。假设我的表中有 10 个条目,并且我需要距离虚构点 target_x、target_y 最近的 3 个点。

我们需要计算表中每个点与我给定的虚构点的欧几里德距离。我只是不知道 pl/sql 中的算法可以返回最近的邻居 id。

i am trying to build a procedure to obtain k nearest neighbor points to a point with a selected ID.
I need to do this without using any spatial locator features like sdo_geometry or nn.

basically i have a table in oracle with ID, Data_X, Data_Y. let's say i have 10 entries in my table, and i need the 3 nearest points to a fictional point target_x, target_y.

we would need to calculate the euclidean distance of each point in the table with my given fictional point. I just dont know an algorithm in pl/sql that would return me the nearest neighbor ids.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

原来是傀儡 2024-11-05 04:08:59

南的回答是一个很好的起点,如果它能完成工作,我会使用它。不幸的是,它很可能需要全表扫描(或者可能需要全索引扫描,如果您有覆盖索引)。

如果性能成为问题,您可能可以考虑对数据创建穷人的空间索引。它不会像“创建索引”那么简单,但它可能会起作用。

正确的方法是创建自定义索引,但这只是重新发明 sdo_geometry 轮子,这是您说希望避免的路径。

一个简单但粗略的方法(免责声明:这只是我的一个想法,未经测试)可能是创建一个基于函数的索引,将 2D 空间中的所有点分组为正方形块。您基本上创建一个索引来将每个 (x,y) 对映射到块列表上。每个块都有定义的宽度和高度,要进行搜索,您首先要计算出需要搜索哪个块网格,然后仅查询该网格中的点列表。

索引示例如下:

CREATE INDEX grid_block_i ON points (TRUNC(Data_X/100), TRUNC(Data_Y/100), id);

您用什么值替换 100 将取决于您的点所取值的范围。你会想要将平面划分为大量的网格块,以便索引具有合理的选择性;但又不会太大,以至于典型的查询必须搜索太多块才能找到候选者。

您可以通过使用这样的查询来使用上面的索引:

select id
from (select id, Data_X, Data_Y
      from points
      where TRUNC(Data_X/100) BETWEEN TRUNC(:target_x/100)) - :threshold
                                  AND TRUNC(:target_x/100)) + :threshold
      and   TRUNC(Data_Y/100) BETWEEN TRUNC(:target_y/100)) - :threshold
                                  AND TRUNC(:target_y/100)) + :threshold
     )
order by sqrt(sqr(Data_x - :target_x) + sqr(Data_y - :target_y))

然后您可以设置 :threshold 来基本上从查询中消除大量点块。我认为如果函数索引的值(即 100)和阈值设置正确,您将看到查询使用基于函数的索引来获取一小组候选者,而不是计算中每个点的距离桌子。

缺点是如果 :threshold 太低,查询可能不返回任何行。另一方面,这可能是一个有用的功能,具体取决于您的需求。

nang's answer is a great starting point, and if it does the job, I'd use it. Unfortunately it will most probably require a full table scan (or perhaps a full index scan, if you have a covering index).

If performance becomes a problem, you could probably have a look at making a poor-man's spatial index over the data. It won't be as simple as "create index" but it just might work.

The proper method would be to create a custom index, but that would just be reinventing the sdo_geometry wheel, a path you said you wish to avoid.

A simple-but-rough method (disclaimer: this is just an idea off the top of my head, not tested) might be to create a function-based index that groups all the points in 2D space into square blocks. You basically create an index to map each (x,y) pair onto a list of blocks. Each block would have a defined width and height, and to do a search you would first work out which grid of blocks need to be searched, then query across just the list of points in that grid.

An example index would be something like:

CREATE INDEX grid_block_i ON points (TRUNC(Data_X/100), TRUNC(Data_Y/100), id);

What value you substitute for 100 will depend on the range of values your points take. You will want to divide the plane into a large number of grid blocks, so that the index is reasonably selective; but not so large that a typical query will have to search too many blocks to find candidates.

You could use the index above by using a query like this:

select id
from (select id, Data_X, Data_Y
      from points
      where TRUNC(Data_X/100) BETWEEN TRUNC(:target_x/100)) - :threshold
                                  AND TRUNC(:target_x/100)) + :threshold
      and   TRUNC(Data_Y/100) BETWEEN TRUNC(:target_y/100)) - :threshold
                                  AND TRUNC(:target_y/100)) + :threshold
     )
order by sqrt(sqr(Data_x - :target_x) + sqr(Data_y - :target_y))

You can then set :threshold to basically eliminate a large set of blocks of points from the query. I reckon if the values for the functional index (i.e. 100) and the threshold are set correctly, you'll see the query use the function-based index to get a small set of candidates, instead of calculating the distance for every single point in the table.

The downside is that if :threshold is too low, the query might return no rows. On the other hand, that might be a useful feature, depending on your needs.

無處可尋 2024-11-05 04:08:57

计算每个点与所选点之间的距离(毕达哥拉斯),并按距离排序。
伪sql:

select id from points
order by sqrt(sqr(Data_x - target_x) + sqr(Data_y - target_y)) 

前3行是最近的3个点。

Calculate the distance (Pythagoras) between each point and the selected point and order by the distance.
Pseudo sql:

select id from points
order by sqrt(sqr(Data_x - target_x) + sqr(Data_y - target_y)) 

The first 3 rows are the the nearest 3 points.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文