如何扩展这个 SQL 查询来查找 k 个最近邻居？

发布于 2024-08-26 11:14:21 字数 985 浏览 6 评论 0原文

我有一个充满二维数据的数据库 - 地图上的点。每条记录都有一个几何类型的字段。我需要做的是将一个点传递给一个存储过程，该存储过程返回k个最近的点（k也将传递给存储过程，但这很容易）。我在 http://blogs 中找到了一个查询。 msdn.com/isaac/archive/2008/10/23/nearest-neighbors.aspx 获取单个最近邻，但我不知道如何扩展它来查找k 最近的邻居。

这是当前查询 - T 是表格，g 是几何字段，@x 是要搜索的点，Numbers 是一个包含整数 1 到 n 的表：

DECLARE @start FLOAT = 1000; 
WITH NearestPoints AS
(
     SELECT TOP(1) WITH TIES *,  T.g.STDistance(@x) AS dist
     FROM Numbers JOIN T WITH(INDEX(spatial_index)) 
     ON T.g.STDistance(@x) < @start*POWER(2,Numbers.n)
     ORDER BY n
)
SELECT TOP(1) * FROM NearestPoints
ORDER BY n, dist

内部查询选择最近的非空区域，然后外部查询选择该区域的顶部结果；外部查询可以轻松更改为（例如）SELECT TOP(20)，但如果最近的区域仅包含一个结果，则您将陷入困境。

我想我可能需要递归搜索包含 k 记录的第一个区域，但不使用表变量（这会导致维护问题，因为您必须创建表结构并且它很容易发生变化 - 那里有很多字段），我不明白如何。

原文

I have a database full of two-dimensional data - points on a map. Each record has a field of the geometry type. What I need to be able to do is pass a point to a stored procedure which returns the k nearest points (k would also be passed to the sproc, but that's easy). I've found a query at http://blogs.msdn.com/isaac/archive/2008/10/23/nearest-neighbors.aspx which gets the single nearest neighbour, but I can't figure how to extend it to find the k nearest neighbours.

This is the current query - T is the table, g is the geometry field, @x is the point to search around, Numbers is a table with integers 1 to n:

DECLARE @start FLOAT = 1000; 
WITH NearestPoints AS
(
     SELECT TOP(1) WITH TIES *,  T.g.STDistance(@x) AS dist
     FROM Numbers JOIN T WITH(INDEX(spatial_index)) 
     ON T.g.STDistance(@x) < @start*POWER(2,Numbers.n)
     ORDER BY n
)
SELECT TOP(1) * FROM NearestPoints
ORDER BY n, dist

The inner query selects the nearest non-empty region and the outer query then selects the top result from that region; the outer query can easily be changed to (e.g.) SELECT TOP(20), but if the nearest region only contains one result, you're stuck with that.

I figure I probably need to recursively search for the first region containing k records, but without using a table variable (which would cause maintenance problems as you have to create the table structure and it's liable to change - there're lots of fields), I can't see how.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

爱，才寂寞 2024-09-02 11:14:21

如果从内部查询中删除 TOP (1) WITH TIES，并将外部查询设置为返回前 k 行，会发生什么情况？

我也有兴趣知道这项修正案是否有帮助。它应该比使用 TOP 更有效：

DECLARE @start FLOAT = 1000
        ,@k INT = 20
        ,@p FLOAT = 2;

WITH NearestPoints AS
(
     SELECT *
            ,T.g.STDistance(@x) AS dist
            ,ROW_NUMBER() OVER (ORDER BY T.g.STDistance(@x)) AS rn
     FROM Numbers 
     JOIN T WITH(INDEX(spatial_index)) 
     ON   T.g.STDistance(@x) <  @start*POWER(@p,Numbers.n)
     AND (Numbers.n - 1 = 0 
          OR T.g.STDistance(@x) >= @start*POWER(@p,Numbers.n - 1)
         )
)
SELECT * 
FROM NearestPoints
WHERE rn <= @k;

注意 - 未经测试 - 我无法在这里访问 SQL 2008。

What happens if you remove TOP (1) WITH TIES from the inner query, and set the outer query to return the top k rows?

I'd also be interested to know whether this amendment helps at all. It ought to be more efficient than using TOP:

DECLARE @start FLOAT = 1000
        ,@k INT = 20
        ,@p FLOAT = 2;

WITH NearestPoints AS
(
     SELECT *
            ,T.g.STDistance(@x) AS dist
            ,ROW_NUMBER() OVER (ORDER BY T.g.STDistance(@x)) AS rn
     FROM Numbers 
     JOIN T WITH(INDEX(spatial_index)) 
     ON   T.g.STDistance(@x) <  @start*POWER(@p,Numbers.n)
     AND (Numbers.n - 1 = 0 
          OR T.g.STDistance(@x) >= @start*POWER(@p,Numbers.n - 1)
         )
)
SELECT * 
FROM NearestPoints
WHERE rn <= @k;

NB - untested - I don't have access to SQL 2008 here.

回复收藏 0 原文

我很坚强 2024-09-02 11:14:21

引自 Microsoft® SQL Server® 2008 内部：T-SQL 编程。第 14.8.4 节。

以下查询将返回 10
距离@input最近的兴趣点：

DECLARE @input GEOGRAPHY = 'POINT (-147 61)';
DECLARE @start FLOAT = 1000;
WITH NearestNeighbor AS(
  SELECT TOP 10 WITH TIES
    *, b.GEOG.STDistance(@input) AS dist
  FROM Nums n JOIN GeoNames b WITH(INDEX(geog_hhhh_16_sidx)) -- index hint
  ON b.GEOG.STDistance(@input) < @start*POWER(CAST(2 AS FLOAT),n.n)
  AND b.GEOG.STDistance(@input) >=
    CASE WHEN n = 1 THEN 0 ELSE @start*POWER(CAST(2 AS FLOAT),n.n-1) END
  WHERE n <= 20
  ORDER BY n
)
  SELECT TOP 10 geonameid, name, feature_code, admin1_code, dist
  FROM NearestNeighbor
  ORDER BY n, dist;

注意：仅此查询的 WHERE 的一部分
子句由空间支持
指数。然而，查询优化器
正确评估受支撑的部分
（“<”比较）使用索引。
这限制了行数
其中“>=”部分必须进行测试，
并且查询执行良好。改变
@start 的值有时可以
如果查询速度较慢，则加快查询速度
超出预期。

清单 2-1。创建和填充辅助数字表

SET NOCOUNT ON;
USE InsideTSQL2008;

IF OBJECT_ID('dbo.Nums', 'U') IS NOT NULL DROP TABLE dbo.Nums;

CREATE TABLE dbo.Nums(n INT NOT NULL PRIMARY KEY);
DECLARE @max AS INT, @rc AS INT;
SET @max = 1000000;
SET @rc = 1;

INSERT INTO Nums VALUES(1);
WHILE @rc * 2 <= @max
BEGIN
  INSERT INTO dbo.Nums SELECT n + @rc FROM dbo.Nums;
  SET @rc = @rc * 2;
END

INSERT INTO dbo.Nums
  SELECT n + @rc FROM dbo.Nums WHERE n + @rc <= @max;

Quoted from Inside Microsoft® SQL Server® 2008: T-SQL Programming. Section 14.8.4.

The following query will return the 10
points of interest nearest to @input:

DECLARE @input GEOGRAPHY = 'POINT (-147 61)';
DECLARE @start FLOAT = 1000;
WITH NearestNeighbor AS(
  SELECT TOP 10 WITH TIES
    *, b.GEOG.STDistance(@input) AS dist
  FROM Nums n JOIN GeoNames b WITH(INDEX(geog_hhhh_16_sidx)) -- index hint
  ON b.GEOG.STDistance(@input) < @start*POWER(CAST(2 AS FLOAT),n.n)
  AND b.GEOG.STDistance(@input) >=
    CASE WHEN n = 1 THEN 0 ELSE @start*POWER(CAST(2 AS FLOAT),n.n-1) END
  WHERE n <= 20
  ORDER BY n
)
  SELECT TOP 10 geonameid, name, feature_code, admin1_code, dist
  FROM NearestNeighbor
  ORDER BY n, dist;

Note: Only part of this query’s WHERE
clause is supported by the spatial
index. However, the query optimizer
correctly evaluates the supported part
(the "<" comparison) using the index.
This restricts the number of rows for
which the ">=" part must be tested,
and the query performs well. Changing
the value of @start can sometimes
speed up the query if it is slower
than desired.

Listing 2-1. Creating and Populating Auxiliary Table of Numbers

SET NOCOUNT ON;
USE InsideTSQL2008;

IF OBJECT_ID('dbo.Nums', 'U') IS NOT NULL DROP TABLE dbo.Nums;

CREATE TABLE dbo.Nums(n INT NOT NULL PRIMARY KEY);
DECLARE @max AS INT, @rc AS INT;
SET @max = 1000000;
SET @rc = 1;

INSERT INTO Nums VALUES(1);
WHILE @rc * 2 <= @max
BEGIN
  INSERT INTO dbo.Nums SELECT n + @rc FROM dbo.Nums;
  SET @rc = @rc * 2;
END

INSERT INTO dbo.Nums
  SELECT n + @rc FROM dbo.Nums WHERE n + @rc <= @max;

回复收藏 0 原文

~没有更多了~