高效 SQL 地理通配符名称搜索 20mill+记录

发布于 2024-11-30 12:03:40 字数 1392 浏览 4 评论 0原文

我们有一个 SQL 2008 数据库，其中包含超过 2000 万个地理位置（并且还在不断增长），每个位置都包含标准名称/地址/地理位置/ID/等列。

我们需要一种方法来根据距离有效地搜索记录，但也需要通过全文索引使用“包含”关键字。基本思想是我们根据最大距离搜索我们附近的位置。

现在，当我们搜索 1 英里内的完整字符串（例如 StarBucks）时，搜索会在几秒钟内返回。然而，如果我们在 1 英里内搜索“星星”，搜索有时可能需要几分钟才能返回。

我们一直在玩弄这样的逻辑：

DECLARE @geoSearchLocation GEOGRAPHY, @geoSearchPolygon GEOGRAPHY, @returncount smallint = 50
SET @geoSearchLocation = geography::Point(40.729047, -74.010086, 4326); --NYC
SET @geoSearchPolygon = geography::STGeomFromText('POLYGON((-74.015086 40.734047, 
                                         -74.015086 40.718047, 
                                         -74.005086 40.718047, 
                                         -74.005086 40.734047, 
                                         -74.015086 40.734047))', 4326);
SET @geoSearchLocation = geography::Point(40.729047, -73.010086, 4326);
SELECT TOP (100) --WITH TIES
*, gt.LocationGeog.STDistance(@geoSearchLocation) AS dist
FROM dbo.GeoLocation_Locations gt WITH (NOLOCK, INDEX(geolocation_HHHH128_sidx))
WHERE gt.LocationGeog.STIntersects(@geoSearchPolygon) = 1
ORDER BY gt.LocationGeog.STDistance(@geoSearchLocation)

然而，这在搜索中引入了欺骗和其他问题。我们也一直在尝试使用我们在网上找到的 POWER 论坛。

我们还有其他查询效果很好，这些查询仅基于距离或特定类别 ID，这些查询在一秒内返回。最大的问题是通配符字符串匹配。

有没有人有一个很棒的 SQL 或 CLR 过程，在处理超过 2000 万条记录时接受名称（通配符支持）和距离？

现在我们陷入了困境:(

提前致谢，杰夫

原文

We have a SQL 2008 database with 20million+ geoWe locations (and growing) , each location includes the standard name/address/Geography/ID/Etc columns.

We need a way to efficiently search through the records based on a distance but also a "contains" keyword via the full text index. The basic idea is we search for locations near us based on a max distance.

Right now when we search for full strings such as StarBucks within 1 mile the search returns in a few seconds. However if we search for "star" within 1 mile the search can sometimes take several min to return.

We have been playing around logic such as this :

DECLARE @geoSearchLocation GEOGRAPHY, @geoSearchPolygon GEOGRAPHY, @returncount smallint = 50
SET @geoSearchLocation = geography::Point(40.729047, -74.010086, 4326); --NYC
SET @geoSearchPolygon = geography::STGeomFromText('POLYGON((-74.015086 40.734047, 
                                         -74.015086 40.718047, 
                                         -74.005086 40.718047, 
                                         -74.005086 40.734047, 
                                         -74.015086 40.734047))', 4326);
SET @geoSearchLocation = geography::Point(40.729047, -73.010086, 4326);
SELECT TOP (100) --WITH TIES
*, gt.LocationGeog.STDistance(@geoSearchLocation) AS dist
FROM dbo.GeoLocation_Locations gt WITH (NOLOCK, INDEX(geolocation_HHHH128_sidx))
WHERE gt.LocationGeog.STIntersects(@geoSearchPolygon) = 1
ORDER BY gt.LocationGeog.STDistance(@geoSearchLocation)

This introduces dupes and other issues in the search however. We have also been trying to use the POWER forumlas we have found online.

We have other queries working great which are only based on distance or a certain category ID, those return in a under a second. The big problem is the wildcard string matches.

Does anyone have a awesome SQL or CLR proc which accepts name (wildcard support) and distance when dealing with 20million+ records ?

Right now we are very stuck :(

thanks in advance,
Jeff

分享到QQ

分享到微博