高效 SQL 地理通配符名称搜索 20mill+记录

发布于 2024-11-30 12:03:40 字数 1392 浏览 4 评论 0原文

我们有一个 SQL 2008 数据库,其中包含超过 2000 万个地理位置(并且还在不断增长),每个位置都包含标准名称/地址/地理位置/ID/等列。

我们需要一种方法来根据距离有效地搜索记录,但也需要通过全文索引使用“包含”关键字。基本思想是我们根据最大距离搜索我们附近的位置。

现在,当我们搜索 1 英里内的完整字符串(例如 StarBucks)时,搜索会在几秒钟内返回。然而,如果我们在 1 英里内搜索“星星”,搜索有时可能需要几分钟才能返回。

我们一直在玩弄这样的逻辑:

DECLARE @geoSearchLocation GEOGRAPHY, @geoSearchPolygon GEOGRAPHY, @returncount smallint = 50
SET @geoSearchLocation = geography::Point(40.729047, -74.010086, 4326); --NYC
SET @geoSearchPolygon = geography::STGeomFromText('POLYGON((-74.015086 40.734047, 
                                         -74.015086 40.718047, 
                                         -74.005086 40.718047, 
                                         -74.005086 40.734047, 
                                         -74.015086 40.734047))', 4326);
SET @geoSearchLocation = geography::Point(40.729047, -73.010086, 4326);
SELECT TOP (100) --WITH TIES
*, gt.LocationGeog.STDistance(@geoSearchLocation) AS dist
FROM dbo.GeoLocation_Locations gt WITH (NOLOCK, INDEX(geolocation_HHHH128_sidx))
WHERE gt.LocationGeog.STIntersects(@geoSearchPolygon) = 1
ORDER BY gt.LocationGeog.STDistance(@geoSearchLocation)

然而,这在搜索中引入了欺骗和其他问题。我们也一直在尝试使用我们在网上找到的 POWER 论坛。

我们还有其他查询效果很好,这些查询仅基于距离或特定类别 ID,这些查询在一秒内返回。最大的问题是通配符字符串匹配。

有没有人有一个很棒的 SQL 或 CLR 过程,在处理超过 2000 万条记录时接受名称(通配符支持)和距离?

现在我们陷入了困境:(

提前致谢, 杰夫

We have a SQL 2008 database with 20million+ geoWe locations (and growing) , each location includes the standard name/address/Geography/ID/Etc columns.

We need a way to efficiently search through the records based on a distance but also a "contains" keyword via the full text index. The basic idea is we search for locations near us based on a max distance.

Right now when we search for full strings such as StarBucks within 1 mile the search returns in a few seconds. However if we search for "star" within 1 mile the search can sometimes take several min to return.

We have been playing around logic such as this :

DECLARE @geoSearchLocation GEOGRAPHY, @geoSearchPolygon GEOGRAPHY, @returncount smallint = 50
SET @geoSearchLocation = geography::Point(40.729047, -74.010086, 4326); --NYC
SET @geoSearchPolygon = geography::STGeomFromText('POLYGON((-74.015086 40.734047, 
                                         -74.015086 40.718047, 
                                         -74.005086 40.718047, 
                                         -74.005086 40.734047, 
                                         -74.015086 40.734047))', 4326);
SET @geoSearchLocation = geography::Point(40.729047, -73.010086, 4326);
SELECT TOP (100) --WITH TIES
*, gt.LocationGeog.STDistance(@geoSearchLocation) AS dist
FROM dbo.GeoLocation_Locations gt WITH (NOLOCK, INDEX(geolocation_HHHH128_sidx))
WHERE gt.LocationGeog.STIntersects(@geoSearchPolygon) = 1
ORDER BY gt.LocationGeog.STDistance(@geoSearchLocation)

This introduces dupes and other issues in the search however. We have also been trying to use the POWER forumlas we have found online.

We have other queries working great which are only based on distance or a certain category ID, those return in a under a second. The big problem is the wildcard string matches.

Does anyone have a awesome SQL or CLR proc which accepts name (wildcard support) and distance when dealing with 20million+ records ?

Right now we are very stuck :(

thanks in advance,
Jeff

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

若能看破又如何 2024-12-07 12:03:40

仅当您搜索 'star%' 并且该列具有索引时,简单的 LIKE 谓词才几乎有效。然而,随着数据集变大,性能将继续下降。如果您可以设置 Sql Server 的全文机制,情况会更好,这并不难,而且比索引 LIKE 快得多

Simple LIKE predicates will only work almost well if you are searching for 'star%' and that column has an index. Performance however will continue to degrade as the dataset gets larger. If you can set up Sql Server's Full-Text mechanism you will be better off, it's not hard and it is way faster than indexed LIKE's

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文