高效 SQL 地理通配符名称搜索 20mill+记录
我们有一个 SQL 2008 数据库,其中包含超过 2000 万个地理位置(并且还在不断增长),每个位置都包含标准名称/地址/地理位置/ID/等列。
我们需要一种方法来根据距离有效地搜索记录,但也需要通过全文索引使用“包含”关键字。基本思想是我们根据最大距离搜索我们附近的位置。
现在,当我们搜索 1 英里内的完整字符串(例如 StarBucks)时,搜索会在几秒钟内返回。然而,如果我们在 1 英里内搜索“星星”,搜索有时可能需要几分钟才能返回。
我们一直在玩弄这样的逻辑:
DECLARE @geoSearchLocation GEOGRAPHY, @geoSearchPolygon GEOGRAPHY, @returncount smallint = 50
SET @geoSearchLocation = geography::Point(40.729047, -74.010086, 4326); --NYC
SET @geoSearchPolygon = geography::STGeomFromText('POLYGON((-74.015086 40.734047,
-74.015086 40.718047,
-74.005086 40.718047,
-74.005086 40.734047,
-74.015086 40.734047))', 4326);
SET @geoSearchLocation = geography::Point(40.729047, -73.010086, 4326);
SELECT TOP (100) --WITH TIES
*, gt.LocationGeog.STDistance(@geoSearchLocation) AS dist
FROM dbo.GeoLocation_Locations gt WITH (NOLOCK, INDEX(geolocation_HHHH128_sidx))
WHERE gt.LocationGeog.STIntersects(@geoSearchPolygon) = 1
ORDER BY gt.LocationGeog.STDistance(@geoSearchLocation)
然而,这在搜索中引入了欺骗和其他问题。我们也一直在尝试使用我们在网上找到的 POWER 论坛。
我们还有其他查询效果很好,这些查询仅基于距离或特定类别 ID,这些查询在一秒内返回。最大的问题是通配符字符串匹配。
有没有人有一个很棒的 SQL 或 CLR 过程,在处理超过 2000 万条记录时接受名称(通配符支持)和距离?
现在我们陷入了困境:(
提前致谢, 杰夫
We have a SQL 2008 database with 20million+ geoWe locations (and growing) , each location includes the standard name/address/Geography/ID/Etc columns.
We need a way to efficiently search through the records based on a distance but also a "contains" keyword via the full text index. The basic idea is we search for locations near us based on a max distance.
Right now when we search for full strings such as StarBucks within 1 mile the search returns in a few seconds. However if we search for "star" within 1 mile the search can sometimes take several min to return.
We have been playing around logic such as this :
DECLARE @geoSearchLocation GEOGRAPHY, @geoSearchPolygon GEOGRAPHY, @returncount smallint = 50
SET @geoSearchLocation = geography::Point(40.729047, -74.010086, 4326); --NYC
SET @geoSearchPolygon = geography::STGeomFromText('POLYGON((-74.015086 40.734047,
-74.015086 40.718047,
-74.005086 40.718047,
-74.005086 40.734047,
-74.015086 40.734047))', 4326);
SET @geoSearchLocation = geography::Point(40.729047, -73.010086, 4326);
SELECT TOP (100) --WITH TIES
*, gt.LocationGeog.STDistance(@geoSearchLocation) AS dist
FROM dbo.GeoLocation_Locations gt WITH (NOLOCK, INDEX(geolocation_HHHH128_sidx))
WHERE gt.LocationGeog.STIntersects(@geoSearchPolygon) = 1
ORDER BY gt.LocationGeog.STDistance(@geoSearchLocation)
This introduces dupes and other issues in the search however. We have also been trying to use the POWER forumlas we have found online.
We have other queries working great which are only based on distance or a certain category ID, those return in a under a second. The big problem is the wildcard string matches.
Does anyone have a awesome SQL or CLR proc which accepts name (wildcard support) and distance when dealing with 20million+ records ?
Right now we are very stuck :(
thanks in advance,
Jeff
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
仅当您搜索
'star%'
并且该列具有索引时,简单的LIKE
谓词才几乎有效。然而,随着数据集变大,性能将继续下降。如果您可以设置 Sql Server 的全文机制,情况会更好,这并不难,而且比索引LIKE
快得多Simple
LIKE
predicates will only work almost well if you are searching for'star%'
and that column has an index. Performance however will continue to degrade as the dataset gets larger. If you can set up Sql Server's Full-Text mechanism you will be better off, it's not hard and it is way faster than indexedLIKE
's