Are SQL Server 2008 spatial data features useful for mapping queries?
I have an app where basically I have a huge table (100 million records) of information, with each row containing a lat/long value.
I'm constantly querying this table to get all the records that fit within a radius around a certain point. For example, "all records within 5 miles of 39.89288,-104.919434"
For this, I have an index over the Lat/Long columns, and I'm getting the "bounding square" of points, and then discarding all the points that fall outside of the circle in my ASP.Net app, since that was faster than doing the circle calculation in SQL Server.
NOTE: This is all data about the US, so I'm considering the earth to be flat for my calculations, which is precise enough for my needs.
Now, the main problem with the Lat/Long index is that I'm searching "a square" of points: since I'm trying to find "Lat between x and y" and "Long between x and y", it can't use the index very efficiently, as it could if I were searching over "a line" of points.
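For reference, the bounding-square lookup described here might look roughly like the following in T-SQL. This is only a sketch: the table name dbo.Locations and the column names Lat and Long are placeholders, and the 69-miles-per-degree figure is the usual flat-earth approximation, not something taken from the actual schema.

```sql
-- Hypothetical table dbo.Locations with float columns Lat and Long
-- (names are assumptions for illustration only).
DECLARE @Lat float, @Long float, @RadiusMiles float;
SET @Lat = 39.89288;
SET @Long = -104.919434;
SET @RadiusMiles = 5;

-- Flat-earth approximation: about 69 miles per degree of latitude;
-- a degree of longitude shrinks by cos(latitude).
DECLARE @LatDelta float, @LongDelta float;
SET @LatDelta = @RadiusMiles / 69.0;
SET @LongDelta = @RadiusMiles / (69.0 * COS(RADIANS(@Lat)));

-- Returns the bounding square; points outside the circle
-- still have to be discarded afterwards.
SELECT *
FROM dbo.Locations
WHERE [Lat] BETWEEN @Lat - @LatDelta AND @Lat + @LatDelta
  AND [Long] BETWEEN @Long - @LongDelta AND @Long + @LongDelta;
```

With a composite index on (Lat, Long), the seek can narrow only the latitude range; longitude is then checked row by row within that band, which is the inefficiency described above.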
I've been reading up on SQL 2008's spatial features, but I haven't found enough concrete information to know whether this is useful for me.
So the question is: Does SQL 2008 have some kind of different type of index that will make this specific type of query much faster than I can with SQL 2005?
Comments (4)
Yes! Check out this article about spatial indexes. You'll see that these types of indexes work better than the "indexed rectangle" approach. Besides, not only will you be able to query efficiently for "is this point near another point", but you can also do all sorts of other geographical operations. Here's a complete list of all available methods on the type.
Found this:
For SQL 2008:
http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/sql-server-2008-proximity-search-with-th
Apparently it IS possible
NOTE: The SQL 2005 version of that article doesn't perform too well. I've tried that kind of thing, and it works better to just get the square from SQL Server and then cull the circle in your own code.
More links of interest:
http://msdn.microsoft.com/en-us/library/bb964712.aspx
(Finally an explanation!!)
And a sample query...
Apparently this is how to do the search I want (points within a 5-mile circle):
(LocGeog is the Geography column)
Strangely, though, this runs WAY slower than my regular old query (7 times slower), so I'm obviously still doing something very wrong.
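Since the query itself did not survive in this post, here is a rough sketch of what a 5-mile radius search against a geography column typically looks like. Only LocGeog comes from the post; the table name dbo.Locations and the index name IX_Locations_LocGeog are made up, and the index hint is a guess at one common cause of slowness (the optimizer not choosing the spatial index), not a confirmed diagnosis.

```sql
-- dbo.Locations and IX_Locations_LocGeog are hypothetical names.
-- geography::Point takes latitude, longitude, SRID; 4326 is WGS 84.
DECLARE @Center geography = geography::Point(39.89288, -104.919434, 4326);

-- STDistance on SRID 4326 returns meters; 1 mile = 1609.344 m.
DECLARE @RadiusMeters float = 5 * 1609.344;

-- The table hint forces the spatial index, assuming one exists
-- under this name.
SELECT *
FROM dbo.Locations WITH (INDEX(IX_Locations_LocGeog))
WHERE LocGeog.STDistance(@Center) <= @RadiusMeters;
```

Comparing the execution plans with and without the hint should show whether the spatial index is being used at all.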
Yes, this can be done very well with SQL 2008 spatial data. There is some expertise/trial and error(?) in setting the spatial indexing up at the right mesh level, but after that it is supposed to be great (relayed to me by friends; I have not used it in production myself).
For your purposes (lat/lon) you will want the geography type and not geometry. I believe that the spatial indexes set up a "nested bounding triangle" type of indexing that is an improvement over the "bounding box" type of preconditioning that we are forced to do in SQL without it.
OK, I would suggest starting with this post on Grant Fritchey's "Home of the Scary DBA" blog (tell him I sent you if you want to ask questions :-) ). It is a good explanation of some performance analysis and tuning of spatial indexes, which he was just starting to learn, and it also includes links to lots of other material.
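As a starting point for the tuning mentioned above, a spatial index on a geography column can be created along these lines. SQL Server's documentation describes the index as a four-level grid decomposition, and the grid density at each level is the main tuning knob. The table, column, and index names here are hypothetical, and the density settings are simply the SQL Server 2008 defaults written out explicitly, not recommended values.

```sql
-- Hypothetical names; the table needs a clustered primary key
-- before a spatial index can be created on it.
CREATE SPATIAL INDEX IX_Locations_LocGeog
ON dbo.Locations (LocGeog)
USING GEOGRAPHY_GRID
WITH (
    -- Each level may be LOW, MEDIUM, or HIGH density.
    GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
    -- Upper bound on the number of grid cells recorded per indexed object.
    CELLS_PER_OBJECT = 16
);
```

Trying a few density combinations against a representative radius query, and checking the plan each time, is probably the trial-and-error part.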
You know, I was doing lat/long queries at Starbucks about 5 years ago...
Basically, we wanted to correlate stores to distribution centers. I was working in their operations department, and they honestly couldn't tell which store was supplied by a particular warehouse.
So I ended up coming up with this "magic circle" algorithm.
Basically, they had a bunch of queries that looked like this:
select * from table1, table2 Where UdfDistance(table1.Lat, table1.Long, table2.Lat, table2.Long) >= 250
I ended up coming up with something really similar, but it ran a lot faster:
select * from table1, table2 Where UdfDistance(table1.Lat, table1.Long, table2.Lat, table2.Long) >= 250
and table1.Lat between (table2.Lat - 1) and (table2.Lat + 1)
and table1.Long between (table2.Long - 1) and (table2.Long + 1)
Basically, don't try comparing two geographical points if they are more than 1 degree apart in latitude or longitude.
In other words, it used normal relational indexes to filter out a lot of the values, and then the UDF distance calculation had a lot less data to process.
Hope that helps. I'll try to clarify if needed.