Distance between two coordinates - how can I simplify it and/or use a different technique?
I need to write a query which allows me to find all locations within a range (Miles) from a provided location.
The table is like this:
id | name | lat | lng
So I have been doing research and found this MySQL presentation.
I have tested it on a table with around 100 rows, and there will be plenty more! It must be scalable.
I tried something simpler like this first:
-- just some test data; this would be required by user input
set @orig_lat=55.857807; set @orig_lng=-4.242511; set @dist=10;
SELECT *, 3956 * 2 * ASIN(
SQRT( POWER(SIN((orig.lat - abs(dest.lat)) * pi()/180 / 2), 2)
+ COS(orig.lat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)
* POWER(SIN((orig.lng - dest.lng) * pi()/180 / 2), 2) ))
AS distance
FROM locations dest, locations orig
WHERE orig.id = '1'
HAVING distance < 1
ORDER BY distance;
This returned rows in around 50ms which is pretty good!
However this would slow down dramatically as the rows increase.
An EXPLAIN shows it's only using the PRIMARY key, which is obvious.
Then after reading the article linked above. I tried something like this:
-- defining variables - when this is made into a stored procedure, it will
-- fetch the values with a SELECT query.
set @mylon = -4.242511;
set @mylat = 55.857807;
set @dist = 0.5;
-- calculate lon and lat for the rectangle:
set @lon1 = @mylon-@dist/abs(cos(radians(@mylat))*69);
set @lon2 = @mylon+@dist/abs(cos(radians(@mylat))*69);
set @lat1 = @mylat-(@dist/69);
set @lat2 = @mylat+(@dist/69);
-- run the query:
SELECT *, 3956 * 2 * ASIN(
SQRT( POWER(SIN((@mylat - abs(dest.lat)) * pi()/180 / 2) ,2)
+ COS(@mylat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)
* POWER(SIN((@mylon - dest.lng) * pi()/180 / 2), 2) ))
AS distance
FROM locations dest
WHERE dest.lng BETWEEN @lon1 AND @lon2
AND dest.lat BETWEEN @lat1 AND @lat2
HAVING distance < @dist
ORDER BY distance;
The time of this query is around 240ms, which is not too bad, but it is slower than the last one. I can imagine, though, that at a much higher number of rows this would work out faster. However, an EXPLAIN shows the possible keys as lat, lng or PRIMARY, and PRIMARY was used.
How can I do this better???
I know I could store the lat/lng as a POINT(), but I haven't found much documentation on this showing whether it's faster or more accurate.
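For what it's worth, a minimal sketch of what that might look like on MySQL 5.7 or later, where InnoDB supports SPATIAL indexes and ST_Distance_Sphere() (the coords column name is made up here, and the spatial index itself is only used by predicates such as MBRContains()/ST_Within(), not by a bare distance expression):

-- Add a POINT column; POINT() takes longitude first, latitude second.
ALTER TABLE locations ADD COLUMN coords POINT;

-- Backfill it from the existing lat/lng columns.
UPDATE locations SET coords = POINT(lng, lat);

-- SPATIAL indexes require a NOT NULL geometry column.
ALTER TABLE locations MODIFY coords POINT NOT NULL,
                      ADD SPATIAL INDEX idx_coords (coords);

-- Great-circle distance in meters from the test origin; 1609.34 meters per mile.
SELECT id, name,
       ST_Distance_Sphere(coords, POINT(-4.242511, 55.857807)) / 1609.34 AS distance_miles
FROM locations
HAVING distance_miles < 10
ORDER BY distance_miles;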
Any other ideas would be happily accepted!
Thanks very much!
-Stefan
UPDATE:
As Jonathan Leffler pointed out, I had made a few mistakes which I hadn't noticed:
I had only put abs() on one of the lat values, and I was using an id search in the WHERE clause of the second query as well, when there was no need. The first query was purely experimental; the second one is more likely to hit production.
After these changes, EXPLAIN shows that the key now used is the lng column, and the average response time is around 180ms, which is an improvement.
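For reference, a sketch of the single-column indexes that give the optimizer the lat/lng choices EXPLAIN reports (the index names are illustrative); MySQL will normally pick only one of them per query and filter the rows it returns on the other bounding-box condition:

-- Hypothetical index definitions for the bounding-box columns.
ALTER TABLE locations
  ADD INDEX idx_lat (lat),
  ADD INDEX idx_lng (lng);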
Any other ideas would be happily accepted!
If you want speed (and simplicity) you'll want some decent geospatial support from your database. This introduces geospatial datatypes, geospatial indexes and (a lot of) functions for processing / building / analyzing geospatial data.
MySQL implements a part of the OpenGIS specifications, although it is / was (the last time I checked) very, very rough around the edges / premature (not useful for any real work).
PostGis on PostgreSql would make this trivially easy and readable (this finds all points from tableb which are closer than 1000 meters to point a in tablea with id 123).
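A sketch of such a query, assuming both tables carry a geometry column named geom in SRID 4326 (the column name is an assumption):

-- Every point in tableb within 1000 meters of the point in tablea with id 123.
-- ST_DWithin on geography operands measures in meters and can use a GiST index.
SELECT b.*
FROM tablea a
JOIN tableb b
  ON ST_DWithin(a.geom::geography, b.geom::geography, 1000)
WHERE a.id = 123;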
The first query ignores the parameters you set - using 1 instead of @dist for the distance, and using the table alias orig instead of the parameters @orig_lat and @orig_lon.

You then have the query doing a Cartesian product between the table and itself, which is seldom a good idea if you can avoid it. You get away with it because of the filter condition orig.id = 1, which means that there's only one row from orig joined with each of the rows in dest (including the point with dest.id = 1; you should probably have a condition AND orig.id != dest.id). You also have a HAVING clause but no GROUP BY clause, which is indicative of problems. The HAVING clause is not relating any aggregates, but a HAVING clause is (primarily) for comparing aggregate values.

Unless my memory is failing me, COS(ABS(x)) === COS(x), so you might be able to simplify things by dropping the ABS(). Failing that, it is not clear why one latitude needs the ABS and the other does not - symmetry is crucial in matters of spherical trigonometry.
You have a dose of magic numbers - the value 69 is presumably the number of miles in a degree (of longitude, at the equator), and 3956 is the radius of the earth in miles.
I'm suspicious of the box calculated if the given position is close to a pole. In the extreme case, you might need to allow any longitude at all.
The condition dest.id = 1 in the second query is odd; I believe it should be omitted, but its presence should speed things up, because only one row matches that condition. So the extra time taken is puzzling. But using the primary key index is appropriate as written.

You should move the condition in the HAVING clause into the WHERE clause.
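Since MySQL will not let a WHERE clause reference the distance alias directly, one way to follow that advice (a sketch only, reusing the variables from the question) is to wrap the calculation in a derived table and filter outside it:

-- The formula from the question with ABS() dropped, and the distance
-- filter moved from HAVING into an outer WHERE clause.
SELECT *
FROM (
    SELECT dest.*, 3956 * 2 * ASIN(
        SQRT( POWER(SIN((@mylat - dest.lat) * pi()/180 / 2), 2)
            + COS(@mylat * pi()/180) * COS(dest.lat * pi()/180)
            * POWER(SIN((@mylon - dest.lng) * pi()/180 / 2), 2) )) AS distance
    FROM locations dest
    WHERE dest.lng BETWEEN @lon1 AND @lon2
      AND dest.lat BETWEEN @lat1 AND @lat2
) AS boxed
WHERE distance < @dist
ORDER BY distance;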
But I'm not sure this is really helping...
The NGS Online Inverse Geodesic Calculator is the traditional reference means to calculate the distance between any two locations on the earth ellipsoid:
http://www.ngs.noaa.gov/cgi-bin/Inv_Fwd/inverse2.prl
But the above calculator is still problematic: especially between two near-antipodal locations, the computed distance can show an error of some tens of kilometres! The origin of the numeric trouble was identified long ago by Thaddeus Vincenty (page 92):
http://www.ngs.noaa.gov/PUBS_LIB/inverse.pdf
In any case, it is preferable to use the reliable and very accurate online calculator by Charles Karney:
http://geographiclib.sourceforge.net/cgi-bin/Geod
Some thoughts on improving performance. It wouldn't simplify things from a maintainability standpoint (makes things more complex), but it could help with scalability.
Since you know the radius, you can add conditions for the bounding box, which may allow the db to optimize the query to eliminate some rows without having to do the trig calcs.
You could pre-calculate some of the trig values of the lat/lon of stored locations and store them in the table. This would shift some of the performance cost when inserting the record, but if queries outnumber inserts, this would be good. See this answer for an idea of this approach:
Query to get records based on Radius in SQLite?
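As an illustration of that idea, a hedged sketch (the extra column names are made up) that stores the sine and cosine of the radian coordinates and then needs only one ACOS per row, via the spherical law of cosines:

-- Hypothetical pre-computed columns, refreshed whenever lat/lng change.
ALTER TABLE locations
  ADD COLUMN sin_lat DOUBLE, ADD COLUMN cos_lat DOUBLE,
  ADD COLUMN sin_lng DOUBLE, ADD COLUMN cos_lng DOUBLE;

UPDATE locations SET
  sin_lat = SIN(RADIANS(lat)), cos_lat = COS(RADIANS(lat)),
  sin_lng = SIN(RADIANS(lng)), cos_lng = COS(RADIANS(lng));

-- At query time, compute the origin's values once, then use
-- cos(lng2 - lng1) = cos(lng1)*cos(lng2) + sin(lng1)*sin(lng2).
SET @o_sin_lat = SIN(RADIANS(@mylat)), @o_cos_lat = COS(RADIANS(@mylat));
SET @o_sin_lng = SIN(RADIANS(@mylon)), @o_cos_lng = COS(RADIANS(@mylon));

SELECT id, name,
       3956 * ACOS(LEAST(1.0,       -- clamp against floating-point overshoot
              @o_sin_lat * sin_lat
            + @o_cos_lat * cos_lat * (cos_lng * @o_cos_lng + sin_lng * @o_sin_lng)))
       AS distance
FROM locations
HAVING distance < @dist
ORDER BY distance;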
You could look at something like geohashing.
You could search SO for some ideas on how to implement:
https://stackoverflow.com/search?q=geohash
If you're only interested in rather small distances, you can approximate the geographical grid by a rectangular grid.
You could make this even more efficient by storing radians instead of (or in addition to) degrees in your database. If your queries may cross the 180° meridian, some extra care would be necessary there, but many applications don't have to deal with those locations. You could also try changing POWER(x, 2) to x*x, which might get computed faster.
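A sketch of that flat-grid approximation (assuming the search radius is a few tens of miles at most and the data is well away from the poles and the 180° meridian), reusing the bounding-box variables from the question:

-- Equirectangular approximation: one degree of latitude is roughly 69 miles,
-- and longitude differences are scaled by cos(latitude). No ASIN/ACOS per row.
SELECT id, name,
       69 * SQRT( POWER(lat - @mylat, 2)
                + POWER((lng - @mylon) * COS(RADIANS(@mylat)), 2) ) AS distance
FROM locations
WHERE lng BETWEEN @lon1 AND @lon2
  AND lat BETWEEN @lat1 AND @lat2
HAVING distance < @dist
ORDER BY distance;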