SQL中按距离搜索
我正在规划一个网站(Drupal/MySQL),它必须根据距某个位置的距离(我们从大约 20,000 个位置开始)搜索相当大的数据库。到目前为止,我发现以合理方式搜索的最佳解决方案是使用 SQL 中的用户定义函数来计算坐标之间的距离,例如:(
SELECT *, CoordinateDistanceMiles(lat, lon, ${inputLat}, ${inputLon}) as distance
FROM items WHERE distance < {$radius}
使用 John Dyer 的 距离函数 或类似的)
但是,我还了解到 UDF 效率非常低。我的第二个想法(也是暂定计划)是在这个查询中嵌套另一个查询以缩小其范围,从而在更小的项目子集上运行 UDF,例如:
SELECT *, CoordinateDistanceMiles(lat, lon, ${inputLat}, ${inputLon}) as distance
FROM (
SELECT * FROM items WHERE
lat BETWEEN ${inputLat - const} AND ${inputLat + const} AND
lon BETWEEN ${inputLon - const} AND ${inputLon + const}
) WHERE distance < ${radius}
这个模型会使搜索更快,还是更复杂?还有更好的解决方案吗?
I am planning a website (Drupal/MySQL), which must search a fairly large database based on distance from a location (we're starting with ~20,000 locations). So far, the best solution I've found to searching in a reasonable manner is to use a user-defined function in SQL to calculate the distance between to coordinates, e.g.:
SELECT *, CoordinateDistanceMiles(lat, lon, ${inputLat}, ${inputLon}) as distance
FROM items WHERE distance < {$radius}
(Using John Dyer's distance function or similar)
However, I've also read that UDFs are very inefficient. My second idea (and tentative plan) is to nest another query inside this one to narrow its' scope and therefore run the UDF on a much smaller subset of items, e.g.:
SELECT *, CoordinateDistanceMiles(lat, lon, ${inputLat}, ${inputLon}) as distance
FROM (
SELECT * FROM items WHERE
lat BETWEEN ${inputLat - const} AND ${inputLat + const} AND
lon BETWEEN ${inputLon - const} AND ${inputLon + const}
) WHERE distance < ${radius}
Would this model make the search faster, or just more convoluted? Are there any better solutions?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
此处使用 UDF 的开销可以忽略不计,只要您在
distance
distance
上执行扫描即可。 ${radius}
并有 2 个基于范围的比较(它们无法使用索引进行优化)。因此,不必担心 UDF“效率低下”并使用它,因为它更具可读性。
The overhead of using UDF here is negligible, as long as you perform scan over
distance < ${radius}
and have 2 range-based comparisons (they cannot be optimized with indexes).So don't worry about UDF "inefficiency" and use it, since it is much more readable.