Optimizing map search
For one of our clients, we provide a system that retrieves the closest N landmarks to a user's zipcode location.
We have a database of all available zipcodes (650,000+) with their corresponding coordinates (latitude and longitude), as well as all 400+ landmarks in the country.
For now, we use the following process to find the closest N landmarks:
- Retrieve the lat and lng of the selected zipcode
- Get the coordinates of all the landmarks
- Order them by using a geographic distance formula
- Take the closest N+2 landmarks and get the real distance to each of them using the following process:
  - check whether the distance between the two coordinates is stored in the distance cache table
  - if not, go to a map engine, retrieve the distance, and store it in the cache
- Reorder the list and return the first N closest landmarks
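The steps above can be sketched roughly as follows. This is only an illustration of the described flow, not the actual system: the haversine formula stands in for the unnamed "geographic distance formula", a plain dict stands in for the distance cache table, and `real_distance` is a hypothetical callable standing in for the map engine.

```python
import math
from typing import Callable, Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in km (assumed stand-in for the geographic formula)."""
    lat1, lng1 = math.radians(a[0]), math.radians(a[1])
    lat2, lng2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_landmarks(
    zip_coord: Coord,
    landmarks: Dict[str, Coord],
    n: int,
    cache: Dict[Tuple[Coord, Coord], float],
    real_distance: Callable[[Coord, Coord], float],
) -> List[Tuple[str, float]]:
    # Order all landmarks by straight-line distance and keep the closest N+2.
    by_air = sorted(landmarks, key=lambda name: haversine_km(zip_coord, landmarks[name]))
    candidates = by_air[: n + 2]

    results = []
    for name in candidates:
        key = (zip_coord, landmarks[name])
        # Only call the (slow) map engine when the pair is not cached yet.
        if key not in cache:
            cache[key] = real_distance(zip_coord, landmarks[name])
        results.append((name, cache[key]))

    # Reorder by real distance and return the first N.
    results.sort(key=lambda item: item[1])
    return results[:n]
```

With this shape, a cold lookup costs at most N+2 map engine calls and a repeated lookup for the same zipcode costs none.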
The problem is that we need to optimize this in terms of both database access and third-party access.
We tried caching, for every zipcode, the distance to its closest M landmarks, but the table would gain an additional 6 GB of data and would take around 250 days to fill, since a request takes approximately 30 seconds.
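As a rough sanity check on that fill time, here is a back-of-the-envelope calculation. It assumes one sequential map engine request per zipcode, which is my reading of the numbers and is not stated in the question.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def fill_days(requests: int, seconds_per_request: float) -> float:
    """Days needed to run the given number of requests one after another."""
    return requests * seconds_per_request / SECONDS_PER_DAY


# Assumption: one request per zipcode, 650,000 zipcodes, 30 seconds each.
days = fill_days(650_000, 30)
print(round(days))
```

Under that assumption the estimate comes out somewhat below the quoted figure of around 250 days, which fits a zipcode count a little above 650,000.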
We were thinking of partitioning the data and grouping nearby postcodes together, but that would sacrifice the exact distances.
What optimization solutions do you see in this situation?
Thank you.
Comments (2)
You could try a repetitive (iterative) approach.
This should be done at the database level. You should use a database with a geographic extension, such as SQL Server 2008 R2, or the excellent open source choice PostgreSQL with the PostGIS extension. With those you store geographic BLOBs instead of coordinates, and there are many built-in functions for geographic calculations that will take care of steps 2 to 5 for you.
I suggest you start here:
http://postgis.refractions.net/
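To give an idea of what that could look like, here is a minimal sketch that builds a PostGIS query ranking landmarks by straight-line distance. The table names (`zipcodes`, `landmarks`), the columns (`code`, `name`, `geog`) and the helper itself are hypothetical, the `geog` columns are assumed to be of type `geography`, and the `%(zipcode)s` placeholder assumes a psycopg2-style driver. `ST_Distance` on `geography` values returns meters. It only covers the straight-line ranking; a road distance would still have to come from the map engine.

```python
def nearest_landmarks_query(n: int) -> str:
    """Build a parameterised query for the n nearest landmarks (hypothetical schema)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (
        "SELECT l.name, ST_Distance(l.geog, z.geog) AS meters\n"
        "FROM landmarks AS l, zipcodes AS z\n"
        "WHERE z.code = %(zipcode)s\n"
        "ORDER BY meters\n"
        f"LIMIT {int(n)}"
    )


# Example: pass the text to a database cursor together with {"zipcode": "..."}.
print(nearest_landmarks_query(5))
```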
Regards