我目前有一个 Postgres DB,其中包含大约。全球 300.000 个移动车辆的数据集。我经常重复的查询是:给我 5/10/20 英里半径内的所有车辆。目前,我在数据库中花费了大约 600 到 1200 毫秒来准备一组已定位的车辆对象。
如果可能的话,我希望这次能够大幅提高,理想情况下可以提高一两个数量级。如果相关的话,我正在 Ruby on Rails 3.0beta 环境中工作。
对于如何构建整个系统来加速此查询有什么想法吗?有任何 NoSQL 数据库能够提供这种地理定位性能吗?我知道 MongoDB 正在开发一个扩展来促进这种情况,但还没有尝试过。是否可以巧妙地使用 Redis 来实现此目的?
这里 SQL-DB 的一个问题似乎是我不可能使用索引,因为我的车辆大多在移动,这意味着我必须不断创建数据库索引,这本身可能比不使用索引进行搜索更昂贵。
期待您的想法,谢谢!
I currently have a Postgres DB filled with approx. 300.000 data-sets of moving vehicles all over the world. My very frequently repeated query is: Give me all vehicles in a 5/10/20mile radius. Currently I spend around 600 to 1200 ms in the DB to prepare the set of located vehicle-objects.
I am looking to vastly improve this time by ideally one or two orders of magnitude if possible. I am working in a Ruby on Rails 3.0beta environment if this is relevant.
Any ideas how to architect the whole system to accelerate this query? Any NoSQL database able to deliver this kind of geolocation performance? I know of MongoDB working on an extension to facilitate this scenario but haven't tried it yet. Any intelligent use of Redis to achieve this?
One problem with SQL-DBs here seems to be that I can't possibly use indexes because my vehicles are mostly moving around, meaning I had to constantly created DB indexes which, by itself, is probably more expensive than just doing the searching without index.
Looking forward to your thoughs, Thanks!
发布评论
评论(2)
如果您使用正确的算法来组织数据,您将能够使用空间索引这可以显着加快您的查询速度。
地理定位域的最佳实践是使用 geohash,四叉树, R-tree 或类似的数据结构(R-trees 是最通用的,但听起来你正在查询点数据,所以这可能并不重要)。在每种情况下,您都可以创建一个使用单个线性列的空间索引,其中每个值代表不同大小和形状的边界框。这应该可以让您使用数据库中的单个范围查询来回答大多数查询。空间索引可以在 SQL 中实现(PostGIS、MS SQL,MySQL 都具有使用其中一种技术的空间数据类型和空间索引)或 NoSQL(因其水平可扩展性而流行;AppEngine 具有 geomodel,SimpleGeo 使用 Cassandra,Foursquare 使用 MongoDB)。
使用索引可能会因不断移动点而变得复杂,但我怀疑写入,甚至更新索引的稍微重的写入,都不会成为瓶颈。
If you use the right algorithm for organizing your data, you will be able to use a spatial index which can dramatically speed up your queries.
The best practice for the geolocation domain is to use a geohash, quad-tree, R-tree or similar data structure (R-trees are the most generic, but it sounds like you're querying point data, so that may not matter). In each case, you can create a spatial index that uses a single, linear column where each value represents a bounding box of varying size and shape. This should let you answer most queries with a single range query in your database. Spatial indices can be implemented in SQL (PostGIS, MS SQL, MySQL all have spatial datatypes and spatial indices which use one of these techniques) or NoSQL (popular for its horizontal scalability; AppEngine has geomodel, SimpleGeo uses Cassandra, Foursquare uses MongoDB).
Using an index can be complicated by constantly moving points, but I would suspect that writes, even slightly heavier writes that update indices, wouldn't be your bottleneck.
尽管您的车辆一直在移动,但我认为它们有某种速度限制。您可以做的是创建某种离散坐标系,一个例子是纬度/经度坐标的整数部分。然后,将这些值放入单独的列中,并将确切位置保留在另一列中。然后,您应该能够对整数列进行索引,因为车辆不会移动太多,以至于它们会经常更改这些值。
进行搜索时,您首先找出感兴趣的“方块”,然后使用索引列将查询限制为这些方块内的车辆。然后你必须对每个方格内的所有车辆进行全面搜索。您必须进行全面搜索的车辆数量现在应该只占所有车辆的一小部分。当然,这种策略的效率取决于你的车辆的分布。如果 50% 的车辆位于某个城市的某个地方,这将不起作用,但假设一个地方最大的车辆群是 5-10%,则应该可以提高性能。
Even though your vehicles are moving around all the time, I assume they have some kind of speed limit. What you can do is to create some kind of discrete coordinate system, one example would be the integer part of the lat/long coordinate. Then you put those values in separate columns, keeping the exact location in another column. You should then be able to index the integer columns, as the vehicles won't move so much that they change those values very often.
When doing a search, you first find out what "squares" are interesting, and restrict your query to the vechicles within those sqeares, using the indexed columns. Then you have to do a full search of all vehicles within each square. The number of vehicles you have to do a full search over should now only be a small fraction of all vechiles. The efficiency of this strategy of course depends on the distribution of your vechiles. If 50% of them are in a certain city somewhere this will not work, but assuming the largest group of vehicles in one place is 5-10% it should improve performance.