谷歌地图 API v3 的服务器端集群

发布于 2024-12-06 07:09:45 字数 388 浏览 0 评论 0原文

我目前正在开发一种谷歌地图概述小部件，它将位置显示为地图上的标记。标记的数量从数百个到数千个（10000 个以上）不等。现在，我正在使用 MarkerClusterer 进行谷歌地图 v3 1.0 和 谷歌地图 javascript api v3 (premier)，它对于一百个标记来说效果相当不错。由于标记的数量会增加，我需要一种新的标记聚类方法。根据我的阅读，保持性能的唯一方法是将集群从客户端移动到服务器端。有谁知道一个好的 PHP5 库可以帮我完成这个任务吗？

Atm 我正在深入研究谷歌地图的图层机制。也许还有一些我可以开始查看的领先 PHP 库？我也遇到过 FusionTables，但由于我需要集群，我认为这可能不是正确的解决方案。

提前致谢！

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

与往事干杯 2024-12-13 07:09:45

我不知道有哪个服务器端库可以为您完成这项工作。不过，我可以为您提供一些有关如何自行实施的指导。

聚类的基本方法就是计算标记之间的距离，当其中两个足够接近时，您可以将它们替换为位于两个标记之间中点的单个标记。

您不仅可以限制彼此标记的距离，还可以（或相反）选择限制您想要的簇/标记的数量。

为了实现这一点，您可以计算所有标记对之间的距离，对它们进行排序，然后从顶部合并，直到您想要的标记/簇数量为止。

要在形成簇时细化中点定位，您可以考虑要合并的两个中的每一个所代表的实际标记的数量。将该数字视为重量，将两个标记之间的线视为秤。然后，不要总是选择中点，而是选择可以平衡比例的点。

我猜想，如果标记数量有限，这种简单的聚类形式就足够了。如果您的数据集（标记数量及其位置）大致是静态的，您可以偶尔在服务器上计算聚类，对其进行缓存，然后直接从缓存中为客户端提供服务。

但是，如果您需要支持可能具有世界各地标记的大规模场景，您将需要更复杂的方法。

所提到的集群算法不具有可扩展性。事实上，其计算成本通常会随着标记数量呈指数增长。

为了解决这个问题，您可以将世界分成多个分区并计算集群并为每个分区的客户端提供服务。这确实支持扩展，因为工作负载可以由多个（大致）独立的服务器分割和执行。

接下来的问题是如何找到一个好的分区方案。您可能还需要考虑在不同的缩放级别提供不同的标记聚类，并且您的分区方案也应该包含这一点以允许缩放。

Google 将地图划分为具有 x、y 和 z 坐标的图块，其中 x 和 y 是从地图的西北角开始的图块的水平和垂直位置地图，其中 z 是缩放级别。

在最小缩放级别（零）时，整个地图由单个图块组成。（所有图块均为 256x256 像素）。在下一个缩放级别，该图块被分为四个子图块。这样继续下去，因此在缩放级别 2 中，这四个图块中的每个图块都被分为四个子图块，这总共为我们提供了 16 个图块。缩放级别 3 有 64 个图块，级别 4 有 256 个图块，依此类推。（任何缩放级别上的图块数量都可以表示为 4^z。）

使用此分区方案，您可以从最低缩放级别（最高 z 坐标）开始计算每个图块的聚类，然后向上冒泡直到你到达山顶。

对于单个图块要聚类的标记集是其四个子图块的所有标记（其中一些可以代表聚类）的并集。

这为您提供了有限的计算成本，并且还为您提供了一种将数据分块发送到客户端的好方法。客户端可以在将标记加载到地图中时逐个图块地请求标记，而不是请求给定缩放级别（这不会缩放）的所有标记。

然而，这种方法有一个缺陷：考虑两个相邻的图块，一个在左边，一个在右边。如果左侧图块在其最右侧包含一个标记/簇，而右侧图块在其最左侧包含一个标记/簇，则这两个标记/簇应该合并，但不会合并，因为我们正在执行聚类每个图块单独的机制。

为了解决这个问题，您可以在对图块进行聚类后对其进行后处理，以便合并位于四个边缘中的每一个上的标记/簇，同时考虑给定图块的八个相邻图块中的每一个。仅当我们可以假设没有单个簇足够大来影响不在同一子图块中的周围标记时，这种合并后机制才会起作用。然而，这是一个合理的假设。

最后一点：通过横向扩展的方法，您将让客户提出几个小请求。这些请求将具有局部性（即，瓦片不是随机请求的，而是地理上彼此接近的瓦片通常也被一起访问）。

为了提高查找/查询性能，您将受益于使用也具有此局部性属性的搜索键（代表切片）（因为这会将相邻切片的数据存储在磁盘上的相邻数据块中 - 提高读取时间和缓存利用率）。

您可以使用平铺/子平铺分区方案来形成这样的密钥。让顶部图块（跨越整个地图的单个图块）将空字符串作为键。接下来，让每个子图块都有键 A、B、C 和 D。下一个级别将有键 AA、AB、AC、AD、BA、BC、...、DC、DD。

递归地应用此方法，您最终将得到一个分区键，该分区键可识别您的图块，允许快速转换为 x、y、z 坐标并具有位置属性。此密钥命名方案有时称为“四元密钥”，因为分区方案形成“四元树”。局部性属性与使用 Z 顺序曲线将 2D 值映射到 1D 值时获得的属性相同。

如果您需要更多详细信息，请告诉我。

I don't know of a server-side library that'll do the job for you. I can however give you some pointers on how to implement one yourself.

The basic approach to clustering is simply to calculate the distance between your markers and when two of them are close enough you replace them with a single marker located at the mid-point between the two.

Instead of just having a limitation on how close to each other markers may be, you may also (or instead) choose to limit the number of clusters/markers you want as a result.

To accomplish this you could calculate the distance between all pairs of markers, sort them, and then merge from the top until you only have as many markers/clusters as you wish.

To refine the mid-point positioning when forming a cluster you may take into account the number of actual markers represented by each of the two to be merged. Think of that number as a weight and the line between the two markers as a scale. Then instead of always choosing the mid-point, choose the point that would balance the scale.

I'd guess that this simple form of clustering is good enough if you have a limited number of markers. If your data set (# of markers and their position) is roughly static you can calculate clustering on the server once in a while, cache it, and server clients directly from the cache.

However, if you need to support large scale scenarios potentially with markers all over the world you'll need a more sophisticated approach.

The mentioned cluster algorithm does not scale. In fact its computation cost would typically grow exponentially with the number of markers.

To remedy this you could split the world into partitions and calculate clustering and serve clients from each partition. This would indeed support scaling since the workload can be split and performed by several (roughly) independent servers.

The question then is how to find a good partitioning scheme. You may also want to consider providing different clustering of markers at different zoom levels, and your partitioning scheme should incorporate this as well to allow scaling.

Google divide the map into tiles with x, y and z-coordinates, where x and y are the horizontal and vertical position of the tile starting from the north-west corner of the map, and where z is the zoom level.

At the minimum zoom level (zero) the entire map consist of a single tile. (all tiles are 256x256 pixels). At the next zoom level that tile is divided into four sub tiles. This continues, so that in zoom level 2 each of those four tiles has been divided into four sub tiles, which gives us a total of 16 tiles. Zoom level 3 has 64 tiles, level 4 has 256 tiles, and so on. (The number of tiles on any zoom level can be expressed as 4^z.)

Using this partitioning scheme you could calculate clustering per tile starting at the lowest zoom level (highest z-coordinate), bubbling up until you reach the top.

The set of markers to be clustered for a single tile is the union of all markers (some of which may represent clusters) of its four sub tiles.

This gives you a limited computational cost and also gives you a nice way of chunking up the data to be sent to the client. Instead of requesting all markers for a given zoom level (which would not scale) clients can request markers on a tile-by-tile basis as they are loaded into the map.

There is however a flaw in this approach: Consider two adjacent tiles, one to the left and one to the right. If the left tile contains a marker/cluster at its far right side and the right tile contains a marker/cluster at its far left side, then those two markers/clusters should be merged but won't be since we're performing the clustering mechanism for each tile individually.

To remedy this you could post-process tiles after they have been clustered so that you merge markers/clusters that lay on the each of the four edges, taking into account each of the eight adjacent tiles for a given tile. This post-merging mechanism will only work if we can assume that no single cluster is large enough to affect the surrounding markers which are not in the same sub tile. This is, however, a reasonable assumption.

As a final note: With the scaled out approach you'll have clients making several small requests. These requests will have locality (i.e. tiles are not randomly requested, but instead tiles that are geographically close to each other are also typically accessed together).

To improve lookup/query performance you would benefit from using search keys (representing the tiles) that also have this locality property (since this would store data for adjacent tiles in adjacent data blocks on disk - improving read time and cache utilization).

You can form such a key using the tile/sub tile partitioning scheme. Let the top tile (the single one spanning the entire map) have the empty string as key. Next, let each of its sub tiles have the keys A, B, C and D. The next level would have keys AA, AB, AC, AD, BA, BC, ..., DC, DD.

Apply this recursively and you'll end up with a partitioning key that identifies your tiles, allows quick transformation to x,y,z-coordinates and has the locality property. This key naming scheme is sometimes called a Quad Key stemming from the fact that the partitioning scheme forms a Quad Tree. The locality property is the same as you get when using a Z-order curve to map a 2D-value into a 1D-value.

Please let me know if you need more details.

回复收藏 0 原文