Clustering geo data for a heatmap
I have a list of tweets with their geo locations.
They will be displayed as a heatmap image overlaid transparently on a Google Map.
The trick is to find groups of locations that lie close to each other and display
each group as a single heatmap circle/figure whose heat/color depends on the cluster size.
Is there a library that groups locations on a map into clusters?
Or should I decide on my clustering parameters and build a custom algorithm?
Comments (6)
I don't know whether there is a 'library ready to group locations on a map into clusters'; maybe there is, maybe there isn't. Either way, I don't recommend building your own clustering algorithm, since plenty of libraries already implement this.
@recursive sent you a link to PHP code for k-means (one clustering algorithm). There is also a large Java library, Java-ML, that offers other techniques alongside k-means, such as hierarchical clustering and k-means++ (for choosing the initial centroids).
Finally, keep in mind that clustering is an unsupervised technique: it will give you a set of clusters with your data in them, but at first glance you can't tell why the algorithm grouped the data the way it did. It may cluster by location as you want, but it may also pick up another characteristic you don't care about, so it comes down to playing with the algorithm's parameters and tuning your solution.
I'm interested in the final solution you come up with :) Maybe you can share it in a comment when you finish the project!
K-means clustering is a technique often used for such problems.
The basic idea is this:
Here is some sample code for PHP.
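The steps of the basic idea and the linked PHP sample did not survive in this copy of the answer. As a rough illustration only, textbook k-means (Lloyd's algorithm) alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is in Python rather than PHP and is not the linked code. It assumes latitude/longitude can be treated as plain 2D coordinates, which is only reasonable for small areas away from the poles and the date line. The names `kmeans`, `points`, `k`, `iterations` and `seed` are hypothetical.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster a list of (lat, lng) tuples into k groups (plain Lloyd's algorithm)."""
    rng = random.Random(seed)
    # Start from k input points picked at random, without replacement.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance on raw degrees, see the assumption above).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                lat = sum(p[0] for p in cluster) / len(cluster)
                lng = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((lat, lng))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters
```

For the heatmap, each returned centroid could be drawn as one circle whose color depends on `len(cluster)`. Note that k has to be chosen up front, which is one of the parameters you would have to tune.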
heatmap.js is an HTML5 library for rendering heatmaps, and has a sample for doing it on top of the Google Maps API. It's pretty robust, but only works in browsers that support canvas:
You can try my PHP Hilbert curve class at phpclasses.org. It's a monster curve that reduces 2D complexity to 1D complexity. I use a quadkey to address a coordinate, and it has 21 zoom levels, like Google Maps.
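To give a feel for quadkey addressing, here is a minimal Python sketch. It is not the PHP class mentioned above, and it does not compute a Hilbert curve: it builds the common map-tile quadkey, which interleaves the bits of the tile x and y (a Z-order layout). It assumes the usual Web Mercator tiling, and the function names are hypothetical. Points whose quadkeys are equal at a given zoom level fall in the same tile, so grouping by quadkey is one way to bucket nearby locations.

```python
import math
from collections import defaultdict


def latlng_to_tile(lat, lng, zoom):
    """Convert WGS84 degrees to Web Mercator tile (x, y) at a zoom level."""
    n = 2 ** zoom
    # Web Mercator is only defined up to about +/-85.05 degrees latitude.
    lat = max(min(lat, 85.05112878), -85.05112878)
    x = int((lng + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lng == 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_quadkey(x, y, zoom):
    """Interleave the bits of x and y into a base-4 string, one digit per zoom level."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


def group_by_quadkey(points, zoom):
    """Bucket (lat, lng) points by the quadkey of the tile they fall in."""
    groups = defaultdict(list)
    for lat, lng in points:
        x, y = latlng_to_tile(lat, lng, zoom)
        groups[tile_to_quadkey(x, y, zoom)].append((lat, lng))
    return dict(groups)
```

A shorter quadkey is a prefix of every quadkey inside that tile, so truncating the keys gives coarser groups for lower zoom levels without recomputing anything.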
This isn't really a clustering problem. Heat maps don't work by creating clusters. Instead, they convolve the data with a Gaussian kernel. If you're not familiar with image processing, think of it as taking a normal (Gaussian) "stamp" and stamping it over each point. Since overlapping stamps add up, areas of high density end up with higher values.
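The stamping idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: the points are already projected to integer pixel coordinates, the stamp is cut off at three standard deviations, and the names `gaussian_heatmap`, `sigma` and `radius` are made up for the example.

```python
import math


def gaussian_heatmap(points, width, height, sigma=2.0):
    """Add one Gaussian "stamp" per (x, y) pixel point onto a height x width grid."""
    radius = int(3 * sigma)  # beyond 3 sigma the stamp is close to zero
    grid = [[0.0] * width for _ in range(height)]
    for px, py in points:
        # Only touch the pixels under the stamp, clipped to the grid edges.
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                d2 = (x - px) ** 2 + (y - py) ** 2
                # Overlapping stamps simply add up.
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    return grid
```

Mapping each grid value to a color and opacity then gives the overlay image. A larger `sigma` smears each point over a wider area, which plays roughly the role that cluster size would play in a clustering approach.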
One simple alternative for heatmaps is to round the lat/long to a few decimal places and group by the rounded values.
See this explanation about lat/long decimal accuracy.
etc.
For a low zoom level heatmap with lots of data, rounding to 1 or 2 decimals and grouping the results by that should do the trick.
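A minimal Python sketch of this rounding approach, with hypothetical names. Since one degree of latitude is roughly 111 km, rounding to 1 decimal gives cells about 11 km tall and rounding to 2 decimals gives cells about 1.1 km tall. The east-west size of a cell shrinks as you move away from the equator.

```python
from collections import Counter


def bin_by_rounding(points, decimals=2):
    """Count (lat, lng) points per cell after rounding both values to `decimals` places."""
    return Counter(
        (round(lat, decimals), round(lng, decimals)) for lat, lng in points
    )
```

Each key is the rounded coordinate of a cell and each value is the number of points in it, which can be used directly as the heat of that cell. If the data lives in a database, the same rounding and grouping can be done in the query instead.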