Translation of a clustering problem into graph theory language

Posted on 2024-08-29 02:32:50

I have a rectangular planar grid, with each cell assigned some integer weight. I am looking for an algorithm to identify clusters of 3 to 6 adjacent cells with higher-than-average weight. These blobs should have an approximately circular shape.

In my case, the average weight of cells outside a cluster is around 6, and that of cells inside a cluster is around 6+4, i.e. there is a "background weight" of about 6 with the cluster signal on top of it. The weights fluctuate according to Poisson statistics.
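
To put rough numbers on this, here is a minimal sketch, assuming every cell is an independent Poisson draw with mean 6 (background) or 10 (cluster), as described above. The helper names `poisson_tail` and `excess_in_sigma` are made up for illustration. For a group of n cells the expected excess is 4n, while the background sum fluctuates by about sqrt(6n), so a 3-cell cluster stands out by only about 2.8 standard deviations and a 6-cell cluster by 4.

```python
import math

BACKGROUND = 6.0  # assumed mean weight of a background cell
SIGNAL = 4.0      # assumed extra mean weight of a cluster cell


def poisson_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), summing the upper tail term by term."""
    term = math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))  # pmf at k
    total = 0.0
    i = k
    # Keep adding terms until we are past the mean and they no longer matter.
    while i <= mu or term > 1e-16 * total:
        total += term
        i += 1
        term *= mu / i  # pmf(i) from pmf(i - 1)
    return total


def excess_in_sigma(n_cells):
    """Expected cluster excess divided by the background standard deviation."""
    return SIGNAL * n_cells / math.sqrt(BACKGROUND * n_cells)


for n in range(3, 7):
    mu = BACKGROUND * n
    threshold = int((BACKGROUND + SIGNAL) * n)
    # Chance that n pure-background cells alone reach the expected cluster sum.
    p_fake = poisson_tail(threshold, mu)
    print(n, round(excess_in_sigma(n), 2), p_fake)
```

Multiplying `p_fake` by the number of candidate positions in a 1000x1000 grid gives a feel for how many spurious clusters a plain threshold on the summed weight would report.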

With a small background, greedy or seeded algorithms perform pretty well, but they break down when the weights of my cluster cells are close to the fluctuations in the background, i.e. they tend to find a cluster even where there is none. Also, I cannot do a brute-force search looping through all possible configurations, because my grid is large (something like 1000x1000) and I plan to do this very often (10^9 times). I have the impression there might be ways to tackle this with graph theory. I have heard of vertex covers and cliques, but I am not sure how best to translate my problem into their language. I know that graph theory might have issues with the statistical nature of the input, but I would be interested to see what algorithms from there could find, even if they cannot identify every cluster.

Here is an example clipping: the framed region has on average 10 entries per cell, all other cells have on average 6. Of course, the grid extends further.

| 8|  8|  2|  8|  2|  3| 
| 6|  4|  3|  6|  4|  4| 
        ===========
| 8|  3||13|  7| 11|| 7|
|10|  4||10| 12|  3|| 2|
| 5|  6||11|  6|  8||12|
        ===========
| 9|  4|  0|  2|  8|  7|
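
For reference, the clipping above can be scanned with a plain sliding window. This is only a sketch, assuming a fixed 3x3 window that matches the frame in the example (not the 3 to 6 cell blobs I am ultimately after); `GRID` and `window_sums` are names chosen for illustration.

```python
# The example clipping, row by row.
GRID = [
    [8, 8, 2, 8, 2, 3],
    [6, 4, 3, 6, 4, 4],
    [8, 3, 13, 7, 11, 7],
    [10, 4, 10, 12, 3, 2],
    [5, 6, 11, 6, 8, 12],
    [9, 4, 0, 2, 8, 7],
]


def window_sums(grid, size=3):
    """Return {(row, col): sum} for every size x size window, keyed by its top-left cell."""
    rows, cols = len(grid), len(grid[0])
    sums = {}
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            sums[(r, c)] = sum(
                grid[r + dr][c + dc] for dr in range(size) for dc in range(size)
            )
    return sums


sums = window_sums(GRID)
best = max(sums, key=sums.get)
print(best, sums[best])  # (2, 2) 81, i.e. the framed region
```

The framed window sums to 81, against a background expectation of 9 x 6 = 54 for nine cells, but the overlapping neighbouring windows also come out high (72, 70, 69), which illustrates how one real cluster can trigger several nearby candidates.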
