Single-link clustering

Posted on 2024-12-11 10:45:52

I'm looking for a way to do single link clustering with OpenCV. My scenario:

  • Hundreds (potentially thousands) of feature vectors (vectors dimension can be up to ~800 features).
  • Unknown number of clusters (likely to be much lower than the number of vectors).
  • Fixed similarity threshold E - if the l1 norm between two vectors is less than E, then the vectors should be in the same cluster.
  • I don't need a cluster to be compact. That is, I don't need all the vectors in the cluster to be within E of each other. This can lead to long "chains" instead of clusters, but I'm OK with this.

I tried using K-means, but because I don't know the number of clusters it's not really applicable here. I could do iterative K-means and look for the best K, but it sounds inefficient. Is there a more suitable clustering algorithm implemented in OpenCV that I could use here?

Ideally, I need something similar to the SLINK algorithm, as this is what is quoted in the paper that I'm currently trying to implement. My options are to implement SLINK directly (a bit of a task, because of debugging & testing) or look for an existing algorithm that does something similar.
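To pin down what I mean by the fixed-threshold criterion, here is a rough sketch in plain Python (a naive O(N²·M) pass with a union-find, my own helper names; this is not SLINK itself, just the behaviour I'm after):

```python
def single_link(vectors, E):
    """Label vectors so that any pair with L1 distance < E ends up in the
    same cluster, allowing chains. Naive O(N^2 * M) sketch via union-find."""
    n = len(vectors)
    parent = list(range(n))

    def find(a):
        # Follow parent pointers to the root, with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            # L1 distance between the two feature vectors.
            d = sum(abs(p - q) for p, q in zip(vectors[i], vectors[j]))
            if d < E:
                parent[find(j)] = find(i)  # merge the two clusters

    return [find(i) for i in range(n)]
```

Note that with `[[0], [1], [2], [10]]` and `E = 1.5`, the first three vectors chain into one cluster even though vectors 0 and 2 are farther than E apart, which is exactly the non-compact behaviour described above.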

Any suggestions?


Comments (2)

压抑⊿情绪 2024-12-18 10:45:52

I'd suggest building a graph from your similarity threshold and finding its connected components. Once the graph is constructed, finding the connected components is fairly easy and efficient. If you like, NetworkX already has a connected-components function.
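A minimal sketch of that graph-plus-components idea in plain Python (BFS over a thresholded adjacency list; the function name and the L1 distance loop are my own choices, not a NetworkX or OpenCV API):

```python
from collections import deque

def threshold_components(vectors, E):
    """Connected components of the graph whose edges join vectors with
    L1 distance below E. Each component is one single-link cluster."""
    n = len(vectors)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if sum(abs(a - b) for a, b in zip(vectors[i], vectors[j])) < E:
                adj[i].append(j)
                adj[j].append(i)

    labels = [-1] * n   # -1 means "not yet visited"
    comp = 0
    for s in range(n):
        if labels[s] != -1:
            continue
        # BFS floods one whole component with the label `comp`.
        q = deque([s])
        labels[s] = comp
        while q:
            u = q.popleft()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = comp
                    q.append(v)
        comp += 1
    return labels
```

With NetworkX installed, the adjacency-building loop stays the same and the BFS is replaced by `networkx.connected_components`.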

好久不见√ 2024-12-18 10:45:52

I ended up writing an implementation myself:

    import cv  # legacy OpenCV 1.x Python bindings
    import sys

    def main():
        x = cv.Load(sys.argv[1])   # N x M matrix, one feature vector per row
        epsilon = float(sys.argv[2])
        y = cv.CloneImage(x)
        labels = range(x.height)   # each row starts in its own cluster
        tmp = cv.CreateImage((x.width, 1), x.depth, x.nChannels)
        for i in range(x.height):
            cv.SetImageROI(x, (0, i, x.width, 1))
            for j in range(i + 1, x.height):
                cv.SetImageROI(y, (0, j, x.width, 1))
                cv.AbsDiff(x, y, tmp)
                dist, _, _, _ = cv.Avg(tmp)  # mean absolute difference of the two rows
                if dist < epsilon:
                    # merge j's current cluster into i's current cluster
                    # (merging by current label, not by row index, keeps chains intact)
                    li, lj = labels[i], labels[j]
                    for k, lbl in enumerate(labels):
                        if lbl == lj:
                            labels[k] = li

        for i, lbl in enumerate(labels):
            print i, lbl

    if __name__ == '__main__':
        main()

x is an N x M matrix containing N vectors; the dimensionality of each vector is M. The script compares each pair of vectors using the mean absolute difference (the L1 norm divided by M, since cv.Avg averages over the row) and puts a pair in the same cluster if that difference is below epsilon. The algorithm is very slow --- O(N^3) --- but it's good enough for me at the moment.
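The cubic cost comes from rescanning the whole label array on every merge; with a disjoint-set (union-find) structure each merge is near-constant, leaving the O(N²) distance computations as the dominant cost. A hedged sketch of that variant, assuming the data is available as a NumPy array rather than a legacy cv matrix (function name is my own):

```python
import numpy as np

def cluster_l1(x, epsilon):
    """Same clustering as the script above, but with union-find merges
    and vectorized per-row mean-absolute-difference via NumPy."""
    n = x.shape[0]
    parent = list(range(n))

    def find(a):
        # Root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        # Mean absolute difference of row i against all later rows at once.
        d = np.abs(x[i + 1:] - x[i]).mean(axis=1)
        for j in np.nonzero(d < epsilon)[0] + i + 1:
            parent[find(int(j))] = find(i)  # merge clusters of i and j

    return [find(i) for i in range(n)]
```

The per-pair threshold has the same meaning as in the script above (a threshold on the mean absolute difference, not the raw L1 sum).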
