Is a fuzzy C-means algorithm available for Python?

Posted 2024-11-25 00:50:34


I have some points in a 3-dimensional space and would like to cluster them. I know Python's module "cluster", but it only has K-Means. Do you know a module which has FCM (fuzzy C-means)?

(If you know some other Python modules which are related to clustering, you could name them as a bonus. But the important question is the one about an FCM algorithm in Python.)

Matlab

It seems to be quite easy to use FCM in Matlab (example). Isn't something like this available for Python?

NumPy, SciPy and Sage

I didn't find FCM in NumPy, SciPy or Sage. I've downloaded the documentation and searched for it. No results.

Python-cluster

It seems like the cluster module will add fuzzy C-means with the next version (see Roadmap). But I need it now.


Comments (4)

ˉ厌 2024-12-02 00:50:34


PEACH will provide some fuzzy C-means functionality:
http://code.google.com/p/peach/

However, there doesn't seem to be any usable documentation, as the wiki is empty. An example of using FCM with PEACH can be found on its website.

把人绕傻吧 2024-12-02 00:50:34


Have a look at the scikit-fuzzy package. It has very basic fuzzy logic functionality, including fuzzy c-means clustering.
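
For a quick start, here is a minimal sketch assuming the skfuzzy.cluster.cmeans interface (the exact signature and return values may vary between versions, so check the documentation of your installed release). Note that cmeans expects the data transposed, with one feature per row:

# Fuzzy c-means on 3-D points with scikit-fuzzy (interface assumed as above)
import numpy as np
import skfuzzy as fuzz

points = np.random.rand(100, 3)           # stand-in for your (N, 3) data
alldata = points.T                        # cmeans wants shape (features, N)

cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(
    alldata, c=3, m=2.0, error=0.005, maxiter=1000, seed=42)

hard_labels = np.argmax(u, axis=0)        # harden the fuzzy memberships
print(cntr)                               # one centroid per row
print(u[:, 0])                            # memberships of the first point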

飘落散花 2024-12-02 00:50:34


Python

There is a fuzzy-c-means package on PyPI. Check out the link: fuzzy-c-means Python

This is the simplest way to use FCM in Python. Hope it helps.
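
A minimal sketch, assuming the FCM class interface documented for the fuzzy-c-means package (names such as n_clusters, centers and soft_predict may differ between versions, so check the docs of your installed release):

# Cluster 3-D points with the fuzzy-c-means package (interface assumed as above)
import numpy as np
from fcmeans import FCM

points = np.random.rand(100, 3)           # stand-in for your (N, 3) data

fcm = FCM(n_clusters=3)
fcm.fit(points)

print(fcm.centers)                        # one cluster center per row
labels = fcm.predict(points)              # hard labels (argmax of the memberships)
memberships = fcm.soft_predict(points)    # degree of membership per cluster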

信仰 2024-12-02 00:50:34


I have done it from scratch, using K-means++ initialization (with fixed seeds and 5 centroids; it shouldn't be too difficult to adapt it to your desired number of centroids):

# K-means++ initialization algorithm:
import numpy as np
from numpy import linalg as LA

# data2 is assumed to be your (N, 3) array of points.

np.random.seed(20)   # fix the seed once, outside the loop, for reproducibility

def initialize(X, K):
    # Pick K initial centroids, weighting each candidate point by its
    # squared distance to the nearest centroid chosen so far.
    C = [X[0]]
    for k in range(1, K):
        D2 = np.array([min([np.inner(c-x, c-x) for c in C]) for x in X])
        probs = D2/D2.sum()
        cumprobs = probs.cumsum()
        r = np.random.rand()
        for j, p in enumerate(cumprobs):
            if r < p:
                i = j
                break
        C.append(X[i])
    return C

a = initialize(data2, 5)   # "a" is the initial centroid array... I used 5 centroids

# Now the fuzzy c-means algorithm:
m = 1.5          # fuzziness parameter (it can be tuned)
r = 2/(m-1)      # exponent of the membership formula

# Initial centroids:
c1, c2, c3, c4, c5 = a[0], a[1], a[2], a[3], a[4]

# Prepare empty lists to collect the centroid history:
cc1, cc2, cc3, cc4, cc5 = [], [], [], [], []

n_iterations = 10000

for j in range(n_iterations):
    u1, u2, u3, u4, u5 = [], [], [], [], []

    for i in range(len(data2)):
        # Distances of the current point to each centroid:
        d1 = LA.norm(data2[i]-c1)
        d2 = LA.norm(data2[i]-c2)
        d3 = LA.norm(data2[i]-c3)
        d4 = LA.norm(data2[i]-c4)
        d5 = LA.norm(data2[i]-c5)

        # Membership of the point in each cluster:
        U1 = 1/(1 + (d1/d2)**r + (d1/d3)**r + (d1/d4)**r + (d1/d5)**r)
        U2 = 1/((d2/d1)**r + 1 + (d2/d3)**r + (d2/d4)**r + (d2/d5)**r)
        U3 = 1/((d3/d1)**r + (d3/d2)**r + 1 + (d3/d4)**r + (d3/d5)**r)
        U4 = 1/((d4/d1)**r + (d4/d2)**r + (d4/d3)**r + 1 + (d4/d5)**r)
        U5 = 1/((d5/d1)**r + (d5/d2)**r + (d5/d3)**r + (d5/d4)**r + 1)

        # We will get an array of n points x K centroids with their degrees of membership
        u1.append(U1)
        u2.append(U2)
        u3.append(U3)
        u4.append(U4)
        u5.append(U5)

    # Now we calculate the new centers, weighting each point by its
    # membership raised to m (the standard FCM update):
    c1 = (np.array(u1)**m).dot(data2) / np.sum(np.array(u1)**m)
    c2 = (np.array(u2)**m).dot(data2) / np.sum(np.array(u2)**m)
    c3 = (np.array(u3)**m).dot(data2) / np.sum(np.array(u3)**m)
    c4 = (np.array(u4)**m).dot(data2) / np.sum(np.array(u4)**m)
    c5 = (np.array(u5)**m).dot(data2) / np.sum(np.array(u5)**m)

    cc1.append(c1)
    cc2.append(c2)
    cc3.append(c3)
    cc4.append(c4)
    cc5.append(c5)

    if j > 5:
        # Average absolute movement of each centroid over the last three iterations:
        change_rate1 = np.sum(np.abs(3*cc1[j] - cc1[j-1] - cc1[j-2] - cc1[j-3]))/3
        change_rate2 = np.sum(np.abs(3*cc2[j] - cc2[j-1] - cc2[j-2] - cc2[j-3]))/3
        change_rate3 = np.sum(np.abs(3*cc3[j] - cc3[j-1] - cc3[j-2] - cc3[j-3]))/3
        change_rate4 = np.sum(np.abs(3*cc4[j] - cc4[j-1] - cc4[j-2] - cc4[j-3]))/3
        change_rate5 = np.sum(np.abs(3*cc5[j] - cc5[j-1] - cc5[j-2] - cc5[j-3]))/3
        change_rate = np.array([change_rate1, change_rate2, change_rate3, change_rate4, change_rate5])
        if np.sum(change_rate > 1e-7) == 0:   # stop when no centroid is still moving
            break

U = np.array([u1, u2, u3, u4, u5]).T   # membership matrix: n points x K centroids

print(c1)  # check the centroid coordinates; c1 - c5 are the last centroids calculated, so supposedly they converged
print(U)   # the degree of membership of every point in each cluster

I know it is not very Pythonic, but I hope it can be a starting point for your complete fuzzy C-means algorithm. I think "soft clustering" is the way to go when data is not easily separable (for example, when a t-SNE visualization shows all the data together instead of clearly separated groups; in that case, forcing each point to belong strictly to one cluster can be dangerous). I would try values from m = 1.1 to m = 2.0, so you can see how the fuzziness parameter affects the membership matrix.
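
To see that effect in isolation, here is a tiny self-contained sketch (the distances are made up for a single point and three centroids) that evaluates the same membership formula as the code above for several values of m:

# How the fuzziness parameter m shapes the memberships of one point,
# using the membership formula from the code above (distances are made up):
import numpy as np

dists = np.array([1.0, 2.0, 4.0])   # hypothetical distances to 3 centroids

for m in (1.1, 1.5, 2.0):
    r = 2/(m - 1)
    u = np.array([1/np.sum((d/dists)**r) for d in dists])
    print(m, u.round(3))

Smaller m pushes the memberships toward hard 0/1 assignments, while larger m spreads them out, which is exactly what you will see in the full membership matrix.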
