Clustering method for spatial points with outliers

Posted on 2025-01-09 05:42:22

I would like to implement an unsupervised clustering to detect grids (vertical/horizontal lines) for spatial points.

I have tried DBSCAN, and it gives subpar results. It is able to pick out the grids, as seen in red below:
[image: DBSCAN result with the detected grid points in red]

However, it does not pick out all of the points that form the vertical/horizontal lines, and if I relax the epsilon parameter, it will incorrectly classify more points as noisy (e.g. the bottom left of the picture).

I was wondering whether there is a modified version of DBSCAN that uses ellipses instead of circles? Or are there any other recommended clustering methods that do not need the number of clusters to be specified in advance?

Or is there a better method to identify these points that make the grid? Any help is appreciated.

Comments (1)

迷爱 2025-01-16 05:42:22

You can get an anisotropic DBSCAN by rescaling your data this way: an anisotropy value < 1 will find vertical clusters and a value > 1 will find horizontal clusters.

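import numpy as np
from sklearn.cluster import DBSCAN

def anisotropical_DBSCAN(X, anisotropy, eps, min_samples):
    """Anisotropic DBSCAN: multiply the y coordinate by `anisotropy`, then run DBSCAN.

    Returns the fitted sklearn.cluster.DBSCAN estimator (cluster labels are in `labels_`).
    """
    # Scale a copy so the caller's array is not modified in place.
    X_scaled = np.array(X, dtype=float)
    X_scaled[:, 1] *= anisotropy
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X_scaled)
    return db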
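To see why rescaling answers the "ellipse instead of circles" question: a point q is a neighbour of p in the scaled data when sqrt(dx^2 + (anisotropy * dy)^2) <= eps, which in the original coordinates is an ellipse with semi-axis eps along x and eps / anisotropy along y. The small check below is only an illustrative sketch; the helper name `in_scaled_neighbourhood` and the three test points are made up for this example and are not part of the answer's code.

```python
import numpy as np

def in_scaled_neighbourhood(p, q, anisotropy, eps):
    """Return True if q is within eps of p once y has been multiplied by `anisotropy`.

    Hypothetical helper, used only to illustrate the elliptical neighbourhood.
    """
    scale = np.array([1.0, anisotropy])
    diff = (np.asarray(q, dtype=float) - np.asarray(p, dtype=float)) * scale
    return bool(np.linalg.norm(diff) <= eps)

origin = (0.0, 0.0)
above = (0.0, 0.5)    # 0.5 away along y
beside = (0.5, 0.0)   # 0.5 away along x

# anisotropy < 1 squeezes y: the neighbourhood is tall and narrow.
print(in_scaled_neighbourhood(origin, above, anisotropy=0.1, eps=0.1))   # True
print(in_scaled_neighbourhood(origin, beside, anisotropy=0.1, eps=0.1))  # False

# anisotropy > 1 stretches y: the neighbourhood is wide and flat.
print(in_scaled_neighbourhood(origin, above, anisotropy=10, eps=1))      # False
print(in_scaled_neighbourhood(origin, beside, anisotropy=10, eps=1))     # True
```

With anisotropy = 0.1 and eps = 0.1 the ellipse reaches 0.1 along x but 1.0 along y, which is why that setting chains points into vertical clusters.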

Here is a full example with data:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Toy data: three Gaussian blobs.
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=750, centers=centers, cluster_std=0.4, random_state=0
)

def anisotropical_DBSCAN(X, anisotropy, eps, min_samples):
    """Anisotropic DBSCAN: multiply the y coordinate by `anisotropy`, then run DBSCAN.

    Returns the fitted sklearn.cluster.DBSCAN estimator (cluster labels are in `labels_`).
    """
    # Scale a copy so that X keeps its original coordinates for plotting.
    X_scaled = np.array(X, dtype=float)
    X_scaled[:, 1] *= anisotropy
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X_scaled)
    return db


db = anisotropical_DBSCAN(X, anisotropy=0.1, eps=0.1, min_samples=10)

# Boolean mask marking the core samples.
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

# Plot result (in the original, unscaled coordinates).
# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]

    class_member_mask = labels == k

    # Core samples of this cluster: large markers.
    xy = X[class_member_mask & core_samples_mask]
    plt.plot(
        xy[:, 0],
        xy[:, 1],
        "o",
        markerfacecolor=tuple(col),
        markeredgecolor="k",
        markersize=14,
    )

    # Non-core (border or noise) samples: small markers.
    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(
        xy[:, 0],
        xy[:, 1],
        "o",
        markerfacecolor=tuple(col),
        markeredgecolor="k",
        markersize=6,
    )

plt.title("Estimated number of clusters: %d" % n_clusters_)
plt.show()

You get vertical clusters:
[image: plot of the resulting vertical clusters]

Now change the call to db = anisotropical_DBSCAN(X, anisotropy=10, eps=1, min_samples=10). I had to change the eps value because the horizontal and vertical scales aren't the same, but in your case you should be able to keep the same (eps, min_samples) for detecting lines.

And you get horizontal clusters:
[image: plot of the resulting horizontal clusters]
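For grid data like the question's, one possible way to keep a single (eps, min_samples) pair for both directions is to compress y when looking for vertical lines and compress x when looking for horizontal lines. The sketch below is a rough illustration under that assumption: the helper `directional_dbscan`, its `axis` and `squeeze` parameters, and the synthetic grid are all made up for this example and do not come from the answer or from scikit-learn.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def directional_dbscan(points, axis, squeeze, eps, min_samples):
    """Hypothetical helper: compress one coordinate, then run DBSCAN.

    axis=1 compresses y (favours vertical lines),
    axis=0 compresses x (favours horizontal lines).
    Returns the array of cluster labels (-1 means noise).
    """
    scaled = np.array(points, dtype=float)  # copy, the input stays unchanged
    scaled[:, axis] *= squeeze
    return DBSCAN(eps=eps, min_samples=min_samples).fit(scaled).labels_

# Made-up grid: lines at 0, 5 and 10 in each direction, one point every 0.5,
# plus four stray points that belong to no line.
ticks = np.linspace(0.0, 10.0, 21)
lines_at = [0.0, 5.0, 10.0]
vertical = [(x, y) for x in lines_at for y in ticks]
horizontal = [(x, y) for y in lines_at for x in ticks]
stray = [(2.5, 2.5), (7.5, 2.5), (2.5, 7.5), (7.5, 7.5)]
points = np.unique(np.array(vertical + horizontal + stray), axis=0)

# Same eps and min_samples for both passes; eps is below the 0.5 point spacing,
# so points only chain together along the compressed direction.
params = dict(squeeze=0.1, eps=0.3, min_samples=5)
v_labels = directional_dbscan(points, axis=1, **params)
h_labels = directional_dbscan(points, axis=0, **params)

n_vertical = len(set(v_labels) - {-1})
n_horizontal = len(set(h_labels) - {-1})
on_grid = (v_labels != -1) | (h_labels != -1)  # on a line in either pass

print("vertical lines found:", n_vertical)
print("horizontal lines found:", n_horizontal)
print("points left as noise:", int((~on_grid).sum()))
```

A point counts as part of the grid if either pass assigns it to a cluster. On real data, eps and squeeze would need tuning against the actual point spacing along the lines.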

There are also implementations of anisotropic DBSCAN that are probably a lot cleaner: https://github.com/gissong/ADCN
