How to create a "Clustergram" plot? (in R)

Posted on 2024-09-05 16:39:32

I came across an interesting website describing a way to visualize the results of a clustering algorithm, called a "Clustergram":

[Image: example clustergram (source: schonlau.net)]

I am not sure how useful this really is, but I would like to reproduce it in R in order to play with it, and I am not sure how to go about it.

How would you draw one line per item so that it stays consistent across the different numbers of clusters?

Here is some example code/data to play with for a potential answer:

hc <- hclust(dist(USArrests), "ave")
plot(hc)

無心 2024-09-12 16:39:32

Update: I have posted a fuller solution elsewhere, with a lengthy example and discussion (it is based on the code given below). Hadley also kindly offered a ggplot2 implementation of the code.

Here is a basic solution (for a better one, look at the "update" above):

set.seed(100)
Data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
              matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(Data) <- c("x", "y")

# noise <- runif(100,0,.05)
line.width <- rep(.004, nrow(Data))  # vertical gap between neighbouring lines in a cluster
Y <- NULL
X <- NULL
k.range <- 2:10

plot(0, 0, col = "white", xlim = c(1,10), ylim = c(-.5,1.6),
     xlab = "Number of clusters", ylab = "Clusters means", 
     main = "(Basic) Clustergram")
axis(side =1, at = k.range)
abline(v = k.range, col = "grey")

centers.points <- list()

for(k in k.range){
    cl <- kmeans(Data, k)

    clusters.vec <- cl$cluster
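    # y position of each cluster: the mean of its center's coordinates
    the.centers  <- apply(cl$centers, 1, mean)

    # give every item in a cluster its own small offset so the lines do not overlap:
    # cumsum() stacks the offsets within each cluster, and order(order(...)) maps
    # them back from cluster-sorted order to the original item order
    noise <- unlist(tapply(line.width, clusters.vec,
                           cumsum))[order(order(clusters.vec))]
    # shift all offsets by their global midpoint so they straddle zero
    noise <- noise - mean(range(noise))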
    y <- the.centers[clusters.vec] + noise
    Y <- cbind(Y, y)
    x <- rep(k, length(y))
    X <- cbind(X, x)

    centers.points[[k]] <- data.frame(y = the.centers , x = rep(k , k)) 
#   points(the.centers ~ rep(k , k), pch = 19, col = "red", cex = 1.5)
}

library(colorspace)  # provides rainbow_hcl(); fails loudly if the package is missing
COL <- rainbow_hcl(100)
matlines(t(X), t(Y), pch = 19, col = COL, lty = 1, lwd = 1.5)

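# add points (k starts at 2, so slot 1 of the list is NULL and must be skipped)
invisible(lapply(Filter(Negate(is.null), centers.points),
       function(xx){ with(xx, points(y ~ x, pch = 19, col = "red", cex = 1.3)) }))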
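
The least obvious step in the loop is the `noise` computation, which spreads the items of one cluster over slightly different heights so that their lines do not sit on top of each other. Here is a minimal sketch of just that step on a toy label vector; the five labels are made up for illustration, and `stacked` and `offsets` are names introduced here, not taken from the code above.

```r
# Toy illustration of the per-cluster offset step (labels are invented)
clusters.vec <- c(2, 1, 2, 1, 1)                 # cluster label of each of 5 items
line.width   <- rep(0.004, length(clusters.vec)) # gap between neighbouring lines

# Within each cluster, cumsum() yields 1*w, 2*w, ...; unlist() returns the
# values grouped by cluster, i.e. in cluster-sorted order
stacked <- unlist(tapply(line.width, clusters.vec, cumsum))

# order(order(v)) gives each item's position in the cluster-sorted order,
# so indexing with it puts the offsets back in the original item order
offsets <- unname(stacked[order(order(clusters.vec))])
print(offsets)
```

Items 2, 4 and 5 (cluster 1) receive one, two and three gaps, while items 1 and 3 (cluster 2) receive one and two, so no two items of the same cluster share an offset.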

[Image: the basic clustergram produced by the k-means code above]
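
The answer uses k-means, whereas the question starts from an `hclust` object. As a rough sketch (not the answer author's method), the same kind of plot can be built from a hierarchical clustering with `cutree()`, which accepts a vector of `k` values and returns one column of labels per value. The assumptions here are mine: the data are scaled first, a cluster's height is the mean of its members' row means (which equals the mean of the cluster center's coordinates), and the per-item offsets are left out, so lines within a cluster overlap.

```r
# Sketch: clustergram-style lines from hierarchical clustering of USArrests.
# Assumption: variables are scaled so that no single column dominates the means.
Data <- scale(USArrests)
hc <- hclust(dist(Data), "ave")
k.range <- 2:10

# One column of cluster labels per k
memberships <- cutree(hc, k = k.range)

# One summary value per state, then replace it by its cluster's mean for each k
item.score <- rowMeans(Data)
Y <- apply(memberships, 2, function(cl) ave(item.score, cl))

# One line per state across the different numbers of clusters
matplot(k.range, t(Y), type = "l", lty = 1, col = "grey40",
        xlab = "Number of clusters", ylab = "Cluster means",
        main = "Clustergram sketch (hclust)")
```

Because `cutree()` cuts a single tree, moving from k to k + 1 only ever splits one cluster, so the lines branch but never re-merge, unlike in the k-means version.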
