Pie chart for the co-occurrence of about 10 factors across clusters in R

Posted on 2024-12-13 01:22:41

I've got a two-column dataset with about 30000 clusters and 10 factors like this:

cluster-1 Factor1
cluster-1 Factor2
...
cluster-2 Factor2
cluster-2 Factor3
...

I would like to represent the co-occurrence of factors across the set of clusters: something like "Factor1+Factor3+Factor5 in 1234 clusters", and so on for the different combinations. I thought I could do something like a pie chart, but with 10 factors I take it there can be too many combinations.

What would be a good way of representing this?

泅渡 2024-12-20 01:22:41

There is one good programming question in here that should be addressed:

How do I count the number of co-occurrences of factors in the different clusters?

First simulate some data:

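n = 1000                                     # total number of (cluster, factor) rows

set.seed(12345)                              # make the simulation reproducible
n.clusters = 100
clusters = rep(1:n.clusters, length.out=n)   # cycle through cluster IDs: 10 rows per cluster

n.factors = 10
# Draw factor IDs centred on the middle of 1..n.factors, then clamp them to that range
factors = round(rnorm(n, n.factors/2, n.factors/5))
factors[factors > n.factors] = n.factors
factors[factors < 1] = 1

data = data.frame(cluster=clusters, factor=factors)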

> data
  cluster factor
1       1      6
2       2      6
3       3      5
4       4      4
5       5      6
6       6      1
...

Then here is the code that could be used to tabulate the number of times each combination of factors occurs in the clusters:

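# For each cluster, join its sorted unique factors into one label such as "3+5+6";
# the "+" separator keeps labels readable and matches the format asked for in the question
combos = with(data, tapply(factor, cluster, function(x) paste(sort(unique(x)), collapse='+')))
# Count how many clusters share each combination
counts = table(combos)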

This can be represented as a simple pie chart, for example,

dev.new(width=5, height=5)
pie(counts[counts>1])

but simple counts like this are often displayed most effectively as a sorted table. For more on this, see Edward Tufte's books on data presentation.
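The sorted-table idea can be sketched as follows. This is only an illustration under stated assumptions: the toy data frame imitates the two-column layout from the question, and the names `combos`, `counts`, and `sorted_table` are hypothetical, not part of the original answer.

```r
# Hypothetical toy data in the question's two-column layout (not real data)
data <- data.frame(
  cluster = c("cluster-1", "cluster-1", "cluster-2", "cluster-2",
              "cluster-3", "cluster-3", "cluster-4"),
  factor  = c("Factor1", "Factor2", "Factor2", "Factor3",
              "Factor2", "Factor1", "Factor3"),
  stringsAsFactors = FALSE
)

# One label per cluster: its sorted unique factors joined with "+"
combos <- tapply(data$factor, data$cluster,
                 function(x) paste(sort(unique(x)), collapse = "+"))

# Count clusters per combination, most frequent combination first
counts <- sort(table(combos), decreasing = TRUE)

# Convert to a plain two-column data frame for printing or export
sorted_table <- data.frame(combination = names(counts),
                           clusters    = as.vector(counts))
print(sorted_table)
```

With 10 factors there are at most 2^10 - 1 = 1023 non-empty combinations, so a table sorted this way, possibly truncated to the most frequent rows with `head(sorted_table, 20)`, should stay readable where a pie chart would not.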

About the author: 冷了相思
