Summary of data in a data frame

Posted on 2024-12-10 14:01:54

I have another question about extracting information from a large data frame that I'm working with. The first few rows are as follows:

      Assay   Genotype   Sample    Result
1     001        G         1         0
2     001        A         2         1
3     001        G         3         0 
4     001        NA        4         NA
5     002        T         1         0
6     002        G         2         1
7     002        T         3         0 
8     002        T         4         0
9     003        NA        1         NA
10    003        G         2         1
11    003        G         3         1 
12    003        T         4         0

In total I'll be working with 2000 samples and 168 Assays for each sample.

I'd like to generate a summary table from this data that tells me how many 'Samples' have each 'Result'. There are only 3 options for 'Result': 1, 0, or NA. I would like the result to be a data frame that looks like this (using the above data):

Assay    1   0   NA
001      1   2   1 
002      1   3   0
003      2   1   1

As I mentioned above, there are 168 different assays, and they are not simply labeled as a numeric series, so the assay IDs must be extracted from the original data frame.
In an ideal world, I would also like to see the percentage of samples for each 'Result' listed next to the counts (or in a separate table).

请止步禁区 2024-12-17 14:01:54

Like @MYaseen208's answer, but adding an NA column:

> table(df[, c('Assay', 'Result')], useNA='ifany')
     Result
Assay 0 1 <NA>
    1 2 1    1
    2 3 1    0
    3 0 0    1

See: ?table
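
For a reproducible check, here is a minimal sketch that rebuilds the twelve example rows from the question and runs the same call. How `df` is constructed is an assumption, since the question does not show how the data were read in; `Assay` is stored as character so that labels such as 001 keep their leading zeros. The output printed above appears to correspond to only the first nine example rows; with all twelve rows, assay 003 should come out as one 0, two 1s and one NA.

```r
# Rebuild the question's example data (assumed construction; the
# original post does not show how df was created).
df <- data.frame(
  Assay    = rep(c("001", "002", "003"), each = 4),
  Genotype = c("G", "A", "G", NA, "T", "G", "T", "T", NA, "G", "G", "T"),
  Sample   = rep(1:4, times = 3),
  Result   = c(0, 1, 0, NA, 0, 1, 0, 0, NA, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Cross-tabulate Assay against Result. useNA = "ifany" adds an <NA>
# column only when missing values are actually present.
counts <- table(df[, c("Assay", "Result")], useNA = "ifany")
print(counts)
```

Because the assay labels are taken from the data itself, nothing here depends on the IDs forming a numeric series.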

哑 2024-12-17 14:01:54

Try

table(df$Assay, df$Result, useNA = "ifany")
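
The question also asks for percentages and for the result as a data frame, which the call above does not cover by itself. One possible way, sketched here with base R's `prop.table()` and `as.data.frame.matrix()`, is shown below. The example data and the names `counts_df` and `pct_df` are illustrative assumptions, not part of the original posts.

```r
# Example data from the question, reduced to the two columns needed
# (assumed construction for illustration).
df <- data.frame(
  Assay  = rep(c("001", "002", "003"), each = 4),
  Result = c(0, 1, 0, NA, 0, 1, 0, 0, NA, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Counts per assay and result, including missing results.
counts <- table(df$Assay, df$Result, useNA = "ifany")

# Give the missing-value column an explicit text label so it can be
# used as an ordinary column name later on.
colnames(counts)[is.na(colnames(counts))] <- "NA"

# Row-wise percentages: each assay's counts divided by its row total.
pct <- round(100 * prop.table(counts, margin = 1), 1)

# Convert both tables to wide data frames with one row per assay.
counts_df <- as.data.frame.matrix(counts)
pct_df    <- as.data.frame.matrix(pct)

print(counts_df)
print(pct_df)
```

Setting `margin = 1` normalizes within each assay; with `margin = 2` the percentages would instead be taken within each result value.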

About the author: 听风吹
