Comparing two columns across multiple rows in a data frame

Posted on 2024-12-08 10:50:50

I have a data frame that I'm working with in which I'd like to compare a data point Genotype with two references S288C and SK1. This comparison will be done across many rows (100+) of the data frame. Here are the first few lines of my data frame:

     Assay Genotype S288C SK1
1 CCT6-002        G     A   G
2 CCT6-007        G     A   G
3 CCT6-013        C     T   C
4 CCT6-015        G     A   G
5 CCT6-016        G     G   T

As a final product, I'd like a character string of 1's (S288C) and 0's (SK1) depending on which of the references the data point matches. Thus in the example above I'd like an output of 00001 since all except the last match SK1.

英雄似剑 2024-12-15 10:50:50

A nested ifelse should do it (take a look at help(ifelse) for usage):

ifelse(dat$Genotype==dat$S288C,1,ifelse(dat$Genotype==dat$SK1,0,NA))

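With this test data (printed here the way R prints a character matrix; dat must be a data frame for the $ column access to work):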

> dat
     Genotype S288C SK1
[1,] "G"      "A"   "G"
[2,] "G"      "A"   "G"
[3,] "C"      "T"   "C"
[4,] "G"      "A"   "G"
[5,] "G"      "G"   "T"
[6,] "G"      "A"   "A"

We get:

> ifelse(dat$Genotype==dat$S288C,1,ifelse(dat$Genotype==dat$SK1,0,NA))
[1]  0  0  0  0  1 NA
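The question asks for a single character string rather than a numeric vector. A minimal sketch of that last step, assuming the five rows from the question are rebuilt by hand as a data frame named dat (the construction below is a reconstruction for illustration, not the asker's actual code), could look like this:

```r
# Hypothetical reconstruction of the five example rows from the question.
dat <- data.frame(
  Assay    = c("CCT6-002", "CCT6-007", "CCT6-013", "CCT6-015", "CCT6-016"),
  Genotype = c("G", "G", "C", "G", "G"),
  S288C    = c("A", "A", "T", "A", "G"),
  SK1      = c("G", "G", "C", "G", "T"),
  stringsAsFactors = FALSE  # keep the columns as character vectors
)

# 1 where Genotype matches S288C, 0 where it matches SK1, NA otherwise.
calls <- ifelse(dat$Genotype == dat$S288C, 1,
                ifelse(dat$Genotype == dat$SK1, 0, NA))

# Collapse the per-row calls into one character string.
result <- paste(calls, collapse = "")
print(result)  # "00001"
```

Note that paste() writes a missing value as the two characters "NA", so a row matching neither reference would lengthen the string. If that matters, replacing NA with a placeholder such as "-" before collapsing is one option.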

(Note: If you have trouble using this, make sure the columns are character vectors and are not being treated by R as factors; comparing two factors that have different level sets with == raises an error. A simple for loop converts them: for (i in 1:ncol(dat)){dat[,i]=as.vector(dat[,i])}.)
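As an alternative to the for loop, the conversion can be limited to the columns that really are factors. The sketch below assumes a small made-up data frame (the three rows are illustrative, not taken from the question) whose columns were read in as factors:

```r
# Illustrative data only: three rows with every column stored as a factor.
dat <- data.frame(
  Genotype = factor(c("G", "G", "C")),
  S288C    = factor(c("A", "G", "T")),
  SK1      = factor(c("G", "T", "C"))
)

# Find the factor columns and convert just those to character.
is_fac <- vapply(dat, is.factor, logical(1))
dat[is_fac] <- lapply(dat[is_fac], as.character)

# The nested ifelse from the answer now compares plain strings.
calls <- ifelse(dat$Genotype == dat$S288C, 1,
                ifelse(dat$Genotype == dat$SK1, 0, NA))
print(calls)
```

Since R 4.0.0, data.frame() and read.table() default to stringsAsFactors = FALSE, so this step is mainly needed for data created under older defaults or with explicit factor columns.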
