R: pmatch for a more difficult task

Posted on 2024-11-19 01:46:06


Thanks @nullglob,

I tried running it again, but my output is different. Could you tell me whether I have misused your code? Sorry if I have misunderstood how it works. I hope you don't mind giving me some more advice.

df1 <- data.frame(
    A=c("x01","x02","y03","z02","x04", "x33", "z03"),
    B=c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02", "yyy", "zzz"))

df2 <- data.frame(
    X=c("a","b","c","d","e", "f"),
    Y=c("A01BB","A02","C02A","B04","C01GX", "xxx"))

with(c(df1,df2),{
   i <- pmatch(Y,B)
   iunmatched <- which(is.na(i))
   nunmatched <- length(iunmatched)
   nexcess <- length(B) - length(X)
   data.frame(A = c(A,rep(NA,nunmatched)),
              B = c(B,rep(NA,nunmatched)),
              X = c(X[i],rep(NA,nexcess),X[iunmatched]),
              Y = c(Y[i],rep(NA,nexcess),Y[iunmatched]))
})

       A  B  X  Y
    1  1  1  1  1
    2  2  2  2  2
    3  5  5  3  5
    4  6  3  4  3
    5  3  4  5  4
    6  4  6 NA NA
    7  7  7 NA NA
    8 NA NA  6  6
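A likely explanation for the integer codes above (not stated in the original thread): in R versions before 4.0.0, `data.frame()` converted character columns to factors by default (`stringsAsFactors = TRUE`), and combining a factor with `NA` through `c()` returns the factor's internal integer codes instead of its labels. The numbers shown are consistent with that: the levels of `A` sort as x01, x02, x04, x33, y03, z02, z03, so y03 becomes 5 and x04 becomes 3. A minimal sketch of the effect and of the usual workaround, passing `stringsAsFactors = FALSE` explicitly (the names `a_factor` and `a_char` are only illustrative):

```r
# The same column as a factor (the pre-4.0.0 data.frame default) and as character
a_factor <- factor(c("x01", "x02", "y03", "z02", "x04", "x33", "z03"))
a_char   <- c("x01", "x02", "y03", "z02", "x04", "x33", "z03")

# c() on a factor plus NA yields the integer level codes, not the labels
print(c(a_factor, NA))
# c() on a character vector keeps the strings
print(c(a_char, NA))

# Workaround: build the data frames with character columns explicitly
df1 <- data.frame(
  A = c("x01", "x02", "y03", "z02", "x04", "x33", "z03"),
  B = c("A01BB01", "A02BB02", "C02AA05", "B04CC10", "C01GX02", "yyy", "zzz"),
  stringsAsFactors = FALSE)
df2 <- data.frame(
  X = c("a", "b", "c", "d", "e", "f"),
  Y = c("A01BB", "A02", "C02A", "B04", "C01GX", "xxx"),
  stringsAsFactors = FALSE)
str(df1)
str(df2)
```

With R 4.0.0 or later, `stringsAsFactors = FALSE` is already the default, so the code above returns strings without any change.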

======================ORIGINAL Question=====

Thanks for the answers to my previous question (http://stackoverflow.com/q/6592214/602276).

Building on that answer, I want to use pmatch for a more difficult task.

df1 <- data.frame(
  A=c("x01","x02","y03","z02","x04", "x33", "z03"),
  B=c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02", "yyy", "zzz")
)

    A       B
1 x01 A01BB01
2 x02 A02BB02
3 y03 C02AA05
4 z02 B04CC10
5 x04 C01GX02
6 x33     yyy
7 z03     zzz

My df2 is modified as follows:

df2 <- data.frame(
  X=c("a","b","c","d","e", "f"),
  Y=c("A01BB","A02","C02A","B04","C01GX", "xxx")
)

  X     Y
1 a A01BB
2 b   A02
3 c  C02A
4 d   B04
5 e C01GX
6 f   xxx

The difficulty is that df1 and df2 have different numbers of rows, so I cannot simply cbind them from the start.

Moreover, some rows of df1 and df2 do not match each other; the corresponding cells should be NA.
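The step that identifies both the matches and the mismatches is `pmatch(df2$Y, df1$B)`: for each element of `df2$Y` it returns the position of the element of `df1$B` that it is an unambiguous prefix of, or `NA` when there is none. A minimal sketch with the data above, assuming character (not factor) columns:

```r
df1 <- data.frame(
  A = c("x01", "x02", "y03", "z02", "x04", "x33", "z03"),
  B = c("A01BB01", "A02BB02", "C02AA05", "B04CC10", "C01GX02", "yyy", "zzz"),
  stringsAsFactors = FALSE)
df2 <- data.frame(
  X = c("a", "b", "c", "d", "e", "f"),
  Y = c("A01BB", "A02", "C02A", "B04", "C01GX", "xxx"),
  stringsAsFactors = FALSE)

# For each df2$Y, the index of the df1$B entry it is a prefix of (NA if none)
i <- pmatch(df2$Y, df1$B)
print(i)

# df2 rows with no partner in df1 (here only "xxx")
print(which(is.na(i)))
# df1 rows that no df2 value matched (here "yyy" and "zzz")
print(setdiff(seq_len(nrow(df1)), i))
```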

The expected output is as follows:

   A       B   X     Y
1 x01 A01BB01   a A01BB
2 x02 A02BB02   b   A02
3 y03 C02AA05   c  C02A
4 z02 B04CC10   d   B04
5 x04 C01GX02   e C01GX
6 x33     yyy   NA  NA
7 z03     zzz   NA  NA
8 NA      NA    f   xxx

Could you show me how to do this in R? Thanks a lot.


Comments (1)

人│生佛魔见 2024-11-26 01:46:06


This is not exactly an elegant solution, but it seems to do the trick:

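with(c(df1,df2),{
  # i[k]: index of the element of B that Y[k] is a prefix of (NA if none)
  i <- pmatch(Y,B)
  # rows of df2 whose Y matched nothing in B
  iunmatched <- which(is.na(i))
  nunmatched <- length(iunmatched)
  # how many more rows df1 has than df2
  nexcess <- length(B) - length(X)
  # df1 columns, padded with NA for the unmatched df2 rows appended at the end
  data.frame(A = c(A,rep(NA,nunmatched)),
             B = c(B,rep(NA,nunmatched)),
             # X[i] lines up with df1 only because the matched rows of df2
             # come first and in the same order as df1 (i is 1:5 here)
             X = c(X[i],rep(NA,nexcess),X[iunmatched]),
             Y = c(Y[i],rep(NA,nexcess),Y[iunmatched]))
})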

The output should be:

     A       B    X     Y
1  x01 A01BB01    a A01BB
2  x02 A02BB02    b   A02
3  y03 C02AA05    c  C02A
4  z02 B04CC10    d   B04
5  x04 C01GX02    e C01GX
6  x33     yyy <NA>  <NA>
7  z03     zzz <NA>  <NA>
8 <NA>    <NA>    f   xxx
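Because `X[i]` indexes df2 by positions in df1, the answer above relies on the matched rows of df2 being in the same order as df1. A hedged alternative sketch (not from the original answer) inverts the mapping with `match()`, so each df1 row looks up its own partner in df2, and then appends the unmatched df2 rows with `rbind()`. It assumes character (not factor) columns; the names `j`, `matched`, `extra` and `result` are only illustrative:

```r
df1 <- data.frame(
  A = c("x01", "x02", "y03", "z02", "x04", "x33", "z03"),
  B = c("A01BB01", "A02BB02", "C02AA05", "B04CC10", "C01GX02", "yyy", "zzz"),
  stringsAsFactors = FALSE)
df2 <- data.frame(
  X = c("a", "b", "c", "d", "e", "f"),
  Y = c("A01BB", "A02", "C02A", "B04", "C01GX", "xxx"),
  stringsAsFactors = FALSE)

# i[k]: row of df1 whose B starts with df2$Y[k] (NA if none)
i <- pmatch(df2$Y, df1$B)
# j[r]: row of df2 matched to df1 row r (NA if none), the inverse of i
j <- match(seq_len(nrow(df1)), i)

# Every df1 row, with its df2 partner (or NA) alongside
matched <- data.frame(A = df1$A, B = df1$B,
                      X = df2$X[j], Y = df2$Y[j],
                      stringsAsFactors = FALSE)

# df2 rows that matched nothing, padded with NA on the df1 side
unmatched <- which(is.na(i))
extra <- data.frame(A = rep(NA_character_, length(unmatched)),
                    B = rep(NA_character_, length(unmatched)),
                    X = df2$X[unmatched], Y = df2$Y[unmatched],
                    stringsAsFactors = FALSE)

result <- rbind(matched, extra)
print(result)
```

If the rows of df2 were in a different order, `j` would still pair each df1 row with the df2 row whose `Y` is a prefix of its `B`, whereas `X[i]` would not.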