R: pmatch for a more difficult task
Thanks @nullglob,
I tried running it again, but my output is different. Could you tell me whether I have misused your code? Sorry if I have misunderstood how it works. I would appreciate any further advice.
df1 <- data.frame(
  A = c("x01", "x02", "y03", "z02", "x04", "x33", "z03"),
  B = c("A01BB01", "A02BB02", "C02AA05", "B04CC10", "C01GX02", "yyy", "zzz"))
df2 <- data.frame(
  X = c("a", "b", "c", "d", "e", "f"),
  Y = c("A01BB", "A02", "C02A", "B04", "C01GX", "xxx"))
with(c(df1, df2), {
  i <- pmatch(Y, B)
  iunmatched <- which(is.na(i))
  nunmatched <- length(iunmatched)
  nexcess <- length(B) - length(X)
  data.frame(A = c(A, rep(NA, nunmatched)),
             B = c(B, rep(NA, nunmatched)),
             X = c(X[i], rep(NA, nexcess), X[iunmatched]),
             Y = c(Y[i], rep(NA, nexcess), Y[iunmatched])) })
A B X Y
1 1 1 1 1
2 2 2 2 2
3 5 5 3 5
4 6 3 4 3
5 3 4 5 4
6 4 6 NA NA
7 7 7 NA NA
8 NA NA 6 6
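A likely explanation for the integers, offered as an assumption rather than something stated in the thread: in R versions before 4.0.0, `data.frame()` converted strings to factors by default, and before R 4.1.0 calling `c()` on a factor returned its integer level codes. The small sketch below shows that the level codes of column A are exactly the numbers printed in the A column above.

```r
# Column A from df1, stored as a factor (the pre-4.0.0 data.frame() default)
A <- factor(c("x01", "x02", "y03", "z02", "x04", "x33", "z03"))

# Levels are sorted, so the codes follow alphabetical order, not row order
print(levels(A))
codes <- as.integer(A)
print(codes)   # 1 2 5 6 3 4 7

# Converting back to character recovers the original labels
labels_back <- as.character(A)
print(labels_back)
```

If this is the cause, keeping the columns as character vectors (for example with `stringsAsFactors = FALSE`) should avoid the problem.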
======================ORIGINAL Question=====
Thanks for the answers to my previous question (http://stackoverflow.com/q/6592214/602276).
Building on that answer, I want to use pmatch for a more difficult task.
df1 <- data.frame(
A=c("x01","x02","y03","z02","x04", "x33", "z03"),
B=c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02", "yyy", "zzz")
)
A B
1 x01 A01BB01
2 x02 A02BB02
3 y03 C02AA05
4 z02 B04CC10
5 x04 C01GX02
6 x33 yyy
7 z03 zzz
My df2 is modified as follows:
df2 <- data.frame(
X=c("a","b","c","d","e", "f"),
Y=c("A01BB","A02","C02A","B04","C01GX", "xxx")
)
X Y
1 a A01BB
2 b A02
3 c C02A
4 d B04
5 e C01GX
6 f xxx
The difficulty is that df1 and df2 have different numbers of rows, so I cannot simply cbind them from the start.
Moreover, some rows of df1 and df2 have no match; those rows should be filled with NA accordingly.
The expected output is as follows:
A B X Y
1 x01 A01BB01 a A01BB
2 x02 A02BB02 b A02
3 y03 C02AA05 c C02A
4 z02 B04CC10 d B04
5 x04 C01GX02 e C01GX
6 x33 yyy NA NA
7 z03 zzz NA NA
8 NA NA f xxx
Could you show me how to do this in R? Thanks a lot.
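As a small illustration of the building block (a sketch using the vectors from the question, not a full solution): `pmatch(x, table)` returns, for each element of `x`, the index of the element of `table` that it partially matches, or `NA` when there is no unique match. The names `unmatched_Y` and `unmatched_B` are introduced here for illustration only.

```r
B <- c("A01BB01", "A02BB02", "C02AA05", "B04CC10", "C01GX02", "yyy", "zzz")
Y <- c("A01BB", "A02", "C02A", "B04", "C01GX", "xxx")

# For each prefix in Y, the position of the matching entry in B (NA if none)
i <- pmatch(Y, B)
print(i)

# Entries of Y with no partner in B, and entries of B that nothing matched
unmatched_Y <- which(is.na(i))
unmatched_B <- setdiff(seq_along(B), i)
print(unmatched_Y)
print(unmatched_B)
```

The two unmatched sets are the rows that need NA padding on the other side in the combined table.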
This is not exactly an elegant solution, but it seems to do the trick:
The output should be:
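That is, the expected table given at the end of the question. A minimal sketch that reproduces it is shown below, modeled on the code quoted in the question's update but not necessarily identical to the answerer's code. Two things are assumptions introduced here: `stringsAsFactors = FALSE`, so the columns stay character vectors on any R version, and the use of `match()` to align rows rather than relying on the matched rows already being in order. The names `j`, `k`, `extra`, and `result` are illustrative.

```r
# Assumed fix: stringsAsFactors = FALSE keeps every column as character
df1 <- data.frame(
  A = c("x01", "x02", "y03", "z02", "x04", "x33", "z03"),
  B = c("A01BB01", "A02BB02", "C02AA05", "B04CC10", "C01GX02", "yyy", "zzz"),
  stringsAsFactors = FALSE)
df2 <- data.frame(
  X = c("a", "b", "c", "d", "e", "f"),
  Y = c("A01BB", "A02", "C02A", "B04", "C01GX", "xxx"),
  stringsAsFactors = FALSE)

j <- pmatch(df2$Y, df1$B)            # for each df2 row, the matching df1 row (or NA)
k <- match(seq_len(nrow(df1)), j)    # for each df1 row, the matching df2 row (or NA)
extra <- which(is.na(j))             # df2 rows that matched nothing in df1

# df1 rows first (padded with NA), then the leftover df2 rows
result <- data.frame(
  A = c(df1$A, rep(NA, length(extra))),
  B = c(df1$B, rep(NA, length(extra))),
  X = c(df2$X[k], df2$X[extra]),
  Y = c(df2$Y[k], df2$Y[extra]),
  stringsAsFactors = FALSE)
print(result)
```

On R 4.0.0 or later the `stringsAsFactors = FALSE` arguments are redundant, since that is already the default there.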