Looping over data frame rows to apply a function with an if statement
I'm new to R and I'm trying to sum 2 columns of a given dataframe, if both the elements to be summed satisfy a given condition. To make things clear, what I want to do is:
> t.d<-as.data.frame(matrix(1:9,ncol=3))
> t.d
V1 V2 V3
1 4 7
2 5 8
3 6 9
> t.d$V4<-rep(0,nrow(t.d))
> for (i in 1:nrow(t.d)){
+ if (t.d$V1[i]>1 && t.d$V3[i]<9){
+ t.d$V4[i]<-t.d$V1[i]+t.d$V3[i]}
+ }
> t.d
V1 V2 V3 V4
1 4 7 0
2 5 8 10
3 6 9 0
I need efficient code, as my real data frame has about 150,000 rows and 200 columns. This gives an error:
t.d$V4<-t.d$V1[t.d$V1>1]+ t.d$V3[t.d$V3>9]
Is "apply" an option? I tried this:
t.d<-as.data.frame(matrix(1:9,ncol=3))
t.d$V4<-rep(0,nrow(t.d))
my.fun<-function(x,y){
if(x>1 && y<9){
x+y}
}
t.d$V4<-apply(X=t.d,MAR=1,FUN=my.fun,x=t.d$V1,y=t.d$V3)
but it gives an error as well.
Thanks very much for your help.
Comments (3)
I'll chip in and provide yet another version. Since you want zero if the condition doesn't match, and TRUE/FALSE are glorified versions of 1/0, simply multiplying by the condition also works:
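The answer's own code is not shown here, so the following is only a minimal sketch of the idea, assuming the question's t.d data frame and its condition (V1 > 1 and V3 < 9):

```r
# Rebuild the example data frame from the question
t.d <- as.data.frame(matrix(1:9, ncol = 3))

# The logical vector is coerced to 1/0 when multiplied,
# so rows that fail the condition end up as 0
t.d$V4 <- (t.d$V1 + t.d$V3) * (t.d$V1 > 1 & t.d$V3 < 9)
t.d
```

Note the use of the vectorised & rather than &&. One caveat with this approach: if either column contains NA, the product is NA rather than 0.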
...and it happens to be faster than the other solutions ;-)
This operation doesn't require loops, apply statements or if statements. Vectorised operations and subsetting are all you need:
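The answer's own code is not shown here. A minimal sketch of the two steps it describes, assuming the data frame is called d.f (the name used in the explanation) and holds the question's example data, might be:

```r
# Example data from the question, under the name d.f
d.f <- as.data.frame(matrix(1:9, ncol = 3))

# Step 1: straight sum of V1 and V3 for every row
d.f <- within(d.f, V4 <- V1 + V3)

# Step 2: select the rows that do NOT meet the condition and set V4 to 0
d.f[!(d.f$V1 > 1 & d.f$V3 < 9), "V4"] <- 0
d.f
```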
Why does this work?
In the first step I create a new column that is the straight sum of columns V1 and V3. I use within as a convenient way of referring to the columns of d.f without having to write d.f$V all the time. In the second step I subset all of the rows that don't fulfill your conditions and set V4 for these to 0.
ifelse is your friend here:
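The answer's own code is not shown here. A minimal sketch of an ifelse-based version, assuming the question's t.d data frame and condition, could look like this:

```r
# Rebuild the example data frame from the question
t.d <- as.data.frame(matrix(1:9, ncol = 3))

# ifelse is vectorised: for each row, take V1 + V3 where the
# condition holds and 0 elsewhere
t.d$V4 <- ifelse(t.d$V1 > 1 & t.d$V3 < 9, t.d$V1 + t.d$V3, 0)
t.d
```

Unlike if, which looks at a single condition value, ifelse evaluates the test element by element, which is why no loop over rows is needed.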