Iterate through all rows in R, removing rows that meet a condition
I have an R data frame. It has about a dozen columns and 150 or so rows. I want to iterate through each row and remove it if both of these conditions hold:
- Its value in column 8 is undefined (NA).
- The value for the row ABOVE it, in column 8, IS defined.
My code looks like this, but it keeps crashing. It's gotta be a dumb mistake, but I can't figure it out.
for (i in 2:nrow(newfile)){
if (is.na(newfile[i,8]) && !is.na(newfile[(i-1),8]){
newfile<-newfile[-i,]
}
}
Obviously in this example, newfile is my dataframe.
The error I get
Error in `[.data.frame`(newfile, -i, ) : object 'i' not found
Problem solved, but some test data if you guys wanted to muck around:
23 L8 29141078 744319 27165443
24 L8 27165443 NA NA
25 L8 28357836 8293 25116398
26 L8 25116398 NA NA
27 L8 28357836 21600 25116398
28 L8 25116398 NA NA
29 L8 40929564 NA NA
30 L8 40929564 NA NA
31 L8 41917264 33234 39446503
32 L8 39446503 NA NA
33 L8 41917264 33981 39446503
34 L8 39446503 NA NA
Obviously this is a little modified, so here you would compare each value in column 4 with the value in the row above it (or you can use column 5, either way).
Comments (3)
The problem is that you're changing the data frame out from under yourself; the original evaluation of nrow(newfile) doesn't get updated as you go along (it would if you had a C-style loop, for (i=1; i<=nrow(newfile); i++) ...). In a while loop, on the other hand, the condition gets re-evaluated every time through the loop, so I think a while loop will work. You didn't give us an easily reproducible example (i.e. a test dataset with answers), so I'm not going to test this right now.
Careful thought (which I don't have time to give this at the moment) might lead to a non-iterative (and hence perhaps very much faster, if that matters) way to do this.
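The while-loop idea described above can be sketched as follows. This is only an illustration, not the answerer's original code: the column names (id, chrom, start, len, end) are made up for the test data from the question, and column 4 stands in for "column 8". Note that the if line in the question is also missing a closing parenthesis, which on its own would stop the loop from parsing; it is balanced here.

```r
# Test data from the question; the column names are hypothetical.
newfile <- data.frame(
  id    = 23:34,
  chrom = "L8",
  start = c(29141078, 27165443, 28357836, 25116398, 28357836, 25116398,
            40929564, 40929564, 41917264, 39446503, 41917264, 39446503),
  len   = c(744319, NA, 8293, NA, 21600, NA, NA, NA, 33234, NA, 33981, NA),
  end   = c(27165443, NA, 25116398, NA, 25116398, NA, NA, NA,
            39446503, NA, 39446503, NA)
)

col <- 4  # column to test (column 8 in the original data frame)
i <- 2    # start at the second row, since the first row has no row above it

# nrow(newfile) is re-evaluated on every pass, so the loop never
# indexes past the end of the shrinking data frame.
while (i <= nrow(newfile)) {
  if (is.na(newfile[i, col]) && !is.na(newfile[i - 1, col])) {
    newfile <- newfile[-i, ]
  }
  # Always advance. After a deletion, the row that slides into position i
  # originally sat below the deleted NA row, so it never qualifies for removal.
  i <- i + 1
}

print(newfile$id)
```

On this data the loop removes the rows with id 24, 26, 28, 32 and 34, and keeps 29 and 30 because the row above each of them was NA in the original data.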
Hmm, if I do this, I get an error.
This is because you're removing rows while you're iterating through them, so by the time you get to row nrow(newfile) (which is the original number of rows, since nrow(newfile) is evaluated once at the beginning of the for loop), it may not exist any more because rows have been removed.
You can avoid looping altogether by constructing a logical index of which rows to keep (i.e. a vector of length nrow(newfile) with TRUE if you want to keep the row and FALSE otherwise). You could do it all in one line if that's your thing.
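A minimal sketch of the logical-index approach, under the same assumptions as the question's test data: the column names are hypothetical, column 4 stands in for "column 8", and the helper names vals, prev, drop_row and keep are made up for illustration.

```r
# Test data from the question; the column names are hypothetical.
newfile <- data.frame(
  id    = 23:34,
  chrom = "L8",
  start = c(29141078, 27165443, 28357836, 25116398, 28357836, 25116398,
            40929564, 40929564, 41917264, 39446503, 41917264, 39446503),
  len   = c(744319, NA, 8293, NA, 21600, NA, NA, NA, 33234, NA, 33981, NA),
  end   = c(27165443, NA, 25116398, NA, 25116398, NA, NA, NA,
            39446503, NA, 39446503, NA)
)

col  <- 4                       # column to test (column 8 in the original data)
vals <- newfile[[col]]
prev <- c(NA, head(vals, -1))   # value in the row above; the first row has none

# Drop a row when its own value is NA and the value above it is defined.
drop_row <- is.na(vals) & !is.na(prev)
keep     <- !drop_row           # logical index of length nrow(newfile)
result   <- newfile[keep, ]

# The same thing in one line:
result_one_line <- newfile[!(is.na(newfile[[col]]) &
                             !is.na(c(NA, head(newfile[[col]], -1)))), ]

print(result$id)
```

Because the index is computed from the original column before anything is removed, each row is compared with its original neighbour, and there is no risk of indexing past the end of a shrinking data frame.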
Here is another solution. But it keeps NA values if the previous value is also NA.