Iterate over all rows in R and delete the rows that meet a condition

Posted on 2024-12-31 22:22:37

I have an R data frame with about a dozen columns and 150 or so rows. I want to iterate through the rows and remove a row when both of these conditions hold:

Its value in column 8 is undefined (NA).
The value in column 8 of the row ABOVE it IS defined.

My code looks like this, but it keeps crashing. It's gotta be a dumb mistake, but I can't figure it out.

for (i in 2:nrow(newfile)){
    if (is.na(newfile[i,8]) && !is.na(newfile[(i-1),8]){ 
    newfile<-newfile[-i,]
    }
}

Obviously in this example, newfile is my dataframe.

The error I get

Error in [.data.frame(newfile, -i, ) : object 'i' not found

Problem solved, but some test data if you guys wanted to muck around:

23  L8  29141078    744319  27165443
24  L8  27165443    NA  NA
25  L8  28357836    8293    25116398
26  L8  25116398    NA  NA
27  L8  28357836    21600   25116398
28  L8  25116398    NA  NA
29  L8  40929564    NA  NA
30  L8  40929564    NA  NA
31  L8  41917264    33234   39446503
32  L8  39446503    NA  NA
33  L8  41917264    33981   39446503
34  L8  39446503    NA  NA

The data are slightly modified here, so with this sample you would compare column 4 of each row with column 4 of the row above it (column 5 works equally well).
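For anyone who wants to try the answers on this sample, here is a minimal sketch of loading it into a data frame. It assumes the columns are whitespace-separated. The names V1 to V5 are just read.table's defaults, not names from the original data.

```r
# Test data from the question, pasted as text
txt <- "23  L8  29141078    744319  27165443
24  L8  27165443    NA  NA
25  L8  28357836    8293    25116398
26  L8  25116398    NA  NA
27  L8  28357836    21600   25116398
28  L8  25116398    NA  NA
29  L8  40929564    NA  NA
30  L8  40929564    NA  NA
31  L8  41917264    33234   39446503
32  L8  39446503    NA  NA
33  L8  41917264    33981   39446503
34  L8  39446503    NA  NA"

# read.table treats the string "NA" as missing by default (na.strings = "NA")
testdata <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE)

# Number of missing values in column 4
sum(is.na(testdata[, 4]))   # 7
```

With the rule from the question applied to column 4, the rows whose first column is 24, 26, 28, 32 and 34 are the ones that should go. Rows 29 and 30 stay because the row above each of them is also NA.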

叫思念不要吵 2025-01-07 22:22:37

The problem is that you're changing the data frame out from under yourself; the original evaluation of nrow(newfile) doesn't get updated as you go along (it would if you had a C-style loop for (i=1; i<=nrow(newfile); i++) ...). In a while loop, on the other hand, the condition will get re-evaluated every time through the loop, so I think this will work.

i <- 2
while (i<=nrow(newfile)){
   if (is.na(newfile[i,8]) && !is.na(newfile[i-1,8])) { 
     newfile<-newfile[-i,]
   }
   i <- i+1
}
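As a rough check of the loop above, here is a sketch that wraps it in a function and runs it on a cut-down copy of the question's test data. The function name drop_na_after_value and the column names id and value are hypothetical, chosen only for this illustration. Note also that the if line in the question has unbalanced parentheses; the sketch uses the balanced form shown in this answer.

```r
# Hypothetical helper: the while loop from above, parameterised by column
drop_na_after_value <- function(df, col) {
  i <- 2
  while (i <= nrow(df)) {          # nrow(df) is re-evaluated on every pass
    if (is.na(df[i, col]) && !is.na(df[i - 1, col])) {
      df <- df[-i, , drop = FALSE] # remove row i; later rows shift up by one
    }
    i <- i + 1
  }
  df
}

# Columns 1 and 4 of the question's sample (names are assumptions)
sample_df <- data.frame(
  id    = 23:34,
  value = c(744319, NA, 8293, NA, 21600, NA, NA, NA, 33234, NA, 33981, NA)
)

result <- drop_na_after_value(sample_df, "value")
result$id   # 23 25 27 29 30 31 33
```

Because i advances even after a deletion, the row that slides into position i is not examined again. For this particular rule that appears harmless, since that row originally sat below an NA row and so would be kept anyway.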

You didn't give us an easily reproducible example (i.e. a test dataset with the expected answer), so I'm not going to test this right now.

Careful thought (which I don't have time to give this at the moment) might lead to a non-iterative (and hence perhaps very much faster, if that matters) way to do this.

陌若浮生 2025-01-07 22:22:37

Hmm, if I do this, I get

Error in if (is.na(newfile[i,8]) && !is.na(newfile[(i-1),8]) { : 
  missing value where TRUE/FALSE needed

This is because you're removing rows while you're iterating through them, so by the time i reaches nrow(newfile) (which is the original number of rows, since nrow(newfile) is evaluated once at the beginning of the for loop), that row may not exist any more because rows have been removed.
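The point that a for loop fixes its sequence up front can be seen with a toy loop. This is only an illustration with made-up variable names, not the code from the question.

```r
n <- 3
visited <- integer(0)

for (i in 1:n) {
  n <- 1                    # changing n inside the loop has no effect on 1:n
  visited <- c(visited, i)  # record every value i actually takes
}

visited   # 1 2 3
```

The loop still runs three times even though n became 1 on the first pass, which is the same reason 2:nrow(newfile) keeps its original length after rows are dropped.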

You can avoid looping altogether by constructing a logical index of which rows to keep (i.e. a vector of length nrow(newfile) with TRUE if you want to keep the row and FALSE otherwise):

n <- nrow(newfile)
# first bit says "is the row NA (for rows 2:n)"
# second bit says "is the row above *not* NA (for rows 1:(n-1))"
# the & finds rows satisfying *both* conditions (first row always gets kept)
toRemove <- c(FALSE,is.na(newfile[-1,8])) & c(FALSE,!is.na(newfile[-n,8]))
toKeep   <- !toRemove
newfile  <- newfile[toKeep,]

You could do it all in one line if that's your thing:

newfile <- newfile[ !(c(FALSE,is.na(newfile[-1,8])) & c(FALSE,!is.na(newfile[-nrow(newfile),8]))), ]
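For reuse, the same logical-index idea can be wrapped in a small function. This is a sketch only: the function name drop_na_below_value and the column names id and value are hypothetical, and it is checked here against a cut-down copy of the question's test data rather than the real 150-row file.

```r
# Hypothetical helper: vectorised version of "NA here, not NA in the row above"
drop_na_below_value <- function(df, col) {
  x        <- df[[col]]
  na_here  <- is.na(x)
  # Shift down by one row; the first row is treated as having an NA above it,
  # so it can never be flagged for removal
  na_above <- c(TRUE, na_here[-length(x)])
  to_remove <- na_here & !na_above
  df[!to_remove, , drop = FALSE]
}

# Columns 1 and 4 of the question's sample (names are assumptions)
sample_df <- data.frame(
  id    = 23:34,
  value = c(744319, NA, 8293, NA, 21600, NA, NA, NA, 33234, NA, 33981, NA)
)

kept <- drop_na_below_value(sample_df, "value")
kept$id   # 23 25 27 29 30 31 33
```

Unlike a loop, this decides every row from the original column in one pass, so nothing depends on the order in which rows are removed.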

带上头具痛哭 2025-01-07 22:22:37

Here is another solution. Note that it keeps an NA row if the value in the row above is also NA.

#create some dummy data
newfile <- matrix(runif(800), ncol = 8)
newfile[rbinom(100, 1, 0.25) == 1, 8] <- NA
#the selection
newfile[-which(diff(is.na(newfile[, 8])) == 1) - 1, ]
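One thing to watch with negative indexing through which(): if no row matches, which() returns an empty vector, and indexing with an empty vector selects zero rows rather than all of them. Below is a hedged sketch of a guarded variant. The function name drop_first_na_of_run is hypothetical, and the small matrices are made up for illustration.

```r
# Hypothetical helper: same diff()/which() idea, but safe when nothing matches
drop_first_na_of_run <- function(m, col) {
  # diff() == 1 marks a step from "not NA" to "NA"; + 1 points at the NA row
  idx <- which(diff(is.na(m[, col])) == 1) + 1
  if (length(idx) == 0) {
    return(m)                     # nothing to drop, return the input unchanged
  }
  m[-idx, , drop = FALSE]
}

with_na <- cbind(1:5, c(1, NA, NA, 4, NA))
kept    <- drop_first_na_of_run(with_na, 2)
kept[, 1]   # 1 3 4

no_na     <- cbind(1:3, c(7, 8, 9))
unguarded <- no_na[-which(diff(is.na(no_na[, 2])) == 1) - 1, , drop = FALSE]
nrow(unguarded)                          # 0
nrow(drop_first_na_of_run(no_na, 2))     # 3
```

The random dummy data above will almost always contain a match, so the original line works there. The guard only matters for inputs where column 8 has no NA that follows a defined value.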
