R counter doesn't count integer(0)

Published 2025-02-03 03:50:22

I have been trying to solve this issue, but nothing I tried worked, and the solutions I found on the internet and on this site didn't work either.

I have datasets of this kind with more than 500k rows.

Example subset:

subset <- as.data.frame(matrix(c(9,9,9,0,2,9,0,9,9,1,0,2,9,9,9,0,0,0,2,2,2,1,1,1), ncol = 3, byrow = TRUE))

Every column is an individual and every row is a marker. The values "0", "1" and "2" mean the data is not missing for that row (they have other meanings too, but those are not needed here), and "9" means the data is missing for that row. I write the numbers in quotation marks to keep them easy to see, but they are numeric in the dataset.

What I am trying to do is count the rows where at least one of the samples is not missing. In other words, for rows that consist entirely of "9"s, the counter should not increase; if at least one cell in a row is not 9, the counter should increase by one.

After trying for some time, I wrote this code:

counter=0

test = apply(subset, 1,  function(i) {
  if(length(which(subset[i,] !=9)) != 0){
    counter=counter+1
  }
  print(counter)
  assign("counter",counter,envir = .GlobalEnv)
})

When I run this, the counter doesn't increase when the only cell (or cells) that are not "9" give integer(0). For example, in the picture I uploaded, the 9th row consists of many "9"s and one cell that gives integer(0). The counter doesn't increase for this row, but I need it to be counted too.
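A likely cause, worth checking against the full data: apply(subset, 1, function(i) ...) passes each row's values to the function, not the row number. Inside the function, subset[i, ] therefore uses those values as row indices. For row 3 of the example, c(0, 9, 9), the 0 is dropped as an index and each 9 points past the last of the 8 rows, so subset[i, ] is two rows of NA and which() finds nothing. Likewise, a row of all 0s selects no rows at all, and a row such as 1 1 1 re-checks row 1 (all 9s). The sketch below reproduces this on the example data and then loops over row numbers instead; the names row3, picked and empty_hit are illustrative only.

```r
# Example data from the question: 8 markers (rows) x 3 individuals (columns)
subset <- as.data.frame(matrix(c(9, 9, 9, 0, 2, 9, 0, 9, 9, 1, 0, 2,
                                 9, 9, 9, 0, 0, 0, 2, 2, 2, 1, 1, 1),
                               ncol = 3, byrow = TRUE))

# What apply() hands the function for row 3: the values, not the number 3
row3 <- unlist(subset[3, ])          # c(V1 = 0, V2 = 9, V3 = 9)
picked <- subset[row3, ]             # index 0 dropped, 9 is out of range: 2 NA rows
empty_hit <- which(picked != 9)      # NA comparisons are skipped: integer(0)

# Looping over row numbers makes subset[i, ] refer to row i itself
counter <- 0
for (i in seq_len(nrow(subset))) {
  if (any(subset[i, ] != 9)) {       # at least one non-missing cell in row i
    counter <- counter + 1
  }
}
counter                              # 6
```

Keeping the counter in an ordinary variable also removes the need for assign() into the global environment.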

To work around this, I tried several things, including:

1- Placing identical(length(which(dummy[i,] == 0)), integer(0)) and all() in various places in the loop, and trying various if/else statements. I also tried other ways of counting integer(0), which I don't all remember.

2- Changing the 9s into NA, or changing the integer(0)s into another number such as 3. Both changed how the loop behaves: now the counter increases by one regardless of the cells in the row.

3- Using an if condition of the form (condition < 9*ncol(subset)), which I thought would work (if any value in the row is not missing, i.e. not 9, the row sum will be less than 9*ncol), but again R sees it as integer(0) and nothing changes.

4- Looking for rows where the result is zero doesn't work either, because the code above also gives zero for rows of missing data ("9"s). I only want the all-missing rows left out of the count.
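Some notes on attempts 1 to 3, as a sketch on the example data rather than a definitive fix. For attempt 1, length() always returns a single integer (0 when which() matched nothing), so comparing it with integer(0) via identical() is never TRUE; length(x) == 0 is the usual emptiness test. For attempt 2, once 9 is recoded as NA, the row test has to look for non-NA cells, because comparisons with NA return NA. For attempt 3, the row-sum idea works without a loop provided every code is at most 9 (true for 0, 1, 2 and 9), since only an all-9 row can reach 9 * ncol(subset). The variable names are illustrative.

```r
# Example data from the question (9 = missing)
subset <- as.data.frame(matrix(c(9, 9, 9, 0, 2, 9, 0, 9, 9, 1, 0, 2,
                                 9, 9, 9, 0, 0, 0, 2, 2, 2, 1, 1, 1),
                               ncol = 3, byrow = TRUE))

# Attempt 1: which() gives integer(0) when nothing matches, but length() of
# that is 0L, a length-one integer, so identical(..., integer(0)) is FALSE
hits <- which(c(9, 9, 9) != 9)
wrong_test <- identical(length(hits), integer(0))  # never TRUE
right_test <- length(hits) == 0L                   # TRUE when nothing matched

# Attempt 2: after recoding 9 as NA, count rows with at least one non-NA cell
m <- as.matrix(subset)
m[m == 9] <- NA
n_via_na <- sum(rowSums(!is.na(m)) > 0)

# Attempt 3: assumes every code is at most 9, so only an all-9 row
# reaches the maximum possible sum of 9 * ncol(subset)
n_via_sum <- sum(rowSums(subset) < 9 * ncol(subset))
```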

Any help with this issue would be highly appreciated. Since Stack Overflow prefers to keep the comment section free of thank-you messages, I'd like to thank everybody in advance.

Comments (2)

鹿童谣 2025-02-10 03:50:22

As I understand it, you want to count the number of rows in which at least one value is different from 9. There are many ways of doing this; here are two alternatives.

With dplyr

You can do this with dplyr like this:

library(dplyr)

# Your provided data
subset %>% 
  filter(if_any(everything(), ~ .x != 9)) %>% 
  nrow()
#> [1] 6

Created on 2022-05-29 by the reprex package (v2.0.1)

In filter(if_any(everything(), ~ .x != 9)), filter() keeps the rows where at least one value is not equal to 9, i.e. it removes the rows in which every value is 9. Afterwards, we just count the remaining rows.
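To see which rows the filter removes, the opposite condition (every value equal to 9) can be checked in base R. This sketch is an illustration added here, not part of the dplyr pipeline above; on the example data it flags rows 1 and 5, leaving 8 - 2 = 6 rows.

```r
# Example data from the question (9 = missing)
subset <- as.data.frame(matrix(c(9, 9, 9, 0, 2, 9, 0, 9, 9, 1, 0, 2,
                                 9, 9, 9, 0, 0, 0, 2, 2, 2, 1, 1, 1),
                               ncol = 3, byrow = TRUE))

# TRUE for rows in which every sample is missing (all 9s)
all_missing <- apply(subset, 1, function(x) all(x == 9))
dropped_rows <- which(all_missing)          # the rows filter() leaves out
n_kept <- nrow(subset) - sum(all_missing)   # rows that survive the filter
```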

With apply()

If you want to use apply(), you can do the following:

sum(
  apply(
    subset, 
    MARGIN = 1, 
    function(x) {
      any(x != 9)
    }
  )
)
#> [1] 6

Created on 2022-05-29 by the reprex package (v2.0.1)

Here, I iterate over each row of subset with apply() and check whether any value in that row is not equal to 9. This returns a vector of TRUE/FALSE values. We sum() this vector (TRUE counts as 1) to get the total number of rows with at least one value different from 9.
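With more than 500k rows, apply() has to call an R function once per row. A vectorized alternative, sketched here as an option rather than as part of the answer above, compares the whole data frame with 9 in one step and counts the non-9 cells in each row with rowSums(); a row counts if that number is above 0. How much faster it is on the real data is worth measuring rather than assuming.

```r
# Example data from the question (9 = missing)
subset <- as.data.frame(matrix(c(9, 9, 9, 0, 2, 9, 0, 9, 9, 1, 0, 2,
                                 9, 9, 9, 0, 0, 0, 2, 2, 2, 1, 1, 1),
                               ncol = 3, byrow = TRUE))

# subset != 9 is a logical matrix; rowSums() counts the TRUEs in each row
non_missing_per_row <- rowSums(subset != 9)

# A row is counted when it has at least one non-missing cell
n_rows_with_data <- sum(non_missing_per_row > 0)
```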

人生戏 2025-02-10 03:50:22

This is the option I find easiest to understand. You can create an additional column, counter, whose value is based on the other variables. The case_when() function checks the values of your columns: if it finds a 9, it puts a 0 in the counter column; if it doesn't find a 9 in any of your columns, it puts a 1. You can then sum the counter column to get the overall number of rows without nines.

library(dplyr)
subset <- as.data.frame(matrix(c(9, 9, 9, 0, 2, 9, 0, 9, 9, 1, 0, 2, 9, 9, 9, 0, 0, 0, 2, 2, 2, 1, 1, 1), ncol = 3, byrow = T))
subset <- subset %>%
  mutate(counter = case_when(
    V1 == 9 ~ 0,
    V2 == 9 ~ 0,
    V3 == 9 ~ 0,
    TRUE ~ 1
  ))
number_of_full_rows <- sum(subset$counter)

If you're sure you understand the basic version, you can shorten it so you don't have to name all of your columns.

library(dplyr)
subset <- as.data.frame(matrix(c(9, 9, 9, 0, 2, 9, 0, 9, 9, 1, 0, 2, 9, 9, 9, 0, 0, 0, 2, 2, 2, 1, 1, 1), ncol = 3, byrow = T))
subset <- subset %>%
  mutate(counter = case_when(
    if_any(everything(), ~ .x == 9) ~ 0,
    TRUE ~ 1
  ))
number_of_full_rows <- sum(subset$counter)
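Note that both versions above count rows that contain no 9 at all (complete rows): on the example data that is 4 rows (rows 4, 6, 7 and 8), whereas the question asks for rows with at least one non-9 value, which is 6. If the question's criterion is what is needed, the counter should be 0 only when every column is 9; in the dplyr code, one way to express that is if_all(everything(), ~ .x == 9) ~ 0 in place of the if_any() condition. A base R sketch of both counters, for comparison (the vector names are illustrative):

```r
# Example data from the question (9 = missing)
subset <- as.data.frame(matrix(c(9, 9, 9, 0, 2, 9, 0, 9, 9, 1, 0, 2,
                                 9, 9, 9, 0, 0, 0, 2, 2, 2, 1, 1, 1),
                               ncol = 3, byrow = TRUE))

nines_per_row <- rowSums(subset == 9)   # computed before any column is added

# As in the answer: 0 if the row contains any 9, otherwise 1
counter_complete <- ifelse(nines_per_row > 0, 0, 1)

# Question's criterion: 0 only if every cell in the row is 9, otherwise 1
counter_any_data <- ifelse(nines_per_row == ncol(subset), 0, 1)

subset$counter <- counter_any_data
number_of_rows_with_data <- sum(subset$counter)
```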