R counter does not count integer(0)
I have been trying to solve this issue but couldn't succeed whatever I tried; the solutions I found on the internet and on this site didn't work either.
I have datasets of this kind with more than 500k rows.
Example subset:
subset= as.data.frame(matrix(c(9,9,9,0,2,9,0,9,9,1,0,2,9,9,9,0,0,0,2,2,2,1,1,1),ncol = 3, byrow = T))
Every column is an individual and every row is a certain marker, with "0", "1" or "2" meaning the data is not missing for that row (these values have other meanings too, but they are not needed here) and "9" meaning the data is missing for that row. I write the numbers in quotation marks to keep them easy to see, but they are numeric in the dataset.
What I am trying to do is count the rows where at least one of the samples is not missing. So, for rows that consist entirely of "9"s, the counter should not increase. If at least one cell in a row is not 9, the counter should increase by one.
After trying for some time, I wrote this code:
counter=0
test = apply(subset, 1, function(i) {
  if(length(which(subset[i,] !=9)) != 0){
    counter=counter+1
  }
  print(counter)
  assign("counter",counter,envir = .GlobalEnv)
})
When I do this, the counter doesn't increase when the only cell/or cells that are not "9" are integer(0). For example, in the picture I uploaded, the 9th row consists of many "9"s and an integer(0). The counter won't increase in this row but I have to count it, too.
In order to overcome this, I tried different things, including:
1- Placing identical(length(which(dummy[i,] ==0)), integer(0)) and all() functions in various places in the loop, and trying various if else statements. I also tried various other ways that I don't all remember, trying to count integer(0).
2- Changing the 9s into NA / changing the integer(0)s into another number such as 3. Both of these changed the mechanism of the loop, and now the counter increases by one regardless of the cells in the row.
3- Using the if conditional with ( condition < 9*ncol(subset) ), which I thought would give the result (if any of them is not missing/9, it will be less than 9*ncol), but again R sees it as integer(0) and nothing changes.
4- Trying to find where the result is "zero" won't work, because the code I wrote in the beginning gives the same result (zero) for the missing data "9"s as well. I only want the missing results left out of the counter.
If anybody can help with this issue, it will be highly appreciated. As Stack Overflow wants to keep the comment section free of thank-you messages, I want to say thanks to everybody in advance.
Comments (2)
As I understand, you want to count the number of rows where there is at least one value different from 9. There are many ways of doing this; below are two alternatives.
With dplyr
You can do this with dplyr like this:
Created on 2022-05-29 by the reprex package (v2.0.1)
In filter(if_any(everything(), ~ .x != 9)), filter() keeps the rows where at least one value is not equal to 9. After that, we just count the rows.
With apply()
If you want to use apply() you can do the following:
Here, I iterate over each row of subset with apply() and check whether any value of that row is unequal to 9. This returns a vector of TRUE/FALSE. We sum() this vector to find the total number of rows with at least one value different from 9.
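The two approaches described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the answerer's original code: the data frame is named markers here (a hypothetical name, chosen to avoid masking base::subset()), the rowSums() line is an added base R equivalent, and the dplyr part only runs when a dplyr version providing if_any() (1.0.4 or later) is installed.

```r
# Example data from the question; `markers` is a hypothetical name
# used instead of `subset` so that base::subset() is not masked.
markers <- as.data.frame(matrix(
  c(9, 9, 9,
    0, 2, 9,
    0, 9, 9,
    1, 0, 2,
    9, 9, 9,
    0, 0, 0,
    2, 2, 2,
    1, 1, 1),
  ncol = 3, byrow = TRUE
))

# apply() approach: one TRUE/FALSE per row,
# TRUE when at least one value in the row differs from 9.
has_data <- apply(markers, 1, function(row) any(row != 9))
n_apply <- sum(has_data)

# Vectorised base R equivalent (an addition, not from the answer):
# count the non-9 cells per row and keep rows with at least one.
n_rowsums <- sum(rowSums(markers != 9) > 0)

# dplyr approach, guarded so the script still runs without dplyr.
# if_any() is assumed to be available (dplyr >= 1.0.4).
if (requireNamespace("dplyr", quietly = TRUE) &&
    utils::packageVersion("dplyr") >= "1.0.4") {
  kept <- dplyr::filter(markers, dplyr::if_any(dplyr::everything(), ~ .x != 9))
  n_dplyr <- nrow(kept)
}

print(n_apply)
print(n_rowsums)
```

For the eight example rows, only the two rows made up entirely of 9s are excluded. On a dataset with 500k rows, the rowSums() form may be noticeably faster than apply(), since it avoids calling an R function once per row.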
That's the option I find the easiest to understand. You can create an additional column counter with a value based on the other variables. The case_when function checks the values of your columns and, if it finds a 9, puts a 0 in the counter column. If it doesn't find a 9 in any of your columns, it returns a 1. You can then sum your counter column to get the overall number of rows without nines.
If you're sure you understand the basic version, you can shorten it so you don't have to name all of your columns.
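A sketch of what this could look like, assuming the default column names V1, V2 and V3 that as.data.frame(matrix(...)) produces. The data frame name markers and the columns counter_short and counter_dplyr are hypothetical, and this is not the answerer's original code. Note that, as described, this counts rows containing no 9 at all, which is a stricter condition than "at least one value is not 9".

```r
# Example data from the question; `markers` is a hypothetical name.
markers <- as.data.frame(matrix(
  c(9, 9, 9,
    0, 2, 9,
    0, 9, 9,
    1, 0, 2,
    9, 9, 9,
    0, 0, 0,
    2, 2, 2,
    1, 1, 1),
  ncol = 3, byrow = TRUE
))

# Basic version in base R: counter is 0 when any named column holds a 9,
# otherwise 1.
markers$counter <- ifelse(
  markers$V1 == 9 | markers$V2 == 9 | markers$V3 == 9, 0, 1
)
rows_without_nines <- sum(markers$counter)

# Shortened version that does not name the columns individually:
# a row gets 1 when it contains zero cells equal to 9.
value_cols <- c("V1", "V2", "V3")
markers$counter_short <- as.integer(rowSums(markers[value_cols] == 9) == 0)

# The same idea with dplyr::case_when(), only run when dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  with_counter <- dplyr::mutate(
    markers,
    counter_dplyr = dplyr::case_when(
      V1 == 9 | V2 == 9 | V3 == 9 ~ 0,  # a 9 was found in the row
      TRUE ~ 1                          # no 9 in any column
    )
  )
}

print(rows_without_nines)
```

To count rows with at least one non-missing value instead, the condition would need to require that all columns equal 9 (joining the comparisons with & rather than |) before assigning 0.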