How do I compute a conditional mode in R?

Posted on 2024-12-01 02:16:26

I have a large data set with 11 columns and 100,000 rows (for example) containing the values 1, 2, 3, and 4, where 4 denotes a missing value. I need to compute the mode of each row. I am using the following data and function:

ac<-matrix(c("4","4","4","4","4","4","4","3","3","4","4"), nrow=1, ncol=11)  

m<-as.matrix(apply(ac, 1, Mode))

If I use the above command, it gives me "4" as the mode, which I do not want. I want the mode to omit 4 and return "3", because 4 is a missing value.

Thanks in advance.

Comments (2)

装纯掩盖桑 2024-12-08 02:16:26

R has a powerful mechanism to work with missing values. You can represent a missing value with NA and many of the R functions have support for dealing with NA values.

Create a small matrix with random numbers:

set.seed(123)
m <- matrix(sample(1:4, 12, replace=TRUE), ncol=3)
m
     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    4    1    2
[3,]    2    3    4
[4,]    4    4    2

Since you represent missingness by the value 4, you can replace each occurrence by NA:

m[m==4] <- NA
m
     [,1] [,2] [,3]
[1,]    2   NA    3
[2,]   NA    1    2
[3,]    2    3   NA
[4,]   NA   NA    2
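
The same recoding can be applied to the matrix from the question. The sketch below assumes the data are stored as character strings, as in the question's `ac`, so the comparison is made against the string "4"; it only illustrates the replacement step and is not part of the original answer.

```r
# The question's one-row character matrix, where "4" marks a missing value
ac <- matrix(c("4","4","4","4","4","4","4","3","3","4","4"), nrow = 1, ncol = 11)

# Recode every "4" as NA; the remaining entries are left untouched
ac[ac == "4"] <- NA

# Count how many entries were recoded and inspect what is left
n_missing <- sum(is.na(ac))
remaining <- ac[!is.na(ac)]
```

After this step, any function that accepts `na.rm = TRUE` (or that drops `NA` by default) will ignore the former 4s.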

To calculate, for example, the mean:

mean(m[1, ], na.rm=TRUE)
[1] 2.5

apply(m, 1, mean, na.rm=TRUE)
[1] 2.5 1.5 2.5 2.0

To calculate the mode, you can use the function Mode in package prettyR. (Note that in this very small data set, only the 4th row has a unique modal value.)

library(prettyR)
apply(m, 1, Mode, na.rm=TRUE)
[1] ">1 mode" ">1 mode" ">1 mode" "2"     
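
If installing prettyR is not an option, a similar result can be sketched in base R. The helper `row_mode` below is a hypothetical name introduced for illustration, not a function from any package; it assumes that a tie should be reported as `NA` rather than as the string ">1 mode".

```r
# Hypothetical helper (not from prettyR): most frequent non-NA value of x,
# returned as a character string; NA if x is all NA or if there is a tie.
row_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  counts <- table(x)
  top <- names(counts)[as.vector(counts) == max(counts)]
  if (length(top) > 1) NA_character_ else top
}

# The recoded matrix from above, written out explicitly (column by column)
m <- matrix(c(2, NA, 2, NA,
              NA, 1, 3, NA,
              3, 2, NA, 2), ncol = 3)

modes <- apply(m, 1, row_mode)
```

Here only the 4th row has a unique mode, so `modes` is `NA` for rows 1 to 3 and "2" for row 4, mirroring the prettyR output above.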
天荒地未老 2024-12-08 02:16:26

One way of doing it (though I'm not too sure on its performance):

tcnt<-table(ac, exclude="4")
actualmode<-names(tcnt)[which.max(tcnt)]

This code finds the overall mode, but it is easily adapted to work within rows.
Or, as a one-liner based on an answer by Thomas Lumley to an old question on the R mailing list:

names(sort(-table(ac, exclude="4")))[1]
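
One possible row-wise adaptation of the `table` approach is sketched below. The function name `mode_excluding`, its `missing_code` argument, and the second row of the example matrix are assumptions made for illustration; they do not come from the question or the answers.

```r
# Hypothetical wrapper around the table()/which.max() idea above.
# Returns NA when every value in x equals the missing code.
mode_excluding <- function(x, missing_code = "4") {
  tcnt <- table(x, exclude = missing_code)
  if (length(tcnt) == 0) return(NA_character_)
  names(tcnt)[which.max(tcnt)]
}

# Row 1 is the question's data; row 2 is made up for the example
ac <- matrix(c("4","4","4","4","4","4","4","3","3","4","4",
               "1","2","2","4","4","4","1","2","4","4","4"),
             nrow = 2, byrow = TRUE)

# Apply the wrapper to each row
row_modes <- apply(ac, 1, mode_excluding)
```

Note that `which.max` returns the first maximum, so a tie is resolved in favour of the value that sorts first rather than being flagged.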