How to compute a conditional mode in R?

Posted on 2024-12-01 02:16:26

I have a large data set with 11 columns and 100,000 rows (for example) whose values are 1, 2, 3, and 4, where 4 denotes a missing value. I need to compute the mode. I am using the following data and function:

ac <- matrix(c("4","4","4","4","4","4","4","3","3","4","4"), nrow = 1, ncol = 11)

m <- as.matrix(apply(ac, 1, Mode))

With the command above I get "4" as the mode, which is not what I need. I want the mode to ignore 4 and return "3", because 4 is a missing value.

Thanks in advance.

装纯掩盖桑 2024-12-08 02:16:26

R has a powerful mechanism for working with missing values. You can represent a missing value with NA, and many R functions support handling NA values.

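Create a small matrix of random numbers (the values shown come from an older R release; since R 3.6.0 the default sampling algorithm differs, so the same seed may produce other draws):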

set.seed(123)
m <- matrix(sample(1:4, 12, replace=TRUE), ncol=3)
m
     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    4    1    2
[3,]    2    3    4
[4,]    4    4    2

Since you represent missingness by the value 4, you can replace each occurrence with NA:

m[m==4] <- NA
m

     [,1] [,2] [,3]
[1,]    2   NA    3
[2,]   NA    1    2
[3,]    2    3   NA
[4,]   NA   NA    2
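
As a sketch of how the same recoding might apply to the question's character matrix `ac`, assuming the missing code is stored as the string "4" (as in the asker's example), something like the following should work. The `rowSums` check at the end is only an illustrative addition.

```r
# The asker's data: a 1 x 11 character matrix where "4" marks a missing value.
ac <- matrix(c("4","4","4","4","4","4","4","3","3","4","4"), nrow = 1, ncol = 11)

# Recode the sentinel as NA. The comparison is against the string "4",
# because the matrix holds character values, not numbers.
ac[ac == "4"] <- NA

# Count the non-missing entries that remain in each row.
rowSums(!is.na(ac))
```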

To calculate, for example, the mean:

mean(m[1, ], na.rm=TRUE)
[1] 2.5

apply(m, 1, mean, na.rm=TRUE)
[1] 2.5 1.5 2.5 2.0

To calculate the mode, you can use the function Mode from the prettyR package. (Note that in this very small data set, only the 4th row has a unique modal value.)

library(prettyR)
apply(m, 1, Mode, na.rm=TRUE)
[1] ">1 mode" ">1 mode" ">1 mode" "2"
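
If installing prettyR is not an option, a base-R alternative can be sketched as below. The helper name `row_mode` is hypothetical, and its tie-breaking rule (first value in `table()` order) is an assumption that differs from `Mode`, which reports ">1 mode" for ties. The matrix is retyped by hand from the NA-recoded output above so that the example does not depend on the random number generator.

```r
# Hypothetical helper (not part of prettyR): most frequent non-missing value.
# Ties are broken by taking the first value in table()'s sorted order.
row_mode <- function(x) {
  counts <- table(x, useNA = "no")                 # NAs are left out of the tally
  if (length(counts) == 0) return(NA_character_)   # row was entirely missing
  names(counts)[which.max(counts)]
}

# The NA-recoded matrix, entered column by column.
m <- matrix(c(2, NA, 2, NA,
              NA, 1, 3, NA,
              3, 2, NA, 2), ncol = 3)

# One mode per row, returned as character because table() names are strings.
apply(m, 1, row_mode)
```

Because the result is character, wrap it in `as.numeric()` if numeric modes are needed downstream.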


天荒地未老 2024-12-08 02:16:26

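One way of doing it (though I'm not too sure about its performance):

tcnt <- table(ac, exclude="4")
actualmode <- names(tcnt)[which.max(tcnt)]

This code finds the overall mode, but it is easily adapted to work within rows.
Or, as a one-liner based on an answer by Thomas Lumley to an old question on the R mailing list:

names(sort(-table(ac, exclude="4")))[1]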
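
A minimal sketch of the row-wise adaptation mentioned above, wrapping the `table()` call in `apply()`. The second row of data is made up for illustration, and the guard for rows consisting only of "4" is an added assumption about the desired behavior.

```r
# First row is the asker's data; the second row is invented for illustration.
ac <- matrix(c("4","4","4","4","4","4","4","3","3","4","4",
               "1","1","2","4","4","4","4","4","4","4","4"),
             nrow = 2, byrow = TRUE)

# For each row, tabulate the values while excluding the sentinel "4",
# then pick the name of the most frequent remaining value.
row_modes <- apply(ac, 1, function(r) {
  tcnt <- table(r, exclude = "4")
  if (length(tcnt) == 0) return(NA_character_)  # assumed: all-missing row gives NA
  names(tcnt)[which.max(tcnt)]
})

row_modes
```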
