Summary of a frequency table in R?

Posted 2024-10-16 13:05:29

I have a set of user recommendations

review=matrix(c(5:1,10,2,1,1,2), nrow=5, ncol=2, dimnames=list(NULL,c("Star","Votes")))

and want to use summary(review) to show basic statistics: mean, median, quartiles, and min/max.

But it returns a separate summary for each column. I avoid using a data.frame because the 'Star' factor levels are ordered.
How can I tell R that Star is an ordered numeric score and that Votes holds the frequency of each score?

仙女 2024-10-23 13:05:29

I'm not exactly sure what taking the mean would mean in general if Star is supposed to be an ordered factor. However, in the example you give, Star is actually a set of numeric values, so you can use the weighted functions from Hmisc with Votes as the weights:

library(Hmisc)

R> review=matrix(c(5:1,10,2,1,1,2), nrow=5, ncol=2, dimnames=list(NULL,c("Star","Votes")))

R> wtd.mean(review[, 1], weights = review[, 2])
[1] 4.0625

R> wtd.quantile(review[, 1], weights = review[, 2])
  0%  25%  50%  75% 100% 
1.00 3.75 5.00 5.00 5.00 
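For integer vote counts like these, the Hmisc numbers can be cross-checked in base R: stats::weighted.mean() takes the weights directly, and expanding each Star value by its Votes count with rep() recreates the individual ratings, so the default quantile() gives the same quartiles as wtd.quantile() here. A minimal sketch, assuming Votes are whole-number frequency weights (the names wm, stars and qs are only illustrative):

```r
review <- matrix(c(5:1, 10, 2, 1, 1, 2), nrow = 5, ncol = 2,
                 dimnames = list(NULL, c("Star", "Votes")))

# weighted.mean() is part of base R (stats package), so Hmisc is not needed here
wm <- weighted.mean(review[, "Star"], w = review[, "Votes"])

# Repeat each star value once per vote to recover the 16 individual ratings;
# quantile() with its default type 7 then reproduces 1, 3.75, 5, 5, 5
stars <- rep(review[, "Star"], times = review[, "Votes"])
qs <- quantile(stars)

print(wm)
print(qs)
```

The rep() expansion only makes sense for whole-number counts; for fractional weights, wtd.quantile() is the more appropriate tool.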
淡紫姑娘! 2024-10-23 13:05:29

I don't understand what the problem is. Why shouldn't you use a data.frame?

rv <- data.frame(star = ordered(review[, 1]), votes = review[, 2])

Then convert your data.frame to a vector, repeating each star rating once per vote:

( vts <- with(rv, rep(star, votes)) )
 [1] 5 5 5 5 5 5 5 5 5 5 4 4 3 2 1 1
Levels: 1 < 2 < 3 < 4 < 5

Then do the summary... I just don't know what kind of summary you want, since summary() on this factor only counts the votes per level, which brings you back to the start. O_o

summary(vts)
 1  2  3  4  5 
 2  1  1  2 10 
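If the aim is to keep Star as an ordered factor without going through a data.frame, the same vts can be built straight from the matrix. Median and quartiles are still meaningful for ordinal data: quantile() accepts an ordered factor when type = 1 or 3 and returns the result as levels of that factor. A minimal sketch, assuming the review matrix from the question; the mean is left out on purpose, since mean() on a factor only returns NA with a warning:

```r
review <- matrix(c(5:1, 10, 2, 1, 1, 2), nrow = 5, ncol = 2,
                 dimnames = list(NULL, c("Star", "Votes")))

# Ordered factor built directly from the matrix columns, no data.frame involved
vts <- rep(ordered(review[, "Star"]), times = review[, "Votes"])

# Only type = 1 or type = 3 is allowed for ordered factors;
# the quartiles come back as factor levels (1, 3, 5, 5, 5 for this data)
q <- quantile(vts, type = 1)
print(q)
```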

EDIT (on @Prasad's suggestion)

Since vts is an ordered factor, you should convert it to numeric and then calculate the summary (for now I will disregard the underlying statistical issues):

nvts <- as.numeric(levels(vts)[vts])  ## numeric conversion
summary(nvts)  ## "ordinary" summary
fivenum(nvts)  ## Tukey's five number summary
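For this data the two calls above do not agree on the lower quartile: summary() takes its quartiles from quantile() with the default type 7 (3.75), while fivenum() uses Tukey's hinges (3.5). A small sketch to check this, rebuilding vts from the matrix instead of the data.frame (the resulting vector is the same):

```r
review <- matrix(c(5:1, 10, 2, 1, 1, 2), nrow = 5, ncol = 2,
                 dimnames = list(NULL, c("Star", "Votes")))
vts <- rep(ordered(review[, "Star"]), times = review[, "Votes"])

# Convert via the level labels so the values are the star ratings themselves
nvts <- as.numeric(levels(vts)[vts])

# Type-7 quartiles, as used by summary(): 1, 3.75, 5, 5, 5
qs <- quantile(nvts)
# Tukey's hinges: 1, 3.5, 5, 5, 5
f <- fivenum(nvts)

print(summary(nvts))
print(f)
```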
乖不如嘢 2024-10-23 13:05:29

Just to clarify: when you say you would like "mean, median, quartiles and min/max", are you talking in terms of number of stars, e.g. mean = 4.062 stars?
If so, then using aL3xa's code, would something like summary(as.numeric(as.character(vts))) be what you want?
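Both conversions in this thread go through the level labels (levels(vts)[vts] or as.character(vts)) rather than calling as.numeric(vts) directly, because as.numeric() on a factor returns its internal level codes. For vts the codes happen to equal the star values, since its levels are exactly 1 to 5, but they diverge as soon as some ratings never occur. A sketch with hypothetical ratings where nobody gave 1 or 2 stars:

```r
# Hypothetical ratings: only 3, 4 and 5 stars were given, so the levels are "3" < "4" < "5"
f <- ordered(c(5, 5, 4, 3))

codes   <- as.numeric(f)                # internal level codes: 3 3 2 1
values  <- as.numeric(as.character(f))  # actual star values:   5 5 4 3
values2 <- as.numeric(levels(f)[f])     # same values, via the level labels

print(codes)
print(values)
```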
