Creating an aggregate column based on a variable in R

Posted 2024-12-25 02:19:10

I apologize in advance if this is something of a beginner question, but I searched the forum
and couldn't find a way to describe what I am trying to do.
I have a training set, and I am trying to reduce the number of levels in my categorical variables
(in the example below, the category is the state). I would like to map each state to the mean, or rate, of the class variable for that level.
My training set looks like the following once loaded into a data frame:

    state class mean
1      CA     1    0
2      AZ     1    0
3      NY     0    0
4      CA     0    0
5      NY     0    0
6      AZ     0    0
7      AZ     1    0
8      AZ     0    0
9      CA     0    0
10     VA     1    0

I would like the third column of the data frame to hold the mean of the class variable within each state, so the mean for the CA rows would be 0.333...
The mean column could then be used as a replacement for the state column.
Is there a good way to do this in R without writing an explicit loop?

How does one map new levels (for example, new states) that the training set did not include? Any links to approaches in R would be greatly appreciated.

Comments (2)

冷心人i 2025-01-01 02:19:10

This is really what the ave function was designed for. It can be used to construct any functional result by category, but its default function is mean, hence the name, i.e., ave(rage):

dfrm$mean <- with(dfrm, ave(class, state))  # FUN = mean is the default
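
As a quick check against the sample data in the question, the sketch below rebuilds the data frame (named `dfrm` here only to match the line above) and applies `ave`. The CA rows should come out as one third, since one of the three CA rows has class 1.

```r
# Rebuild the sample data from the question
dfrm <- data.frame(
  state = c("CA", "AZ", "NY", "CA", "NY", "AZ", "AZ", "AZ", "CA", "VA"),
  class = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 1)
)

# ave() returns a vector as long as its input, holding each row's group mean
dfrm$mean <- with(dfrm, ave(class, state))

# One row per state, to inspect the result
unique(dfrm[, c("state", "mean")])
```

Because `ave` keeps the original row order and length, the result can be assigned straight back as a column without a merge step.
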
∞琼窗梦回ˉ 2025-01-01 02:19:10
    library(plyr)
    # Compute one mean per state, then left-join it back onto every row
    join(data, ddply(data, .(state), summarise, mean = mean(class)), by = "state", type = "left")
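
Neither answer covers the second part of the question, states that never appeared in the training set. One possible approach, sketched here in base R, is to keep the per-state means as a named lookup vector and fall back to the overall training mean for unseen states. The fallback rule, the name `new_data`, and the state `TX` are assumptions made for illustration, not something taken from the question.

```r
# Training data from the question
train <- data.frame(
  state = c("CA", "AZ", "NY", "CA", "NY", "AZ", "AZ", "AZ", "CA", "VA"),
  class = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 1)
)

# Named lookup vector: one mean per state seen in training
state_means <- vapply(split(train$class, train$state), mean, numeric(1))

# Hypothetical new data containing a state (TX) absent from training
new_data <- data.frame(state = c("CA", "TX", "AZ"))

# Look up by name; states missing from the lookup come back as NA
new_data$mean <- unname(state_means[as.character(new_data$state)])

# Assumed fallback: use the overall training mean for unseen states
new_data$mean[is.na(new_data$mean)] <- mean(train$class)

new_data
```

Whether the overall mean is the right fallback depends on the model; leaving the value as `NA` and handling it downstream is an equally reasonable choice.
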