Grouping variable by fractions and several conditions
I've spent several days on something I think should be rather simple, with no luck. I hope someone can help me!
I have a data frame called "test" with the following variables: "Firm", "Year", "Firm_size" and "Expenditures".
I want to assign firms to size groups by year and then display the mean, median, standard deviation and N of expenditures for these groups in a table (e.g. with stargazer). So the first size group (the 10% largest firms) should show the mean, median, etc. of expenditures for the 10% largest firms each year.
The size groups should be:
- The 10% largest firms
- The firms that are between 10-25% largest
- The firms that are between 25-50% largest
- The firms that are between 50-75% largest
- The firms that are between 75-90% largest
- The 10% smallest firms
This is what I have tried:
library(dplyr)
library(stargazer)

# Sort firms from largest to smallest (all years pooled together)
test <- arrange(test, desc(Firm_size))
test$Variable <- 0

# Assign group labels by hard-coded row positions
test[1:min(5715, nrow(test)), ]$Variable <- "Expenditures, 0% size <10%"
test[5715:min(14288, nrow(test)), ]$Variable <- "Expenditures, 10% size <25%"
test[14288:min(28577, nrow(test)), ]$Variable <- "Expenditures, 25% size <50%"
# ... and so on for the remaining groups

# Summary statistics per size group
testtest <- test %>%
  group_by(Variable) %>%
  dplyr::summarise(
    Mean = mean(Expenditures),
    Median = median(Expenditures),
    Std.dev = sd(Expenditures),
    N = n()
  )

stargazer(testtest, type = "text", title = "Expenditures firms", digits = 1, summary = FALSE)
As shown above, I don't know how to group by fractions/percentages. I have therefore tried to assign firms to groups based on their row positions after sorting by Firm_size in descending order. The problem with doing so is that I don't take year into consideration, which I need to, and it is a lot of work to do this for each year (20 in total).
My intention was to make a new variable that gives each size group a name. E.g. the 10% largest firms each year should get the label "Expenditures, 0% size <10%".
Further, I create a new data frame "testtest" where I calculate the different measures before using stargazer to present it. This part works.
!!EDIT!!
Hi again,
Now I get the error "List object cannot be coerced to type double" when running the code on a new dataset (but it is the same variables as before).
The mutate step I'm referring to is the one starting with "mutate(gs = cut" that comes after "rowwise()" in the solution you provided.
Comments (1)
You can create the quantiles as a nested variable (size_groups), and then use cut() to create the group sizes (gs). Then group by Year and gs to summarize the indicators you want.
Output:
If you wanted to have an additional column with the yearly mean, you can remove the .groups="drop" from the summarize(across()) line, and then add this final line to the pipeline:
Note that this is correctly weighted by the number of firms in each Group_size, and thus returns the equivalent of doing this with the original data.
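To see why weighting by the number of firms matters, here is a minimal sketch on made-up data. The data frame `toy`, its values and the group names are hypothetical and only serve to illustrate the point; this is not the answer's original code.

```r
library(dplyr)

# Hypothetical data: one year, two size groups with unequal firm counts.
toy <- data.frame(
  Year = 2020,
  Group_size = c("small", "small", "small", "large"),
  Expenditures = c(1, 2, 3, 10)
)

# Per-group mean and firm count.
by_group <- toy %>%
  group_by(Year, Group_size) %>%
  summarise(Mean = mean(Expenditures), N = n(), .groups = "drop")

# Weighting each group mean by its firm count recovers the raw-data mean.
weighted_yearly <- weighted.mean(by_group$Mean, by_group$N)
raw_yearly <- mean(toy$Expenditures)

# A plain average of the group means ignores group sizes and differs.
unweighted <- mean(by_group$Mean)
```

Here `weighted_yearly` matches `raw_yearly`, while `unweighted` does not, because the "small" group holds three firms and the "large" group only one.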
Input Data:
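The following is a minimal sketch with simulated stand-in data (all values are hypothetical), followed by one way to build the size groups per year. It is an assumption-laden variant, not the answer's exact code: instead of storing the quantiles in a nested size_groups column and using rowwise(), it computes them inside a grouped mutate(). The column names follow the question; `probs`, `labels` and `result` are names chosen here for illustration.

```r
library(dplyr)

# Hypothetical stand-in data: 20 years x 200 firms with simulated values.
set.seed(1)
test <- data.frame(
  Firm = rep(seq_len(200), times = 20),
  Year = rep(2001:2020, each = 200),
  Firm_size = rlnorm(4000, meanlog = 10, sdlog = 1),
  Expenditures = rlnorm(4000, meanlog = 5, sdlog = 1)
)

# Cut points as cumulative shares counted from the smallest firm, so the
# top interval (90%-100%) holds the 10% largest firms.
probs <- c(0, 0.10, 0.25, 0.50, 0.75, 0.90, 1)
labels <- c("10% smallest", "75-90% largest", "50-75% largest",
            "25-50% largest", "10-25% largest", "10% largest")

result <- test %>%
  group_by(Year) %>%
  # Quantiles are recomputed within each year because of group_by(Year).
  mutate(gs = cut(Firm_size,
                  breaks = quantile(Firm_size, probs = probs),
                  labels = labels,
                  include.lowest = TRUE)) %>%
  group_by(Year, gs) %>%
  summarise(Mean = mean(Expenditures),
            Median = median(Expenditures),
            Std.dev = sd(Expenditures),
            N = n(),
            .groups = "drop")

print(head(result))
```

Because the breaks are derived from each year's own Firm_size distribution, no row positions need to be hard-coded, and the same code covers all 20 years.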