Sum and count values based on two criteria
I'm having trouble solving the following problem.
   Geslacht persondays age_cat contactfirst
1  V        365        <40     2020
2  V        365        <40     2019
3  V        365        70-80   2019
4  V        365        50-60   2019
5  V        365        60-70   2020
6  M        365        50-60   2020
7  V        365        60-70   2019
8  M        39         60-70   2019
9  V        365        60-70   2019
10 M        365        70-80   2020
df <- structure(list(Geslacht = c("V", "V", "V", "V", "V", "M", "V",
"M", "V", "M", "M", "M", "V", "M", "M", "V", "V", "M", "V", "M",
"V", "V", "M", "V", "M", "M", "M", "M", "M", "V", "M", "V", "M",
"V", "M", "M", "V", "M", "M", "M", "M", "V", "M", "V", "M", "M",
"M", "M", "M", "V"), persondays_individual = c(365, 365, 365,
365, 365, 365, 365, 39, 365, 365, 365, 365, 365, 365, 365, 365,
365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365,
365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365,
365, 365, 365, 365, 365, 365, 365, 365), age_cat = structure(c(1L,
1L, 6L, 4L, 5L, 4L, 5L, 5L, 5L, 6L, 1L, 5L, 4L, 6L, 5L, 7L, 6L,
3L, 5L, 5L, 5L, 7L, 5L, 6L, 4L, 4L, 4L, 1L, 6L, 6L, 4L, 7L, 7L,
4L, 3L, 4L, 5L, 5L, 1L, 4L, 6L, 6L, 5L, 5L, 4L, 3L, 7L, 5L, 5L,
4L), .Label = c("<40", ">90", "40-50", "50-60", "60-70", "70-80",
"80-90"), class = "factor"), contactfirst_cat = structure(c(11L,
10L, 10L, 10L, 11L, 11L, 10L, 10L, 10L, 11L, 11L, 10L, 11L, 10L,
11L, 10L, 10L, 10L, 10L, 10L, 11L, 10L, 11L, 11L, 11L, 10L, 11L,
10L, 11L, 11L, 10L, 11L, 11L, 11L, 10L, 10L, 10L, 11L, 11L, 10L,
10L, 11L, 11L, 11L, 10L, 10L, 10L, 10L, 11L, 11L), .Label = c("<2011",
"2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018",
"2019", "2020"), class = "factor")), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))
I would like to create two new variables: one that sums the person-days per age_cat and gender, and a second that counts how many 'contactfirst' values fall in a given year (in this case 2019).
Desired output:
Gender age_cat persondays_total contactfirst_total
V      <40     730              1
V      50-60   365              1
V      60-70   1095             2
V      70-80   365              1
M      50-60   365              0
M      60-70   39               1
M      70-80   365              0
I've tried to do it with group_by (code below). But this sums all the persondays in the data frame (so not per gender and age category) and does not create the column newcontacts.
df2 <- df %>% group_by(Gender, age_cat) %>%
mutate(personyears_total = sum(persondays_individual)) %>%
mutate(newcontacts = nrow(df$contactfirst == "2019"
Any help would be much appreciated.
If you don't want to bother with packages, try aggregate.
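A minimal base R sketch of the aggregate approach, using only the ten rows printed in the question rather than the full dput data. The helper column is_2019 and the renamed output columns are assumptions added for illustration; the grouping column is Geslacht, as in the posted data, not Gender.

```r
# Sample data: the ten rows shown in the question (not the full 50-row dput).
df <- data.frame(
  Geslacht = c("V", "V", "V", "V", "V", "M", "V", "M", "V", "M"),
  persondays_individual = c(365, 365, 365, 365, 365, 365, 365, 39, 365, 365),
  age_cat = c("<40", "<40", "70-80", "50-60", "60-70",
              "50-60", "60-70", "60-70", "60-70", "70-80"),
  contactfirst_cat = c("2020", "2019", "2019", "2019", "2020",
                       "2020", "2019", "2019", "2019", "2020")
)

# Hypothetical helper column: TRUE where the first contact was in 2019.
df$is_2019 <- df$contactfirst_cat == "2019"

# Sum both columns within each Geslacht x age_cat group.
# Summing a logical column counts its TRUE values.
res <- aggregate(cbind(persondays_individual, is_2019) ~ Geslacht + age_cat,
                 data = df, FUN = sum)

# Rename to match the column names in the desired output.
names(res)[3:4] <- c("persondays_total", "contactfirst_total")
res
```

Groups with no 2019 contacts still appear, with a count of 0, because the count is a sum over a logical column rather than a filter.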
You should use summarize instead of mutate, since you are creating one summary row for each group. Also, use sum(contactfirst_cat == "2019") to count the records that meet a condition instead of nrow(), which returns the number of rows of a data frame.
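A sketch of that advice with dplyr, again using only the ten rows printed in the question rather than the full dput data. It assumes dplyr 1.0.0 or later for the .groups argument, and it groups by Geslacht because the posted data has no Gender column.

```r
library(dplyr)

# Sample data: the ten rows shown in the question (not the full 50-row dput).
df <- data.frame(
  Geslacht = c("V", "V", "V", "V", "V", "M", "V", "M", "V", "M"),
  persondays_individual = c(365, 365, 365, 365, 365, 365, 365, 39, 365, 365),
  age_cat = c("<40", "<40", "70-80", "50-60", "60-70",
              "50-60", "60-70", "60-70", "60-70", "70-80"),
  contactfirst_cat = c("2020", "2019", "2019", "2019", "2020",
                       "2020", "2019", "2019", "2019", "2020")
)

df2 <- df %>%
  group_by(Geslacht, age_cat) %>%
  summarise(
    # Total person-days within each Geslacht x age_cat group.
    persondays_total = sum(persondays_individual),
    # Number of rows in the group whose first contact was in 2019.
    contactfirst_total = sum(contactfirst_cat == "2019"),
    .groups = "drop"
  )

df2
```

On this sample the result has seven rows, one per combination of Geslacht and age_cat that occurs in the data, and the totals agree with the desired output in the question.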