pivot_longer for selected columns only
I have a wide dataset with 23 columns. I would like to pivot only some of those columns into rows (long format) and leave the remaining columns as they are. This is a sample of my dataset:
# A tibble: 2 x 23
year popu dd popmale ddmale popfemale ddfemale pop40 dd40 pop41_50 dd41_50 pop51_60 dd51_60 pop61_70 dd61_70 pop71_80 dd71_80 pop81_90
1 2011 197548 2167 98145 1302 99403 1302 56822 52 27614 88 33368 384 30477 683 25418 630 14961
2 2012 200724 2250 99783 1354 100941 896 58646 54 28256 91 34111 400 30919 705 25718 655 14862
# ... with 5 more variables: dd81_90 <dbl>, pop91_100 <dbl>, dd91_100 <dbl>, pop100 <dbl>, dd100 <dbl>
df <- structure(list(year = c(2011, 2012), popu = c(197548, 200724),
    dd = c(2167, 2250), popmale = c(98145, 99783), ddmale = c(1302, 1354),
    popfemale = c(99403, 100941), ddfemale = c(1302, 896),
    pop40 = c(56822, 58646), dd40 = c(52, 54),
    pop41_50 = c(27614, 28256), dd41_50 = c(88, 91),
    pop51_60 = c(33368, 34111), dd51_60 = c(384, 400),
    pop61_70 = c(30477, 30919), dd61_70 = c(683, 705),
    pop71_80 = c(25418, 25718), dd71_80 = c(630, 655),
    pop81_90 = c(14961, 14862), dd81_90 = c(288, 288),
    pop91_100 = c(7210, 6746), dd91_100 = c(54, 55),
    pop100 = c(1678, 1466), dd100 = c(1, 2)),
    row.names = 1:2, class = "data.frame")
In the data frame above, each age category has its own column for the population (pop41_50, for example) and for the number of events (dd41_50).
I would like to create a data frame in a longer format, with the age category as values in one column and the population and the number of events in one column each, like this:
  year   popu   dd popmale ddmale popfemale ddfemale age_cate pop_age event_age
1 2011 197548 2167   98145   1302     99403     1302       40   56822        52
2 2011 197548 2167   98145   1302     99403     1302    41_50   27614        88
3 2011 197548 2167   98145   1302     99403     1302    51_60   33368       384
4 2011 197548 2167   98145   1302     99403     1302    61_70   30477       683
5 2011 197548 2167   98145   1302     99403     1302    71_80   25418       630
etc.
I've tried the following script but this puts everything into one column, which is not the output I desire.
pivot_longer(df, -c(year, popu, dd), values_to = "number", names_to = "category")
Many thanks in advance!
Comments (1)
One option would be to first rename the columns, then split on the second underscore.
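A minimal sketch of that idea, assuming dplyr and tidyr are available. The helper rename_age_cols and the intermediate names pop_age_* and event_age_* are illustrative choices made here to match the column names requested in the question; they are not taken from the original answer. The rename step matters because pivoting straight on the dd prefix would collide with the existing dd column.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question.
df <- structure(list(year = c(2011, 2012), popu = c(197548, 200724),
    dd = c(2167, 2250), popmale = c(98145, 99783), ddmale = c(1302, 1354),
    popfemale = c(99403, 100941), ddfemale = c(1302, 896),
    pop40 = c(56822, 58646), dd40 = c(52, 54),
    pop41_50 = c(27614, 28256), dd41_50 = c(88, 91),
    pop51_60 = c(33368, 34111), dd51_60 = c(384, 400),
    pop61_70 = c(30477, 30919), dd61_70 = c(683, 705),
    pop71_80 = c(25418, 25718), dd71_80 = c(630, 655),
    pop81_90 = c(14961, 14862), dd81_90 = c(288, 288),
    pop91_100 = c(7210, 6746), dd91_100 = c(54, 55),
    pop100 = c(1678, 1466), dd100 = c(1, 2)),
    row.names = 1:2, class = "data.frame")

# Hypothetical helper: rewrite the age-specific names so that the part before
# the second underscore is the new column name and the rest is the age range,
# e.g. pop41_50 -> pop_age_41_50 and dd41_50 -> event_age_41_50.
rename_age_cols <- function(nm) {
  nm <- sub("^pop(\\d)", "pop_age_\\1", nm)
  sub("^dd(\\d)", "event_age_\\1", nm)
}

long <- df %>%
  # Only columns whose prefix is directly followed by a digit are renamed,
  # so popu, dd, popmale, ddmale, popfemale and ddfemale stay untouched.
  rename_with(rename_age_cols, .cols = matches("^(pop|dd)\\d")) %>%
  pivot_longer(
    cols = matches("^(pop|event)_age_"),
    # ".value" turns the first captured group into output column names.
    names_to = c(".value", "age_cate"),
    # Split on the second underscore: group 1 is pop_age or event_age,
    # group 2 is the age range such as 41_50.
    names_pattern = "^([^_]+_[^_]+)_(.*)$"
  )

head(long)
```

Because only the age-specific columns are passed to cols, the other seven columns are repeated on every row instead of being stacked into a single value column, which is what happened with -c(year, popu, dd).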
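Output: one row per year and age category, with the columns year, popu, dd, popmale, ddmale, popfemale, ddfemale, age_cate, pop_age and event_age, as in the desired layout from the question.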