How do I convert factor levels into variables in R?

Posted 2025-01-17 05:56:38

I am relatively new to R and am trying to build a population pyramid. I need the population data for males and females side by side in two variables (popMale, popFemale). Currently Sex is a factor with 2 levels. How do I convert these 2 factor levels into 2 new variables (popMale, popFemale)? I would appreciate any help. Here is a dput snippet of my data:

structure(list(V1 = c("Location", "Dominican Republic", "Dominican Republic", 
"Dominican Republic", "Dominican Republic"), V2 = c("Sex", "Female", 
"Female", "Male", "Male"), V3 = c("Age", "0-4", "5-9", "0-4", 
"5-9"), V4 = c(1950L, 217L, 164L, 223L, 167L), V5 = c(1955L, 
277L, 199L, 286L, 204L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))

Comments (1)

妳是的陽光 2025-01-24 05:56:38

Because your data contains the column names in its first row, the first step is to name the columns from that row and then drop it. Next, convert the data to long (tidy) format, i.e. move the years and population numbers into separate columns, e.g. with tidyr::pivot_longer. Finally, you can use tidyr::pivot_wider to spread the values for males and females into separate columns.

Note: Depending on the next steps in your analysis, the last step (pivot_wider) may not be needed and may actually complicate plotting a population pyramid.

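# Promote the first row to column names, then drop that row
names(df) <- as.character(df[1, ])
df <- df[-1, ]

library(tidyr)

df %>%
  # Long format: one row per Location / Sex / Age / Year
  pivot_longer(matches("^\\d+"), names_to = "Year", values_to = "pop") %>%
  # Wide by sex: one column per level of Sex (popFemale, popMale)
  pivot_wider(names_from = Sex, values_from = pop, names_glue = "pop{Sex}")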
#> # A tibble: 4 × 5
#>   Location           Age   Year  popFemale popMale
#>   <chr>              <chr> <chr>     <int>   <int>
#> 1 Dominican Republic 0-4   1950        217     223
#> 2 Dominican Republic 0-4   1955        277     286
#> 3 Dominican Republic 5-9   1950        164     167
#> 4 Dominican Republic 5-9   1955        199     204
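
As the note says, the long format may be the easier starting point for the plot itself. The following is only a minimal sketch of that route, assuming ggplot2 is installed. The data frame is rebuilt by hand from the dput snippet with the header row already promoted, and the names `long`, `pop_signed`, and `p` are introduced here for illustration; they are not part of the original answer.

```r
library(tidyr)
library(ggplot2)

# Data from the question's dput snippet, header row already used as column names
df <- data.frame(
  Location = "Dominican Republic",
  Sex = c("Female", "Female", "Male", "Male"),
  Age = c("0-4", "5-9", "0-4", "5-9"),
  `1950` = c(217L, 164L, 223L, 167L),
  `1955` = c(277L, 199L, 286L, 204L),
  check.names = FALSE  # keep "1950" and "1955" as column names
)

# Long format only: no pivot_wider step
long <- pivot_longer(df, matches("^\\d+"), names_to = "Year", values_to = "pop")

# Mirror the male counts to the left of zero (hypothetical helper column)
long$pop_signed <- ifelse(long$Sex == "Male", -long$pop, long$pop)

# Horizontal bars per age group, one panel per year
p <- ggplot(long, aes(x = pop_signed, y = Age, fill = Sex)) +
  geom_col() +
  facet_wrap(~Year) +
  scale_x_continuous(labels = abs) +  # show magnitudes on both sides
  labs(x = "Population", y = "Age group")

print(p)
```

With Sex kept as a single column, it can be mapped directly to `fill`, which is why the wide popMale/popFemale layout is not required for this kind of plot.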