How do I convert factor levels into variables in R?

Posted on 2025-01-17 05:56:38

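I am relatively new to R and am trying to build a population pyramid. I need the population data for males and females side by side in two variables (popMale, popFemale). Currently Sex is a factor with two levels. How do I convert these two factor levels into two new variables (popMale, popFemale)? I would appreciate any help. Here is a dput snippet of my data: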

structure(list(V1 = c("Location", "Dominican Republic", "Dominican Republic", 
"Dominican Republic", "Dominican Republic"), V2 = c("Sex", "Female", 
"Female", "Male", "Male"), V3 = c("Age", "0-4", "5-9", "0-4", 
"5-9"), V4 = c(1950L, 217L, 164L, 223L, 167L), V5 = c(1955L, 
277L, 199L, 286L, 204L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))



妳是的陽光 2025-01-24 05:56:38

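Because your data contains the column names in the first row, the first step is to name the columns after that row and then drop it. Next, convert the data to long (tidy) format, i.e. move the years and the population counts into separate columns, for example with tidyr::pivot_longer. Finally, you can use tidyr::pivot_wider to spread the male and female counts into separate columns.

Note: Depending on the next steps in your analysis, the last step may not be needed and could actually complicate plotting a population pyramid.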

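# Promote the first row to column names, then drop that row
names(df) <- as.character(df[1, ])
df <- df[-1, ]

library(tidyr)  # also provides %>% and matches()

df %>%
  # Long format: one row per Location, Sex, Age and Year
  pivot_longer(matches("^\\d+"), names_to = "Year", values_to = "pop") %>%
  # Wide by Sex: the counts go into popFemale and popMale
  pivot_wider(names_from = Sex, values_from = pop, names_glue = "pop{Sex}")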
#> # A tibble: 4 × 5
#>   Location           Age   Year  popFemale popMale
#>   <chr>              <chr> <chr>     <int>   <int>
#> 1 Dominican Republic 0-4   1950        217     223
#> 2 Dominican Republic 0-4   1955        277     286
#> 3 Dominican Republic 5-9   1950        164     167
#> 4 Dominican Republic 5-9   1955        199     204
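
To illustrate the note about the last step, here is a minimal sketch of a pyramid drawn directly from the long format, skipping pivot_wider. The use of ggplot2 is an assumption, since the question does not say which plotting package is intended. The names pop_long, pop_signed and pyramid are hypothetical, and mirroring the male counts to the left of zero is just one common convention.

```r
library(tidyr)
library(ggplot2)  # assumption: the question does not name a plotting package

# Data from the question, with the header row already promoted to column names
df <- data.frame(
  Location = "Dominican Republic",
  Sex = c("Female", "Female", "Male", "Male"),
  Age = c("0-4", "5-9", "0-4", "5-9"),
  `1950` = c(217L, 164L, 223L, 167L),
  `1955` = c(277L, 199L, 286L, 204L),
  check.names = FALSE
)

# Long format only: one row per Location, Sex, Age and Year
pop_long <- pivot_longer(df, matches("^\\d+"), names_to = "Year", values_to = "pop")

# Mirror the male counts to the left of zero (hypothetical helper column)
pop_long$pop_signed <- ifelse(pop_long$Sex == "Male", -pop_long$pop, pop_long$pop)

# One horizontal bar per age group and sex, one panel per year
pyramid <- ggplot(pop_long, aes(x = pop_signed, y = Age, fill = Sex)) +
  geom_col() +
  facet_wrap(~Year) +
  scale_x_continuous(labels = abs) +  # show magnitudes on both sides of zero
  labs(x = "Population", y = "Age group")

pyramid  # printing the object draws the plot
```

With Sex kept as a single column it can be mapped straight to fill, which is why the wide popFemale/popMale layout is not required for this kind of plot.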

