枢轴更长 - 提取列名的一部分

发布于 2025-02-02 15:44:04 字数 1678 浏览 3 评论 0 原文

我想旋转更长的时间。我希望旋转的每个列名都有相同的开始; “ e_mort”,因此可以忽略,但理想情况下,我希望在新列名的开始时保留。每个列名的末尾是可变的,但始终为可变长度字符和/或数字字符串的格式,例如'num'或“ 100k”,或者随后是下划线和另一个字符字符串(无数值)EG' num_lo'或'100k_lo'。我希望提取的信息和枢轴是列名称的中心部分,它始终是'exc_tbhiv'或'tbhiv'。

数据集看起来像这样:

year <- c(2000, 2000, 2001, 2001, 2002, 2002)

location <- rep(c("global","SA"), 3)

e_mort_exc_tbhiv_num <- c(1698959, 26000, 1693829, 27000, 1687579, 29000)

e_mort_exc_tbhiv_num_lo <- c(1600000, 25000, 1600000, 26000, 1600000, 28000)

e_mort_tbhiv_num <- c(674398, 116000, 652384, 144000, 616975, 136000)

e_mort_tbhiv_num_lo <- c(535000, 65000, 517000, 79000, 486000, 75000)

data <- tibble(year,location,
               e_mort_exc_tbhiv_num,
               e_mort_exc_tbhiv_num_lo,
               e_mort_tbhiv_num,
               e_mort_tbhiv_num_lo)

data

所需的输出是:

A tibble: 12 x 5
    year location group     e_mort_num e_mort_num_lo
   <dbl> <chr>    <chr>          <dbl>         <dbl>
 1  2000 global   exc_tbhiv    1698959       1600000
 2  2000 global   tbhiv         674398        535000
 3  2000 SA       exc_tbhiv      26000         25000
 4  2000 SA       tbhiv         116000         65000
 5  2001 global   exc_tbhiv    1693829       1600000
 6  2001 global   tbhiv         652384        517000
 7  2001 SA       exc_tbhiv      27000         26000
 8  2001 SA       tbhiv         144000         79000
 9  2002 global   exc_tbhiv    1687579       1600000
10  2002 global   tbhiv         616975        486000
11  2002 SA       exc_tbhiv      29000         28000
12  2002 SA       tbhiv         136000         75000

谢谢

I want to pivot to longer. Every column name that I wish to pivot has the same beginning; 'e_mort', so could be ignored, but I would ideally like to retain at the start of the new column names. The end of each column name is variable but always of the format of a variable length character and/or numeric string e.g. 'num' or '100k' either on its own or followed by an underscore and another character string (no numerics) e.g. 'num_lo' or '100k_lo'. The information I wish to extract and pivot is the central section of the column name, which will always be 'exc_tbhiv' or 'tbhiv'.

The dataset looks like this:

year <- c(2000, 2000, 2001, 2001, 2002, 2002)

location <- rep(c("global","SA"), 3)

e_mort_exc_tbhiv_num <- c(1698959, 26000, 1693829, 27000, 1687579, 29000)

e_mort_exc_tbhiv_num_lo <- c(1600000, 25000, 1600000, 26000, 1600000, 28000)

e_mort_tbhiv_num <- c(674398, 116000, 652384, 144000, 616975, 136000)

e_mort_tbhiv_num_lo <- c(535000, 65000, 517000, 79000, 486000, 75000)

data <- tibble(year,location,
               e_mort_exc_tbhiv_num,
               e_mort_exc_tbhiv_num_lo,
               e_mort_tbhiv_num,
               e_mort_tbhiv_num_lo)

data

And the desired output is:

A tibble: 12 x 5
    year location group     e_mort_num e_mort_num_lo
   <dbl> <chr>    <chr>          <dbl>         <dbl>
 1  2000 global   exc_tbhiv    1698959       1600000
 2  2000 global   tbhiv         674398        535000
 3  2000 SA       exc_tbhiv      26000         25000
 4  2000 SA       tbhiv         116000         65000
 5  2001 global   exc_tbhiv    1693829       1600000
 6  2001 global   tbhiv         652384        517000
 7  2001 SA       exc_tbhiv      27000         26000
 8  2001 SA       tbhiv         144000         79000
 9  2002 global   exc_tbhiv    1687579       1600000
10  2002 global   tbhiv         616975        486000
11  2002 SA       exc_tbhiv      29000         28000
12  2002 SA       tbhiv         136000         75000

Thank you

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

初见 2025-02-09 15:44:05

我们可以通过重新安排子字符串重命名列,然后使用 pivot_longer

library(dplyr)
library(stringr)
library(tidyr)
data %>% 
  rename_with(~ 
   str_replace(.x, "^(e_mort)_(.*)_((num|\\d+k)(_.*)?)", "\\1_\\3:\\2"),
      contains("_")) %>%
  pivot_longer(cols = -c(year, location),
    names_to = c(".value", "group"), names_sep = ":")

output

# A tibble: 12 × 5
    year location group     e_mort_num e_mort_num_lo
   <dbl> <chr>    <chr>          <dbl>         <dbl>
 1  2000 global   exc_tbhiv    1698959       1600000
 2  2000 global   tbhiv         674398        535000
 3  2000 SA       exc_tbhiv      26000         25000
 4  2000 SA       tbhiv         116000         65000
 5  2001 global   exc_tbhiv    1693829       1600000
 6  2001 global   tbhiv         652384        517000
 7  2001 SA       exc_tbhiv      27000         26000
 8  2001 SA       tbhiv         144000         79000
 9  2002 global   exc_tbhiv    1687579       1600000
10  2002 global   tbhiv         616975        486000
11  2002 SA       exc_tbhiv      29000         28000
12  2002 SA       tbhiv         136000         75000

We could rename the columns by rearranging the substring and then use pivot_longer

library(dplyr)
library(stringr)
library(tidyr)
data %>% 
  rename_with(~ 
   str_replace(.x, "^(e_mort)_(.*)_((num|\\d+k)(_.*)?)", "\\1_\\3:\\2"),
      contains("_")) %>%
  pivot_longer(cols = -c(year, location),
    names_to = c(".value", "group"), names_sep = ":")

-output

# A tibble: 12 × 5
    year location group     e_mort_num e_mort_num_lo
   <dbl> <chr>    <chr>          <dbl>         <dbl>
 1  2000 global   exc_tbhiv    1698959       1600000
 2  2000 global   tbhiv         674398        535000
 3  2000 SA       exc_tbhiv      26000         25000
 4  2000 SA       tbhiv         116000         65000
 5  2001 global   exc_tbhiv    1693829       1600000
 6  2001 global   tbhiv         652384        517000
 7  2001 SA       exc_tbhiv      27000         26000
 8  2001 SA       tbhiv         144000         79000
 9  2002 global   exc_tbhiv    1687579       1600000
10  2002 global   tbhiv         616975        486000
11  2002 SA       exc_tbhiv      29000         28000
12  2002 SA       tbhiv         136000         75000
鹿! 2025-02-09 15:44:04

单个 pivot_longer 呼叫:

pivot_longer(data, c(-year, -location), 
             names_pattern = "e_mort_(.*)_(num.*)", 
             names_to = c("group", ".value"))
#> # A tibble: 12 x 5
#>     year location group         num  num_lo
#>    <dbl> <chr>    <chr>       <dbl>   <dbl>
#>  1  2000 global   exc_tbhiv 1698959 1600000
#>  2  2000 global   tbhiv      674398  535000
#>  3  2000 SA       exc_tbhiv   26000   25000
#>  4  2000 SA       tbhiv      116000   65000
#>  5  2001 global   exc_tbhiv 1693829 1600000
#>  6  2001 global   tbhiv      652384  517000
#>  7  2001 SA       exc_tbhiv   27000   26000
#>  8  2001 SA       tbhiv      144000   79000
#>  9  2002 global   exc_tbhiv 1687579 1600000
#> 10  2002 global   tbhiv      616975  486000
#> 11  2002 SA       exc_tbhiv   29000   28000
#> 12  2002 SA       tbhiv      136000   75000

在2022-05-29上创建的另一个选项,由(v2.0.1)

Another option with a single pivot_longer call:

pivot_longer(data, c(-year, -location), 
             names_pattern = "e_mort_(.*)_(num.*)", 
             names_to = c("group", ".value"))
#> # A tibble: 12 x 5
#>     year location group         num  num_lo
#>    <dbl> <chr>    <chr>       <dbl>   <dbl>
#>  1  2000 global   exc_tbhiv 1698959 1600000
#>  2  2000 global   tbhiv      674398  535000
#>  3  2000 SA       exc_tbhiv   26000   25000
#>  4  2000 SA       tbhiv      116000   65000
#>  5  2001 global   exc_tbhiv 1693829 1600000
#>  6  2001 global   tbhiv      652384  517000
#>  7  2001 SA       exc_tbhiv   27000   26000
#>  8  2001 SA       tbhiv      144000   79000
#>  9  2002 global   exc_tbhiv 1687579 1600000
#> 10  2002 global   tbhiv      616975  486000
#> 11  2002 SA       exc_tbhiv   29000   28000
#> 12  2002 SA       tbhiv      136000   75000

Created on 2022-05-29 by the reprex package (v2.0.1)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文