Sum a column based on variable names in another column that contain x, grouped by similar letters

Posted 2025-02-13 18:01:28

I have a table that is somewhat like this:

var         RC
distance50   2
distance20   4
precMax      5
precMin      1
total_prec   8
travelTime   5
travelTime   2

I want to sum all similar type variables, resulting in something like this:

var  sum
dist   6
prec  14
trav   7

Using 4 letters is enough to separate the different types. I have tried and tried but not figured it out. Could anyone please assist? I generally try to work with dplyr, so that would be preferred. The datasets are small (n<100) so speed is not required.
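A minimal dplyr sketch of the four-letter idea, assuming the table is a data frame df with columns var and RC. It also assumes that total_prec is meant to count toward prec. A plain four-letter prefix would give it its own group "tota", so this sketch strips a leading "total_" before taking the prefix:

```r
library(dplyr)

# The question's table; column names var and RC are taken from the post
df <- data.frame(
  var = c("distance50", "distance20", "precMax", "precMin",
          "total_prec", "travelTime", "travelTime"),
  RC  = c(2L, 4L, 5L, 1L, 8L, 5L, 2L)
)

res <- df %>%
  # assumption: a leading "total_" is dropped so total_prec falls under "prec",
  # then only the first four characters are kept as the group key
  mutate(var = substr(sub("^total_", "", var), 1, 4)) %>%
  group_by(var) %>%
  summarise(sum = sum(RC), .groups = "drop")

print(res)
```

If other prefixes like "total_" turn up in the real data, the sub() pattern would need to be extended accordingly.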

儭儭莪哋寶赑 2025-02-20 18:01:28

Base R solution:

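aggregate(
  RC ~ var,
  data = transform(
    df,
    # drop a leading "total_" so that total_prec is counted under prec
    # (filtering those rows out would give prec = 6 instead of 14),
    # then cut the name at a trailing capital or digit: distance50 -> distance, precMax -> prec
    var = gsub("^(\\w+)([A-Z0-9]\\w+$)", "\\1", sub("^total_", "", var))
  ),
  FUN = sum
)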

Data:

df <- structure(list(var = c("distance50", "distance20", "precMax", 
"precMin", "total_prec", "travelTime", "travelTime"), RC = c(2L, 
4L, 5L, 1L, 8L, 5L, 2L)), class = "data.frame", row.names = c(NA, 
-7L))
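To see what the regular expression in the gsub() call does on its own, here is a small check on the question's names. ^(\\w+) greedily captures a word prefix, and the engine backtracks until [A-Z0-9]\\w+$ can match the rest. As a result, the capture stops just before the last capital letter or digit that still has at least one character after it. total_prec contains no capital letter or digit, so it comes back unchanged, which is why it needs separate handling:

```r
# Pattern and replacement as used in the base R answer
pattern <- "^(\\w+)([A-Z0-9]\\w+$)"
x <- c("distance50", "distance20", "precMax", "precMin",
       "total_prec", "travelTime")

# Names without a match (here total_prec) are returned as-is
stems <- gsub(pattern, "\\1", x)
print(stems)
```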

许久 2025-02-20 18:01:28
library(dplyr)
library(tidyr)

df %>% 
  separate(var, c("var", "b"), sep = "[_A-Z0-9]", extra = "merge") %>% 
  group_by(var = ifelse(b %in% var, b, var)) %>% 
  summarize(RC = sum(RC), .groups = "drop")
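  1. separate() splits var into two columns, var and b, at the first underscore (_), capital letter A-Z or digit 0-9; extra = "merge" keeps everything after that first split in b.
  2. In the group_by statement, if the value in b also occurs in the var column (as "prec" from total_prec does), b is used as the group; otherwise var is kept.
  3. Lastly, RC is summed within each group.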

Output

  var         RC
  <chr>    <int>
1 distance     6
2 prec        14
3 travel       7
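To make steps 1 and 2 concrete, the sketch below runs the same separate() call on the question's data and prints the intermediate columns before grouping. It assumes dplyr and tidyr are installed and rebuilds df from the data shown earlier in the thread. The helper column name group is only for illustration:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  var = c("distance50", "distance20", "precMax", "precMin",
          "total_prec", "travelTime", "travelTime"),
  RC  = c(2L, 4L, 5L, 1L, 8L, 5L, 2L)
)

# Step 1: split at the first "_", capital letter or digit;
# extra = "merge" keeps the remainder of the string in b
split_df <- df %>%
  separate(var, c("var", "b"), sep = "[_A-Z0-9]", extra = "merge")
print(split_df)

# Step 2: b replaces var only where b itself occurs as a value of var
# (here only "prec", which comes from total_prec)
split_df <- split_df %>%
  mutate(group = ifelse(b %in% var, b, var))
print(split_df)
```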

白芷 2025-02-20 18:01:28
library(dplyr)    # also re-exports tibble()
library(stringr)  # for str_sub()

tibble(
    var=c('dista', 'distb', 'travelTime'),
    rc=2:4) %>% 
    print() %>% 

# A tibble: 3 x 2
#  var           rc
#  <chr>      <int>
#1 dista          2
#2 distb          3
#3 travelTime     4

    group_by(var=str_sub(var, end=4)) %>% 
    print() %>% 

# A tibble: 3 x 2
# Groups:   var [2]
#  var      rc
#  <chr> <int>
#1 dist      2
#2 dist      3
#3 trav      4

    summarise(sum=sum(rc))

# A tibble: 2 x 2
#  var     sum
#  <chr> <int>
#1 dist      5
#2 trav      4
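The example above uses made-up names (dista, distb). Applied to the names from the question, the same four-letter prefix would put total_prec into its own group "tota" rather than "prec". A quick check, assuming stringr is available:

```r
library(stringr)

# Variable names from the question's table
vars <- c("distance50", "distance20", "precMax", "precMin",
          "total_prec", "travelTime", "travelTime")

# Keep only the first four characters, as in the answer above
prefix <- str_sub(vars, end = 4)
print(prefix)
```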