Multiplying and adding across rows

Posted on 2025-01-12 22:54:57

I have this data frame:

color <- c("AKZ", "ZZA", "KAK")    
color_1 <- sample(color, 100, replace=TRUE, prob=c(0.4, 0.3, 0.3))
id = 1:100

sample_data = data.frame(id, color_1)


 id color_1
1  1     KAK
2  2     AKZ
3  3     KAK
4  4     KAK
5  5     AKZ
6  6     ZZA

Suppose there is a legend:

K = 3
A = 4
Z = 6

I want to add two columns to the above data frame:

sample_data$add_score : e.g. KAK = K + A + K = 3 + 4 + 3 = 10
sample_data$multiply_score : e.g. KAK = K * A * K = 3 * 4 * 3 = 36

I thought of solving the problem like this:

sample_data$first = substr(color_1,1,1)
sample_data$second = substr(color_1,2,2)
sample_data$third = substr(color_1,3,3)

sample_data$first_score = ifelse(sample_data$first == "K", 3, ifelse(sample_data$first == "A", 4, 6))
 
sample_data$second_score = ifelse(sample_data$second == "K", 3, ifelse(sample_data$second == "A", 4, 6))

sample_data$third_score = ifelse(sample_data$third == "K", 3, ifelse(sample_data$third == "A", 4, 6))

sample_data$add_score = sample_data$first_score + sample_data$second_score + sample_data$third_score

sample_data$multiply_score = sample_data$first_score * sample_data$second_score * sample_data$third_score

But I think this approach would take a long time if the strings in "color_1" were longer, since every extra character needs its own pair of columns. Given a scoring legend, is there a faster way to do this?

Thank you!

橘虞初梦 2025-01-19 22:54:57

Here is a way.
The main trick is to strsplit into single characters and match these vectors with the legend. Then add or multiply the matching numbers.

set.seed(2022)
color <- c("AKZ", "ZZA", "KAK")    
color_1 <- sample(color, 100, replace=TRUE, prob=c(0.4, 0.3, 0.3))
id <- 1:100

sample_data = data.frame(id, color_1)

legend <- setNames(c(3, 4, 6), c("K", "A", "Z"))

add_mul <- function(x, l){
  add <- function(y, l){
    i <- match(y, names(l))
    sum(l[i])
  }
  mul <- function(y, l){
    i <- match(y, names(l))
    prod(l[i])
  }
  
  s <- strsplit(x, "")
  add_score <- sapply(s, add, l = l)
  mul_score <- sapply(s, mul, l = l)
  data.frame(add_score, mul_score)
}

sample_data <- cbind(sample_data, add_mul(sample_data$color_1, legend))
head(sample_data)
#>   id color_1 add_score mul_score
#> 1  1     ZZA        16       144
#> 2  2     KAK        10        36
#> 3  3     AKZ        13        72
#> 4  4     KAK        10        36
#> 5  5     AKZ        13        72
#> 6  6     KAK        10        36

Created on 2022-03-10 by the reprex package (v2.0.1)
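
As a side note on the answer above: the split and the legend lookup can be done once and reused for both scores. The following is only a minimal sketch under that assumption; `score_codes` is a hypothetical helper name, not something from the answer, and it indexes the named legend vector directly instead of calling `match()`.

```r
# Legend from the question, stored as a named numeric vector.
legend <- c(K = 3, A = 4, Z = 6)

# Hypothetical helper (not from the answer): split once, look up once,
# then reuse the looked-up values for both the sum and the product.
score_codes <- function(x, legend) {
  chars <- strsplit(x, "")
  # Indexing by name returns the legend value for every character.
  vals <- lapply(chars, function(ch) unname(legend[ch]))
  data.frame(
    add_score = vapply(vals, sum, numeric(1)),
    mul_score = vapply(vals, prod, numeric(1))
  )
}

res <- score_codes(c("KAK", "AKZ", "ZZA"), legend)
print(res)
```

Because nothing here depends on the codes having exactly three characters, the same helper should work for longer or mixed-length strings.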

仅冇旳回忆 2025-01-19 22:54:57

Here is another option using tidyverse. I use recode from dplyr to change the letters to numbers according to the legend.

library(tidyverse)

legend <- c(K = 3, A = 4, Z = 6)

sample_data %>%
  rowwise %>%
  mutate(code = list(recode(str_split(color_1, "", simplify = T), !!!legend)),
         add_score = sum(code),
         multiply_score = prod(code)) %>%
  select(-code)

Output

      id color_1 add_score multiply_score
   <int> <chr>       <dbl>          <dbl>
 1     1 AKZ            13             72
 2     2 AKZ            13             72
 3     3 KAK            10             36
 4     4 KAK            10             36
 5     5 AKZ            13             72
 6     6 AKZ            13             72
 7     7 KAK            10             36
 8     8 AKZ            13             72
 9     9 AKZ            13             72
10    10 AKZ            13             72
# … with 90 more rows

Data

set.seed(103)
color <- c("AKZ", "ZZA", "KAK")    
color_1 <- sample(color, 100, replace=TRUE, prob=c(0.4, 0.3, 0.3))
id = 1:100

sample_data = data.frame(id, color_1)

Benchmark

It looks like Rui Barradas' solution is the fastest of the answers so far.
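
The benchmark code itself is not shown here. A rough way to time candidate approaches with base R's `system.time()` might look like the sketch below. The two functions (`split_match`, `substr_lookup`) and the sample size are assumptions chosen for illustration; they are not the benchmark the answer ran, and no timings are claimed.

```r
set.seed(103)
codes <- sample(c("AKZ", "ZZA", "KAK"), 1e4, replace = TRUE)
legend <- c(K = 3, A = 4, Z = 6)

# Approach 1: strsplit + match per element (the idea from the first answer).
split_match <- function(x, l) {
  s <- strsplit(x, "")
  sapply(s, function(y) sum(l[match(y, names(l))]))
}

# Approach 2: position-wise substr lookup (assumes a fixed width of 3,
# like the code in the question).
substr_lookup <- function(x, l) {
  unname(l[substr(x, 1, 1)] + l[substr(x, 2, 2)] + l[substr(x, 3, 3)])
}

t1 <- system.time(r1 <- split_match(codes, legend))
t2 <- system.time(r2 <- substr_lookup(codes, legend))

# Both approaches should agree before any timings are compared.
stopifnot(all(r1 == r2))
print(c(split_match = t1[["elapsed"]], substr_lookup = t2[["elapsed"]]))
```

Timings from a single `system.time()` call are noisy, so repeating the measurement (or using a dedicated benchmarking package) would give a fairer comparison.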

谁的年少不轻狂 2025-01-19 22:54:57

We can use stri_replace_all_regex to replace each letter in color_1 with its number followed by an arithmetic operator, then evaluate the resulting expression.

Here I've stored your values into a vector color_1_convert. We can use this as the input in stri_replace_all_regex for better management of the values.

library(dplyr)
library(stringi)

color_1_convert <- c("K" = "3", "A" = "4", "Z" = "6")

sample_data %>%
  group_by(id) %>%
  mutate(add_score = eval(parse(text = gsub("\\+$", "", stri_replace_all_regex(color_1, names(color_1_convert), paste0(color_1_convert, "+"), vectorize_all = F)))),
         multiply_score = eval(parse(text = gsub("\\*$", "", stri_replace_all_regex(color_1, names(color_1_convert), paste0(color_1_convert, "*"), vectorize_all = F)))))

# A tibble: 100 × 4
# Groups:   id [100]
      id color_1 add_score multiply_score
   <int> <chr>       <dbl>          <dbl>
 1     1 KAK            10             36
 2     2 ZZA            16            144
 3     3 AKZ            13             72
 4     4 ZZA            16            144
 5     5 AKZ            13             72
 6     6 AKZ            13             72
 7     7 AKZ            13             72
 8     8 KAK            10             36
 9     9 ZZA            16            144
10    10 AKZ            13             72
# … with 90 more rows
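
To make the intermediate step in this answer easier to see, here is a small base R sketch that builds the expression string for a single code and evaluates it. It takes a slightly different route from the answer (it uses `paste(collapse = )` instead of stringi, so there is no trailing operator to strip), and `build_expr` is a hypothetical helper name used only for illustration.

```r
# Legend as character digits, as in the answer's color_1_convert vector.
legend <- c(K = "3", A = "4", Z = "6")

# Hypothetical helper: turn one code such as "KAK" into "3+4+3" or "3*4*3".
build_expr <- function(code, op) {
  digits <- legend[strsplit(code, "")[[1]]]
  paste(digits, collapse = op)
}

add_expr <- build_expr("KAK", "+")
mul_expr <- build_expr("KAK", "*")

print(add_expr)                       # "3+4+3"
print(eval(parse(text = add_expr)))   # 10
print(eval(parse(text = mul_expr)))   # 36
```

Note that `eval(parse())` runs whatever text it is given, so this pattern is best kept to inputs you control.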

Hello爱情风 2025-01-19 22:54:57

Here’s an approach based on tidyr::separate_rows() followed by a grouped dplyr::summarize():

library(tidyverse)
set.seed(1)

legend <- c(K = 3, A = 4, Z = 6)

sample_data %>%
  mutate(decoded = color_1) %>%
  separate_rows(decoded, sep = "(?!^)") %>%
  mutate(decoded = legend[decoded]) %>%
  group_by(id, color_1) %>%
  summarize(
    add_score = sum(decoded),
    multiply_score = prod(decoded), 
    .groups = "drop"
  )

Output:

# A tibble: 100 x 4
      id color_1 add_score multiply_score
   <int> <chr>       <dbl>          <dbl>
 1     1 AKZ            13             72
 2     2 AKZ            13             72
 3     3 KAK            10             36
 4     4 ZZA            16            144
 5     5 AKZ            13             72
 6     6 ZZA            16            144
 7     7 ZZA            16            144
 8     8 KAK            10             36
 9     9 KAK            10             36
10    10 AKZ            13             72
# ... with 90 more rows
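
For readers without the tidyverse installed, the same reshape-then-aggregate idea can be sketched in base R: one row per character, a legend lookup, and a grouped sum and product. This is an assumed equivalent written for illustration (the names `df` and `long` are made up here), not the answer's code.

```r
legend <- c(K = 3, A = 4, Z = 6)
df <- data.frame(id = 1:3, color_1 = c("KAK", "AKZ", "ZZA"),
                 stringsAsFactors = FALSE)

# Long format: one row per character, tagged with the id it came from.
chars <- strsplit(df$color_1, "")
long <- data.frame(
  id = rep(df$id, lengths(chars)),
  value = unname(legend[unlist(chars)])
)

# Aggregate per id, then align the results back to the original row order.
key <- as.character(df$id)
df$add_score <- as.vector(tapply(long$value, long$id, sum)[key])
df$multiply_score <- as.vector(tapply(long$value, long$id, prod)[key])
print(df)
```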

囍孤女 2025-01-19 22:54:57

Yet another alternative, using libraries optimized for speed: stringi for string manipulation and Rfast for matrix operations. Note that when your data contain any NA values, matrixStats is safer to use than Rfast.

set.seed(2022)
color <- c("AKZ", "ZZA", "KAK")    
color_1 <- sample(color, 100, replace=TRUE, prob=c(0.4, 0.3, 0.3))
id = 1:100
sample_data = data.frame(id, color_1)

m <- strsplit(sample_data[["color_1"]], "") |>
  unlist(use.names = F) |>
  stringi::stri_replace_all_regex(
    c("K", "A", "Z"),
    c("3", "4", "6"), vectorize_all = F) |>
  as.integer() |>
  matrix(ncol = 3, byrow = T)
sample_data$add_score <- Rfast::rowsums(m)
sample_data$mul_score <- Rfast::rowprods(m)

head(sample_data)
  id color_1 add_score mul_score
1  1     ZZA        16       144
2  2     KAK        10        36
3  3     AKZ        13        72
4  4     KAK        10        36
5  5     AKZ        13        72
6  6     KAK        10        36
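
On the NA caveat mentioned in this answer: if missing codes are a concern and extra packages are not wanted, a base R version of the same matrix idea is sketched below. It assumes fixed-width codes of three characters, uses `substring()` so that an NA code still fills a whole row, and relies on base `rowSums()` and `apply()`, which return NA for such a row. This is an illustration only and makes no claim about how Rfast or matrixStats treat NA.

```r
legend <- c(K = 3, A = 4, Z = 6)
codes <- c("KAK", NA, "ZZA")

# 3 x n character matrix: column j holds the three characters of code j.
# substring() returns NA for every position of an NA code.
chars <- vapply(codes, function(x) substring(x, 1:3, 1:3),
                character(3), USE.NAMES = FALSE)

# Look up the legend and reshape to one row per code.
m <- matrix(legend[as.vector(chars)], ncol = 3, byrow = TRUE)

add_score <- rowSums(m)          # NA rows stay NA
mul_score <- apply(m, 1, prod)   # NA rows stay NA
print(data.frame(codes, add_score, mul_score))
```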
