Multiplying and adding across rows
I have this data frame:
color <- c("AKZ", "ZZA", "KAK")
color_1 <- sample(color, 100, replace=TRUE, prob=c(0.4, 0.3, 0.3))
id = 1:100
sample_data = data.frame(id, color_1)
id color_1
1 1 KAK
2 2 AKZ
3 3 KAK
4 4 KAK
5 5 AKZ
6 6 ZZA
Suppose there is a legend:
- K = 3
- A = 4
- Z = 6
I want to add two columns to the above data frame:
- sample_data$add_score : e.g. KAK = K + A + K = 3 + 4 + 3 = 10
- sample_data$multiply_score : e.g. KAK = K * A * K = 3 * 4 * 3 = 36
I thought of solving the problem like this:
sample_data$first = substr(color_1,1,1)
sample_data$second = substr(color_1,2,2)
sample_data$third = substr(color_1,3,3)
sample_data$first_score = ifelse(sample_data$first == "K", 3, ifelse(sample_data$first == "A", 4, 6))
sample_data$second_score = ifelse(sample_data$second == "K", 3, ifelse(sample_data$second == "A", 4, 6))
sample_data$third_score = ifelse(sample_data$third == "K", 3, ifelse(sample_data$third == "A", 4, 6))
sample_data$add_score = sample_data$first_score + sample_data$second_score + sample_data$third_score
sample_data$multiply_score = sample_data$first_score * sample_data$second_score * sample_data$third_score
But I think this approach would take a long time if the strings in "color_1" were longer. Given a scoring legend, is there a faster way to do this?
Thank you!
Comments (5)
Here is a way.
The main trick is to strsplit into single characters and match these vectors with the legend. Then add or multiply the matching numbers.
Created on 2022-03-10 by the reprex package (v2.0.1)
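The code for this answer is not shown here, so the following is only a minimal base R sketch of the strsplit-and-match idea, not the answerer's exact code. It assumes the legend is stored as a named numeric vector (the name legend is an assumption) and uses a three-row stand-in for sample_data.

```r
# Legend as a named numeric vector (assumed representation)
legend <- c(K = 3, A = 4, Z = 6)

# Small stand-in for the question's sample_data
sample_data <- data.frame(id = 1:3,
                          color_1 = c("KAK", "AKZ", "ZZA"),
                          stringsAsFactors = FALSE)

# Split every string into single characters
chars <- strsplit(sample_data$color_1, "")

# Match each character against the legend names and look up its score
scores <- lapply(chars, function(ch) legend[match(ch, names(legend))])

# Add or multiply the matched numbers, one result per row
sample_data$add_score <- vapply(scores, sum, numeric(1))
sample_data$multiply_score <- vapply(scores, prod, numeric(1))

print(sample_data)
```

Because the split is done per string, this works for strings of any length and does not need one helper column per character position.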
Here is another option using tidyverse. I use recode from dplyr to change the letters to numbers according to the legend.
Output
Data
Benchmark
It looks like Rui Barradas' solution is the fastest of the answers so far.
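The recode-based answer above could look roughly like the sketch below. This is an illustration under assumptions, not the original code: the helper score_string and the three-row stand-in for sample_data are hypothetical, and the legend is assumed to be a named numeric vector spliced into recode() with !!!.

```r
library(dplyr)

# Legend as a named numeric vector (assumed representation)
legend <- c(K = 3, A = 4, Z = 6)

# Small stand-in for the question's sample_data
sample_data <- data.frame(id = 1:3,
                          color_1 = c("KAK", "AKZ", "ZZA"),
                          stringsAsFactors = FALSE)

# Hypothetical helper: split one string into letters, recode the letters
# to numbers according to the legend, then reduce them with f (sum or prod)
score_string <- function(s, f) {
  letters_in_s <- strsplit(s, "")[[1]]
  f(recode(letters_in_s, !!!legend))
}

result <- sample_data %>%
  rowwise() %>%
  mutate(add_score = score_string(color_1, sum),
         multiply_score = score_string(color_1, prod)) %>%
  ungroup()

print(result)
```

Working row by row keeps the code short, but it is unlikely to be the fastest option on large data, which fits the benchmark remark above.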
We can use stri_replace_all_regex to replace the letters in your color_1 with integers, together with the arithmetic operator.
Here I've stored your values into a vector color_1_convert. We can use this as the input in stri_replace_all_regex for better management of the values.
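One way to read the stri_replace_all_regex answer is sketched below, since its code is not shown here. The idea is to turn each string into an arithmetic expression such as "3+4+3" and then evaluate it. The helper score_with and the final eval(parse()) step are assumptions about how the replaced strings get evaluated; only color_1_convert and stri_replace_all_regex come from the answer text.

```r
library(stringi)

# Legend stored as a named vector, as the answer describes
color_1_convert <- c(K = 3, A = 4, Z = 6)

# Small stand-in for the question's color_1 vector
color_1 <- c("KAK", "AKZ", "ZZA")

# Hypothetical helper: build and evaluate one expression per string
score_with <- function(x, op) {
  # Replace each letter with its number followed by the operator,
  # e.g. "KAK" becomes "3+4+3+" when op is "+"
  expr <- stri_replace_all_regex(
    x,
    pattern = names(color_1_convert),
    replacement = paste0(color_1_convert, op),
    vectorize_all = FALSE
  )
  # Drop the trailing operator, then evaluate the expression text
  expr <- substr(expr, 1, nchar(expr) - 1)
  vapply(expr, function(e) eval(parse(text = e)), numeric(1),
         USE.NAMES = FALSE)
}

add_score <- score_with(color_1, "+")
multiply_score <- score_with(color_1, "*")
```

Evaluating parsed text is only reasonable here because the strings are built from a fixed legend; it should not be used on untrusted input.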
Here's an approach based on tidyr::separate_rows() followed by a grouped dplyr::summarize():
Output:
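The code and output for this answer are not shown here. As a rough sketch of the separate_rows() plus grouped summarize() approach, the version below first inserts commas between characters with gsub() so that separate_rows() has an explicit delimiter to split on. That comma step, the letter and score column names, and the three-row stand-in for sample_data are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Legend as a named numeric vector (assumed representation)
legend <- c(K = 3, A = 4, Z = 6)

# Small stand-in for the question's sample_data
sample_data <- data.frame(id = 1:3,
                          color_1 = c("KAK", "AKZ", "ZZA"),
                          stringsAsFactors = FALSE)

result <- sample_data %>%
  # "KAK" becomes "K,A,K": a comma after every character that has a successor
  mutate(letter = gsub("(.)(?=.)", "\\1,", color_1, perl = TRUE)) %>%
  # One row per letter
  separate_rows(letter, sep = ",") %>%
  # Look up each letter's score in the legend
  mutate(score = unname(legend[letter])) %>%
  # Collapse back to one row per original string
  group_by(id, color_1) %>%
  summarize(add_score = sum(score),
            multiply_score = prod(score),
            .groups = "drop")

print(result)
```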
Yet another alternative, using some libraries optimized for speed: stringi for string manipulation and Rfast for matrix operations. Note that when any NA values are present in your data, matrixStats is safer to use than Rfast.
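The matrix-based idea could be sketched as follows. This is not the answerer's code: it uses matrixStats for the row-wise operations rather than Rfast, and it assumes every string has the same length so the split characters form a rectangular matrix. The variable names and the three-element stand-in for color_1 are illustrative.

```r
library(stringi)
library(matrixStats)

# Legend as a named numeric vector (assumed representation)
legend <- c(K = 3, A = 4, Z = 6)

# Small stand-in for the question's color_1 vector
color_1 <- c("KAK", "AKZ", "ZZA")

# Character matrix: one row per string, one column per character position
chars <- stri_extract_all_regex(color_1, ".", simplify = TRUE)

# Look up every character in the legend and restore the matrix shape
scores <- matrix(legend[chars], nrow = nrow(chars))

# Row-wise sum and product over the score matrix
add_score <- rowSums2(scores)
multiply_score <- rowProds(scores)
```

Doing the lookup once on the whole matrix avoids a per-row loop, which is where the speed advantage of this style comes from.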