Mutate: if a column name occurs in the first column, replace with the value of the same name from a different data frame

Posted 2025-01-15 06:38:24

I have two data frames: df_1, in which I would like to replace values, and df_2, which supplies the replacement values. Please consider the example data below:

Data

df_1 <- data.frame(
  var=c("xAp", "xBp", "sCp", "sABp", "dBCp", "dCBp"), 
  A=NA, 
  B=NA, 
  C=NA)

df_2 <- data.frame(A=1, B=40, C=25)

Desired action

If a column name of df_1 occurs in the first column (var) of a given row, I want to replace the value in that row and column with the value from df_2 that corresponds to this column name. For example, take cell df_1[1,2]. Its column name is A, and A occurs in the first column of that row (df_1[1,1] is "xAp"). This means I want to replace the NA with the value that belongs to A in df_2, which is 1.

If the column name does not occur in the first column, I want it replaced by zero.
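To make the rule concrete, here is a minimal base R sketch that only marks which cells would receive a df_2 value. The name `hits` is introduced here for illustration, and the sketch assumes the column names should be matched literally (hence `fixed = TRUE`).

```r
# Example data from the question
df_1 <- data.frame(
  var = c("xAp", "xBp", "sCp", "sABp", "dBCp", "dCBp"),
  A = NA,
  B = NA,
  C = NA)

df_2 <- data.frame(A = 1, B = 40, C = 25)

# One logical column per name in df_2:
# TRUE where that name occurs literally in var, FALSE otherwise
hits <- sapply(names(df_2), function(nm) grepl(nm, df_1$var, fixed = TRUE))
hits
```

Every TRUE cell should end up holding the matching df_2 value, and every FALSE cell should end up as zero.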

Since I want to perform this action for every row, I have been thinking about combining mutate with across. However, I am already stuck at extracting the column names and comparing them to the values in the first column.

Expected output

data.frame(
  var=c("xAp", "xBp", "sCp", "sABp", "dBCp", "dCBp"), 
  A=c(1, 0, 0, 1, 0, 0), 
  B=c(0, 40, 0, 40, 40, 40), 
  C=c(0, 0, 25, 0, 25, 25))

It would be great if someone could help out. Thanks!

葵雨 2025-01-22 06:38:24

Here is one option: loop across the column names of 'df_2' and, inside case_when, test whether the current column name (cur_column()) occurs as a substring of the 'var' column. Where it does, return the 'df_2' value for that column; otherwise return 0.

library(dplyr)
library(stringr)

# For each column named in df_2: use the df_2 value where the name occurs in var, else 0
out2 <- df_1 %>%
  mutate(across(all_of(names(df_2)),
    ~ case_when(
      str_detect(var, cur_column()) ~ df_2[[cur_column()]],
      TRUE ~ 0)))
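If you would rather avoid the dplyr and stringr dependencies, the same idea can be sketched in base R. This is an alternative under stated assumptions, not part of the answer above: `out3`, `nm`, and `hit` are names introduced here, and `fixed = TRUE` assumes the column names should be matched literally rather than as regular expressions.

```r
# Example data from the question
df_1 <- data.frame(
  var = c("xAp", "xBp", "sCp", "sABp", "dBCp", "dCBp"),
  A = NA,
  B = NA,
  C = NA)

df_2 <- data.frame(A = 1, B = 40, C = 25)

out3 <- df_1
for (nm in names(df_2)) {
  # TRUE where the column name occurs literally in var
  hit <- grepl(nm, df_1$var, fixed = TRUE)
  # take the df_2 value for matching rows, 0 for all others
  out3[[nm]] <- ifelse(hit, df_2[[nm]], 0)
}
out3
```

Literal matching matters once column names contain regex metacharacters such as a dot; with the dplyr version, wrapping the pattern as `fixed(cur_column())` from stringr should have the same effect.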

- checking against the OP's expected output