Automating the arguments of dplyr's coalesce() for a set of variables instead of typing them manually

Posted on 2025-02-12 02:34:57

I have a list of data frames that I want to combine into one. They share some columns and rows, while other columns are unique to one data frame or missing from it.

Here is a minimal structure of the first two data frames:

df1:

df1 <- structure(list(id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
    Name = c("LI","NO","WH","MA","BU","SO","FO","AT","CO","IN","SP","CE"),
    H_A = c("H", "A", "H", "A", "H", "A", "H", "A", "H", "A", "H", "A"),
    W = c(15, 13, 5, 13, 9, 12, 10, 13, 1, 8, 4, 2),
    X = c(NA, NA, NA, NA, NA, NA, 12, 7, 5, 13, 1, 3),
    Y = c(0, 0, 0, 0, 0,0, NA, NA, NA, NA, NA, NA)),
  row.names = c(NA,-12L),  class = c("tbl_df","tbl", "data.frame"))

df2:

df2 <- structure(list(id = c(1, 1, 2, 2, 3, 3),
    Name = c("LI","NO", "WH", "MA", "BU", "SO"),
    H_A = c("H", "A", "H", "A", "H", "A"),
    W = c(15, 13, 5, 13, 9, 12),
    X = c(10, 12, 11, 15, 6, 14),
    Z = c(4, 14, 16, 16, 25, 30)),
  row.names = c(NA,-6L),class = c("tbl_df", "tbl", "data.frame"))

This can be solved by coalescing each shared column by hand:

library(dplyr)

df_combined <- full_join(df1, df2, by = c("id", "Name", "H_A")) %>% 
  mutate(X = coalesce(X.x, X.y),
         W = coalesce(W.x, W.y)) %>% 
  select(-contains("."))

I would like to automate this routine so that I don't have to type each variable into the mutate(coalesce()) call by hand, since in practice there are many more shared variables than just X and W above. I will also repeat the routine for df3, df4 and df5, which share the same minimal matching columns with df1.

千寻… 2025-02-19 02:34:57

Joins by their nature don't fill in missing values, so we have to implement a fix ourselves. Although you can use if_else() statements as shown in another answer, coalesce() is a much cleaner function to use.
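
Both the question's code and the full_join() example in this answer still name each shared column by hand. A minimal sketch of how the coalescing could be automated, assuming `id`, `Name` and `H_A` are the join keys and that every other column present in both data frames should be coalesced (left value first, right value as fallback). `coalesce_join()` is a hypothetical helper written for this illustration, not a dplyr function; base R's `Reduce()` then chains it over any number of data frames.

```r
library(dplyr)

# The question's df1 and df2, rebuilt with tibble() (re-exported by dplyr)
df1 <- tibble(
  id   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
  Name = c("LI", "NO", "WH", "MA", "BU", "SO", "FO", "AT", "CO", "IN", "SP", "CE"),
  H_A  = rep(c("H", "A"), times = 6),
  W    = c(15, 13, 5, 13, 9, 12, 10, 13, 1, 8, 4, 2),
  X    = c(NA, NA, NA, NA, NA, NA, 12, 7, 5, 13, 1, 3),
  Y    = c(0, 0, 0, 0, 0, 0, NA, NA, NA, NA, NA, NA)
)
df2 <- tibble(
  id   = c(1, 1, 2, 2, 3, 3),
  Name = c("LI", "NO", "WH", "MA", "BU", "SO"),
  H_A  = rep(c("H", "A"), times = 3),
  W    = c(15, 13, 5, 13, 9, 12),
  X    = c(10, 12, 11, 15, 6, 14),
  Z    = c(4, 14, 16, 16, 25, 30)
)

# Hypothetical helper (not part of dplyr): full-join x and y on `keys`, then
# coalesce every non-key column that exists in both, preferring x's value.
coalesce_join <- function(x, y, keys = c("id", "Name", "H_A")) {
  shared <- setdiff(intersect(names(x), names(y)), keys)
  joined <- full_join(x, y, by = keys, suffix = c(".x", ".y"))
  for (col in shared) {
    joined[[col]] <- coalesce(joined[[paste0(col, ".x")]],
                              joined[[paste0(col, ".y")]])
  }
  # keep one copy of each original column name (drops the .x/.y pairs)
  select(joined, all_of(union(names(x), names(y))))
}

df_combined <- Reduce(coalesce_join, list(df1, df2))
# With more tables (df3, df4, df5 from the question, not defined here):
# df_combined <- Reduce(coalesce_join, list(df1, df2, df3, df4, df5))
df_combined
```

Because coalesce() keeps the first non-missing value, the left data frame wins whenever both have a value; swapping the two arguments inside the loop would give the right data frame priority instead.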

See this post for another example (this could arguably be considered a duplicate question):

Using dplyr to fill in missing values (through a join?)

library(tidyverse)


df_test <- full_join(df1, df2, by = c("id", "Name", "H_A")) %>% 
  mutate(X = coalesce(X.x, X.y),
         W = coalesce(W.x, W.y)) %>% 
  select(id, Name, H_A, W, X, Y, Z)

df_test == df_combined

      id Name  H_A    W    X    Y    Z
 [1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [2,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [3,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [4,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [5,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [6,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [7,] TRUE TRUE TRUE TRUE TRUE   NA   NA
 [8,] TRUE TRUE TRUE TRUE TRUE   NA   NA
 [9,] TRUE TRUE TRUE TRUE TRUE   NA   NA
[10,] TRUE TRUE TRUE TRUE TRUE   NA   NA
[11,] TRUE TRUE TRUE TRUE TRUE   NA   NA
[12,] TRUE TRUE TRUE TRUE TRUE   NA   NA

As expected, the NA cells return NA, because comparing two NAs with a simple == gives NA rather than TRUE.
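
To make such a comparison NA-aware, one option is a small element-wise helper that counts two NAs in the same position as a match. `na_equal()` is a hypothetical name used only for this sketch; when a single TRUE/FALSE for a whole object is enough, base R's `identical()` already treats NAs in matching positions as equal.

```r
# In R, NA == NA is NA (unknown), not TRUE
print(NA == NA)

# Hypothetical helper: element-wise equality where two NAs count as equal
na_equal <- function(a, b) {
  both_na  <- is.na(a) & is.na(b)
  both_val <- !is.na(a) & !is.na(b)
  both_na | (both_val & a == b)
}

print(na_equal(c(1, NA, 3), c(1, NA, 4)))  # TRUE TRUE FALSE

# Applied column by column to two small data frames with NAs in the same cells
left  <- data.frame(Y = c(0, NA), Z = c(4, NA))
right <- data.frame(Y = c(0, NA), Z = c(4, NA))
print(as.data.frame(Map(na_equal, left, right)))

# identical() returns one TRUE/FALSE for the whole object and also matches NAs
print(identical(left, right))
```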

暗地喜欢 2025-02-19 02:34:57

You can use left_join() from dplyr and substitute the NAs like this, assuming that id and H_A together form a key:

library(dplyr)
df1 <- structure(list(id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
                      Name = c("LI","NO","WH","MA","BU","SO","FO","AT","CO","IN","SP","CE"),
                      H_A = c("H", "A", "H", "A", "H", "A", "H", "A", "H", "A", "H", "A"),
                      W = c(15, 13, 5, 13, 9, 12, 10, 13, 1, 8, 4, 2),
                      X = c(NA, NA, NA, NA, NA, NA, 12, 7, 5, 13, 1, 3),
                      Y = c(0, 0, 0, 0, 0,0, NA, NA, NA, NA, NA, NA)),
                 row.names = c(NA,-12L),  class = c("tbl_df","tbl", "data.frame"))
df2 <- structure(list(id = c(1, 1, 2, 2, 3, 3),
                      Name = c("LI","NO", "WH", "MA", "BU", "SO"),
                      H_A = c("H", "A", "H", "A", "H", "A"),
                      W = c(15, 13, 5, 13, 9, 12),
                      X = c(10, 12, 11, 15, 6, 14),
                      Z = c(4, 14, 16, 16, 25, 30)),
                 row.names = c(NA,-6L),class = c("tbl_df", "tbl", "data.frame"))

df_combined <- left_join(df1, 
                         df2 %>% 
                           select(id, H_A, "df2_X" = X, Z))  %>%
  mutate(X = if_else(is.na(X), df2_X, X)) %>% 
  select(-df2_X)
#> Joining, by = c("id", "H_A")

df_combined
#> # A tibble: 12 × 7
#>       id Name  H_A       W     X     Y     Z
#>    <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#>  1     1 LI    H        15    10     0     4
#>  2     1 NO    A        13    12     0    14
#>  3     2 WH    H         5    11     0    16
#>  4     2 MA    A        13    15     0    16
#>  5     3 BU    H         9     6     0    25
#>  6     3 SO    A        12    14     0    30
#>  7     4 FO    H        10    12    NA    NA
#>  8     4 AT    A        13     7    NA    NA
#>  9     5 CO    H         1     5    NA    NA
#> 10     5 IN    A         8    13    NA    NA
#> 11     6 SP    H         4     1    NA    NA
#> 12     6 CE    A         2     3    NA    NA
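
When several columns need the same NA substitution, one if_else() per column gets repetitive. As a swapped-in alternative (not what this answer uses), dplyr 1.0.0 and later provide `rows_patch()`, which overwrites only the NA cells of `x` with values from `y` for rows whose keys match. A sketch, assuming `id`, `Name` and `H_A` uniquely identify the rows of df2 (`rows_patch()` signals an error for duplicated keys in `y`) and that every key in df2 also exists in df1; columns that exist only in df2, such as Z, are added afterwards with a `left_join()`.

```r
library(dplyr)

# The question's df1 and df2, rebuilt with tibble() (re-exported by dplyr)
df1 <- tibble(
  id   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
  Name = c("LI", "NO", "WH", "MA", "BU", "SO", "FO", "AT", "CO", "IN", "SP", "CE"),
  H_A  = rep(c("H", "A"), times = 6),
  W    = c(15, 13, 5, 13, 9, 12, 10, 13, 1, 8, 4, 2),
  X    = c(NA, NA, NA, NA, NA, NA, 12, 7, 5, 13, 1, 3),
  Y    = c(0, 0, 0, 0, 0, 0, NA, NA, NA, NA, NA, NA)
)
df2 <- tibble(
  id   = c(1, 1, 2, 2, 3, 3),
  Name = c("LI", "NO", "WH", "MA", "BU", "SO"),
  H_A  = rep(c("H", "A"), times = 3),
  W    = c(15, 13, 5, 13, 9, 12),
  X    = c(10, 12, 11, 15, 6, 14),
  Z    = c(4, 14, 16, 16, 25, 30)
)

keys   <- c("id", "Name", "H_A")
shared <- setdiff(intersect(names(df1), names(df2)), keys)  # "W", "X"
extra  <- setdiff(names(df2), names(df1))                   # "Z"

df_combined <- df1 %>%
  # overwrite only the NA cells of the shared columns, matching rows on `keys`
  rows_patch(select(df2, all_of(c(keys, shared))), by = keys) %>%
  # append the columns that exist only in df2 (NA where df2 has no match)
  left_join(select(df2, all_of(c(keys, extra))), by = keys)

df_combined
```

Since `shared` and `extra` are computed from the column names, the same two steps work unchanged however many overlapping columns the data frames have.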

情话已封尘 2025-02-19 02:34:57

data.table approach

library(data.table)
# set to data.table format
setDT(df1); setDT(df2)
# perform an update join on the key columns id, Name and H_A:
# overwrite NA values in W and X with df2's values and add Z
# (df2 has no Y column, so there is no i.Y to fill Y from)
df1[df2, `:=`(W = ifelse(is.na(W), i.W, W), 
              X = ifelse(is.na(X), i.X, X),
              Z = i.Z),
    on = .(id, Name, H_A)][]

# id Name H_A  W  X  Y  Z
# 1:  1   LI   H 15 10  0  4
# 2:  1   NO   A 13 12  0 14
# 3:  2   WH   H  5 11  0 16
# 4:  2   MA   A 13 15  0 16
# 5:  3   BU   H  9  6  0 25
# 6:  3   SO   A 12 14  0 30
# 7:  4   FO   H 10 12 NA NA
# 8:  4   AT   A 13  7 NA NA
# 9:  5   CO   H  1  5 NA NA
#10:  5   IN   A  8 13 NA NA
#11:  6   SP   H  4  1 NA NA
#12:  6   CE   A  2  3 NA NA
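
The update join above still lists every column by hand. A sketch of a more general version, assuming the same key columns and that the installed data.table provides `fcoalesce()`: the shared and df2-only column names are computed first, and `mget()` collects the matching `i.`-prefixed (df2) columns inside `j`.

```r
library(data.table)

# The question's df1 and df2 as data.tables
df1 <- data.table(
  id   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
  Name = c("LI", "NO", "WH", "MA", "BU", "SO", "FO", "AT", "CO", "IN", "SP", "CE"),
  H_A  = rep(c("H", "A"), times = 6),
  W    = c(15, 13, 5, 13, 9, 12, 10, 13, 1, 8, 4, 2),
  X    = c(NA, NA, NA, NA, NA, NA, 12, 7, 5, 13, 1, 3),
  Y    = c(0, 0, 0, 0, 0, 0, NA, NA, NA, NA, NA, NA)
)
df2 <- data.table(
  id   = c(1, 1, 2, 2, 3, 3),
  Name = c("LI", "NO", "WH", "MA", "BU", "SO"),
  H_A  = rep(c("H", "A"), times = 3),
  W    = c(15, 13, 5, 13, 9, 12),
  X    = c(10, 12, 11, 15, 6, 14),
  Z    = c(4, 14, 16, 16, 25, 30)
)

keys   <- c("id", "Name", "H_A")
shared <- setdiff(intersect(names(df1), names(df2)), keys)  # "W", "X"
extra  <- setdiff(names(df2), names(df1))                   # "Z"

# Update join: on matched rows, keep df1's value unless it is NA, otherwise
# take df2's value (the i.-prefixed columns are df2's copies)
df1[df2, (shared) := Map(fcoalesce, mget(shared), mget(paste0("i.", shared))),
    on = keys]

# Add the columns that only df2 has; rows without a match in df2 get NA
df1[df2, (extra) := mget(paste0("i.", extra)), on = keys]

df1[]
```

Because `:=` updates by reference, df1 itself is modified; take a `copy(df1)` first if the original is still needed.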
