Merge rows by key and year with multiple columns in R

Posted on 2025-01-23 09:33:31

I have the following dataset:

df1 <- data.frame(
  "key" = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3), 
  "year" = c(2002, 2002, 2004, 2004, 2002, 2002, 2004, 2004, 2004, 2004),
  "Var1" = c(10, NA, 5, 5, 4, NA, NA, 3, 2, 2),
  "Var2" = c(1, 1, 3, 3, 2, NA, 3, NA, 1, NA),
  "Var3" = c(NA, 2, NA, NA, 5, 5, 3, NA, 2, NA),
  "Var4" = c(NA, 4, 5, 5, 6, NA, 4, NA, NA, NA))

I now want to merge the duplicate rows by key and year to get a dataset that looks like this:

df2 <- data.frame(
  "key" = c(1, 1, 2, 2, 3), 
  "year" = c(2002, 2004, 2002, 2004, 2004),
  "Var1" = c(10, 5, 4, 3, 2),
  "Var2" = c(1, 3, 2, 3, 1),
  "Var3" = c(2, NA, 5, 3, 2),
  "Var4" = c(4, 5, 6, 4, NA))

The problem is that I have over 30 columns and hundreds to thousands of rows, so the approach in "Merge rows within a dataframe by a key" seems a bit unwieldy.
I would appreciate any help!


两人的回忆 2025-01-30 09:33:31

You can group_by(key, year) and take the maximum of each column, ignoring NAs; a column that is entirely NA within a group stays NA:

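library(dplyr)

df1 %>% 
  group_by(key, year) %>% 
  # For every non-grouping column keep the largest non-NA value;
  # if the group has only NAs in that column, return NA instead of -Inf
  summarise(across(everything(), ~ if (all(is.na(.x))) NA else max(.x, na.rm = TRUE)))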

## A tibble: 5 x 6
## Groups:   key [3]
#    key  year  Var1  Var2  Var3  Var4
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1     1  2002    10     1     2     4
#2     1  2004     5     3    NA     5
#3     2  2002     4     2     5     6
#4     2  2004     3     3     3     4
#5     3  2004     2     1     2    NA
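If you would rather avoid the dplyr dependency, roughly the same "max per column, NA if all missing" rule can be sketched in base R with aggregate(). This is a swapped-in technique, not part of the answer above. It assumes every non-key column is numeric, and `max_or_na` is a hypothetical helper name. Note `na.action = na.pass`: the formula method of aggregate() drops every row containing an NA by default, which on df1 would keep only a single row.

```r
# Example data from the question
df1 <- data.frame(
  key  = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  year = c(2002, 2002, 2004, 2004, 2002, 2002, 2004, 2004, 2004, 2004),
  Var1 = c(10, NA, 5, 5, 4, NA, NA, 3, 2, 2),
  Var2 = c(1, 1, 3, 3, 2, NA, 3, NA, 1, NA),
  Var3 = c(NA, 2, NA, NA, 5, 5, 3, NA, 2, NA),
  Var4 = c(NA, 4, 5, 5, 6, NA, 4, NA, NA, NA)
)

# Hypothetical helper: largest non-NA value, or NA if the group has none
max_or_na <- function(x) if (all(is.na(x))) NA else max(x, na.rm = TRUE)

# `.` on the left-hand side means "every column not used for grouping";
# na.pass keeps rows that contain NAs (the default na.omit would drop them)
res <- aggregate(. ~ key + year, data = df1, FUN = max_or_na, na.action = na.pass)

# Sort by key, then year, to match the row order of df2 in the question
res <- res[order(res$key, res$year), ]
rownames(res) <- NULL
res
```

Because `.` picks up every remaining column, this scales to 30+ columns without listing them. On the example data it should match df2 from the question.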

黑色毁心梦 2025-01-30 09:33:31

You can fill missing values within each key/year group with fill() (in both directions, via `.direction = "downup"`) and then keep the unique rows with distinct().

library(tidyverse)

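df1 %>%
  group_by(key, year) %>%
  # Within each group, copy non-NA values down and then up to every row;
  # Var1:Var4 is a column range, so widen it to cover all value columns
  fill(Var1:Var4, .direction = "downup") %>%
  # Rows of a group are now identical (if they never conflict), so keep one
  distinct() %>%
  ungroup()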

# A tibble: 5 × 6
    key  year  Var1  Var2  Var3  Var4
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1  2002    10     1     2     4
2     1  2004     5     3    NA     5
3     2  2002     4     2     5     6
4     2  2004     3     3     3     4
5     3  2004     2     1     2    NA
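Both answers assume that the rows of a key/year group never disagree on a non-NA value. If one row had Var1 = 5 and another Var1 = 6, the max() approach would silently keep 6, while fill() + distinct() would leave two rows for that group. A small base R check can flag such groups before merging. `conflicting_groups()` is a hypothetical helper, not taken from either answer.

```r
# Example data from the question
df1 <- data.frame(
  key  = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  year = c(2002, 2002, 2004, 2004, 2002, 2002, 2004, 2004, 2004, 2004),
  Var1 = c(10, NA, 5, 5, 4, NA, NA, 3, 2, 2),
  Var2 = c(1, 1, 3, 3, 2, NA, 3, NA, 1, NA),
  Var3 = c(NA, 2, NA, NA, 5, 5, 3, NA, 2, NA),
  Var4 = c(NA, 4, 5, 5, 6, NA, 4, NA, NA, NA)
)

# Hypothetical helper: for each observed key/year group, TRUE if any value
# column holds more than one distinct non-NA value
conflicting_groups <- function(df, keys) {
  vals <- setdiff(names(df), keys)
  # One data frame of value columns per key/year combination that occurs
  groups <- split(df[vals], df[keys], drop = TRUE)
  vapply(groups, function(g) {
    any(vapply(g, function(col) length(unique(col[!is.na(col)])) > 1, logical(1)))
  }, logical(1))
}

conflicting_groups(df1, c("key", "year"))  # all FALSE for the example data

# Introduce a conflict: row 4 (key 1, year 2004) now disagrees on Var1
df_bad <- df1
df_bad$Var1[4] <- 6
which(conflicting_groups(df_bad, c("key", "year")))
```

If any group is flagged, you need to decide on a rule (first value, maximum, mean, ...) before either answer gives a meaningful result.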