How to merge two data frames on common columns in R with the sum of the others?

Posted 2024-11-03 05:41:25


R Version 2.11.1 32-bit on Windows 7

I have two data sets, data_A and data_B:

data_A

USER_A USER_B ACTION
1      11     0.3
1      13     0.25
1      16     0.63
1      17     0.26
2      11     0.14
2      14     0.28

data_B

USER_A USER_B ACTION
1      13     0.17
1      14     0.27
2      11     0.25

I want to add data_B's ACTION to data_A's ACTION wherever both USER_A and USER_B match. For the example above, the result would be:

data_A

USER_A USER_B ACTION
1      11     0.3
1      13     0.25+0.17
1      16     0.63
1      17     0.26
2      11     0.14+0.25
2      14     0.28
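To make the example reproducible, the two input tables and the expected result can be written as R data frames. This is a sketch: the values are copied from the tables above, and the columns are assumed to be plain numerics. Note that data_B's (1, 14) row has no partner in data_A, so it does not appear in the result.

```r
# Input tables from the question; (USER_A, USER_B) identifies a pair
data_A <- data.frame(
  USER_A = c(1, 1, 1, 1, 2, 2),
  USER_B = c(11, 13, 16, 17, 11, 14),
  ACTION = c(0.30, 0.25, 0.63, 0.26, 0.14, 0.28)
)
data_B <- data.frame(
  USER_A = c(1, 1, 2),
  USER_B = c(13, 14, 11),
  ACTION = c(0.17, 0.27, 0.25)
)
# Expected result: data_A, with data_B's ACTION added where both keys match
expected <- data.frame(
  USER_A = c(1, 1, 1, 1, 2, 2),
  USER_B = c(11, 13, 16, 17, 11, 14),
  ACTION = c(0.30, 0.25 + 0.17, 0.63, 0.26, 0.14 + 0.25, 0.28)
)
print(expected)
```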

So how could I achieve it?


鸩远一方 2024-11-10 05:41:26


You can use ddply in package plyr and combine it with merge:

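library(plyr)
# Join on the key columns only; all.x = TRUE keeps every row of data_A
# and fills ACTION.y with NA where data_B has no matching pair
m <- merge(data_A, data_B, by = c("USER_A", "USER_B"), all.x = TRUE)
ddply(m, .(USER_A, USER_B), summarise,
  ACTION = sum(ACTION.x, ACTION.y, na.rm = TRUE))

Notice that merge is called with by = c("USER_A", "USER_B") and all.x = TRUE. Without by, merge() would also join on ACTION (a column both data frames share), no rows would match, and the result would simply be data_A unchanged. all.x = TRUE returns all of the rows in the first data.frame passed to merge, i.e. data_A, and na.rm = TRUE makes the missing ACTION.y values count as nothing:

  USER_A USER_B ACTION
1      1     11   0.30
2      1     13   0.42
3      1     16   0.63
4      1     17   0.26
5      2     11   0.39
6      2     14   0.28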
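When by is omitted, merge() joins on intersect(names(x), names(y)), which for these tables includes ACTION. A quick base-R check of that behaviour, rebuilding the question's data by hand (plyr is not needed for this part):

```r
data_A <- data.frame(USER_A = c(1, 1, 1, 1, 2, 2),
                     USER_B = c(11, 13, 16, 17, 11, 14),
                     ACTION = c(0.30, 0.25, 0.63, 0.26, 0.14, 0.28))
data_B <- data.frame(USER_A = c(1, 1, 2),
                     USER_B = c(13, 14, 11),
                     ACTION = c(0.17, 0.27, 0.25))
# Columns merge() joins on when 'by' is not given
common <- intersect(names(data_A), names(data_B))
print(common)
# Joining on all shared columns: no (USER_A, USER_B, ACTION) triple matches,
# so all.x = TRUE just returns data_A's rows
m_all <- merge(data_A, data_B, all.x = TRUE)
print(m_all)
# Joining on the keys only: data_B's value arrives as ACTION.y (NA if unmatched)
m_keys <- merge(data_A, data_B, by = c("USER_A", "USER_B"), all.x = TRUE)
print(m_keys)
```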


孤千羽 2024-11-10 05:41:26


This sort of thing is quite easy to do with a database-like operation. Here I use package sqldf to do a left (outer) join and then summarise the resulting object:

require(sqldf)
tmp <- sqldf("select * from data_A left join data_B using (USER_A, USER_B)")

This results in:

> tmp
  USER_A USER_B ACTION ACTION
1      1     11   0.30     NA
2      1     13   0.25   0.17
3      1     16   0.63     NA
4      1     17   0.26     NA
5      2     11   0.14   0.25
6      2     14   0.28     NA

Now we just need to sum the two ACTION columns:

data_C <- transform(data_A, ACTION = rowSums(tmp[, 3:4], na.rm = TRUE))

This gives the desired result:

> data_C
  USER_A USER_B ACTION
1      1     11   0.30
2      1     13   0.42
3      1     16   0.63
4      1     17   0.26
5      2     11   0.39
6      2     14   0.28

The same left join can also be done with the base R function merge():

> merge(data_A, data_B, by = c("USER_A","USER_B"), all.x = TRUE)
  USER_A USER_B ACTION.x ACTION.y
1      1     11     0.30       NA
2      1     13     0.25     0.17
3      1     16     0.63       NA
4      1     17     0.26       NA
5      2     11     0.14     0.25
6      2     14     0.28       NA

So we can replace the sqldf() call above with:

tmp <- merge(data_A, data_B, by = c("USER_A","USER_B"), all.x = TRUE)

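while the second line using transform() remains the same. Note that merge() sorts its result by the by columns (sort = TRUE is the default), so reusing transform(data_A, ...) relies on data_A already being sorted by USER_A and USER_B, as it is here.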
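A variant that takes both the keys and the summed ACTION from tmp itself, so the rows stay aligned whatever order data_A is in. This is a base-R sketch; it assumes tmp comes from merge(), whose value columns are named ACTION.x and ACTION.y, and it reverses data_A's rows on purpose to show the order does not matter:

```r
data_A <- data.frame(USER_A = c(1, 1, 1, 1, 2, 2),
                     USER_B = c(11, 13, 16, 17, 11, 14),
                     ACTION = c(0.30, 0.25, 0.63, 0.26, 0.14, 0.28))
data_B <- data.frame(USER_A = c(1, 1, 2),
                     USER_B = c(13, 14, 11),
                     ACTION = c(0.17, 0.27, 0.25))
# Deliberately reverse data_A so it is no longer sorted by the keys
data_A <- data_A[nrow(data_A):1, ]
tmp <- merge(data_A, data_B, by = c("USER_A", "USER_B"), all.x = TRUE)
# Keys and summed ACTION both come from tmp, so they cannot get out of step
data_C <- data.frame(
  tmp[, c("USER_A", "USER_B")],
  ACTION = rowSums(tmp[, c("ACTION.x", "ACTION.y")], na.rm = TRUE)
)
print(data_C)
```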


千里故人稀 2024-11-10 05:41:26


We can use {powerjoin}:

library(powerjoin)
power_left_join(
  data_A,  data_B, by = c("USER_A", "USER_B"), 
  conflict = ~ .x + ifelse(is.na(.y), 0, .y)
)
#>   USER_A USER_B ACTION
#> 1      1     11   0.30
#> 2      1     13   0.42
#> 3      1     16   0.63
#> 4      1     17   0.26
#> 5      2     11   0.39
#> 6      2     14   0.28

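When both tables contain a non-key column with the same name (here ACTION), the function passed to the conflict argument is applied to each pair of conflicting columns, with .x standing for the column from data_A and .y for the one from data_B.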
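To see what the conflict formula computes, here it is written as an ordinary R function and applied to the two ACTION columns that a left join of data_A with data_B produces. This is a base-R sketch that does not call powerjoin; it assumes .x and .y arrive as whole columns, and conflict_fn is a hypothetical name used only for illustration:

```r
# ACTION columns after a left join of data_A with data_B (NA = no match in data_B)
x <- c(0.30, 0.25, 0.63, 0.26, 0.14, 0.28)  # plays the role of .x (from data_A)
y <- c(NA, 0.17, NA, NA, 0.25, NA)          # plays the role of .y (from data_B)
# The answer's formula as a function; + and ifelse() are vectorised,
# so one call handles the whole pair of columns
conflict_fn <- function(.x, .y) .x + ifelse(is.na(.y), 0, .y)
combined <- conflict_fn(x, y)
print(combined)
```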

We can also apply sum(..., na.rm = TRUE) row-wise (note the rw ~ prefix) for the same effect:

power_left_join(data_A,data_B, by = c("USER_A", "USER_B"), 
                conflict = rw ~ sum(.x, .y, na.rm = TRUE))
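Unlike +, sum() collapses all of its arguments into a single number, which is why the sum() version has to run row by row. A base-R illustration of the difference, using mapply() as a stand-in for row-wise evaluation (this is a sketch of the idea and does not call powerjoin):

```r
x <- c(0.30, 0.25, 0.63, 0.26, 0.14, 0.28)  # ACTION from data_A
y <- c(NA, 0.17, NA, NA, 0.25, NA)          # ACTION from data_B after the left join
# Called once on whole columns, sum() returns one grand total
total <- sum(x, y, na.rm = TRUE)
# Called once per row, it returns one sum per (USER_A, USER_B) pair
per_row <- mapply(function(a, b) sum(a, b, na.rm = TRUE), x, y)
print(total)
print(per_row)
```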
