Mixed merge in R - subscript solution?

Posted 08-02 18:55

Note: I changed the example from when I first posted. My first example was too simplified to capture the real problem.

I have two data frames which are sorted differently in one column. I want to match one column and then merge in the value from the second column. The second column needs to stay in the same order.

So I have this:

# two data frames whose state blocks are stored in opposite order
state <- c("IA","IA","IA","IL","IL","IL")
value1 <- c(1,2,3,4,5,6)
s1 <- data.frame(state, value1)
state <- c("IL","IL","IL","IA","IA","IA")
value2 <- c(3,4,5,6,7,8)
s2 <- data.frame(state, value2)

s1
s2

which returns this:

> s1
  state value1
1    IA      1
2    IA      2
3    IA      3
4    IL      4
5    IL      5
6    IL      6
> s2
  state value2
1    IL      3
2    IL      4
3    IL      5
4    IA      6
5    IA      7
6    IA      8

and I want this:

  state value1 value2
1    IA      1      6
2    IA      2      7
3    IA      3      8
4    IL      4      3
5    IL      5      4
6    IL      6      5

I'm about to drive myself silly trying to solve this. Seems like it should be a simple subscript problem.

Answer by 岁月蹉跎了容颜, 2024-08-09 18:55:38

There are several ways to do this (it is R, after all), but I think the clearest is to create an index. We need a function that creates a sequential index (starting at one and ending with the number of observations).

seq_along(c("a","b","c"))
[1] 1 2 3

But we need to calculate this index within each grouping variable (state). For this we can use R's ave function. It takes a numeric vector as the first argument, then the grouping factors, and finally the function to be applied in each group. Because ave passes each group's values to that function, the function has to be seq_along, which numbers the elements of a vector, and not seq_len, which expects a single length.

# number the rows within each state: 1, 2, 3, ...
s1$index <- with(s1, ave(value1, state, FUN = seq_along))
s2$index <- with(s2, ave(value2, state, FUN = seq_along))

(Note the use of with, which evaluates the expression with the data frame's columns available as variables. This reads more cleanly than writing s1$value1, s2$value2, etc.)
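To make "within each grouping variable" concrete, here is a small sketch on made-up data. The data frame d and its columns are hypothetical and not part of the question; the states are interleaved on purpose to show that ave puts each result back in its original row. It uses seq_along, the variant recommended in the follow-up answer.

```r
# Hypothetical data: the two states are interleaved rather than stored in blocks.
d <- data.frame(state = c("IA", "IL", "IA", "IL", "IA"),
                value = c(10, 20, 30, 40, 50))

# ave() splits value by state, applies seq_along to each piece, and writes
# the results back into the original row positions.
d$index <- with(d, ave(value, state, FUN = seq_along))

d$index  # 1 1 2 2 3
```

Rows 1, 3 and 5 are the first, second and third IA observations, and rows 2 and 4 are the first and second IL observations, so the index counts within each state regardless of row order.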

Now we can simply merge (join) the two data frames. By default, merge joins on the variables present in both data frames, which here are state and index.

merge(s1,s2)

which gives

  state index value1 value2
1    IA     1      1      6
2    IA     2      2      7
3    IA     3      3      8
4    IL     1      4      3
5    IL     2      5      4
6    IL     3      6      5
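The question asked for a result with only state, value1 and value2. One possible way to get there, sketched below, is to name the join keys explicitly and drop the helper column afterwards. The variable name merged is an assumption for illustration, and the index is built with seq_along as in the follow-up answer.

```r
# Rebuild the question's data so the sketch runs on its own.
s1 <- data.frame(state = c("IA","IA","IA","IL","IL","IL"), value1 = c(1,2,3,4,5,6))
s2 <- data.frame(state = c("IL","IL","IL","IA","IA","IA"), value2 = c(3,4,5,6,7,8))

# Within-state row counter used as the second join key.
s1$index <- with(s1, ave(value1, state, FUN = seq_along))
s2$index <- with(s2, ave(value2, state, FUN = seq_along))

# Spell out the keys instead of relying on merge()'s default of
# "all columns the two data frames share".
merged <- merge(s1, s2, by = c("state", "index"))

# Order by the keys, then keep only the columns the question asked for.
merged <- merged[order(merged$state, merged$index), c("state", "value1", "value2")]
rownames(merged) <- NULL
merged
```

Naming the keys in by is a defensive choice: if the two data frames later gain another column with the same name, the default would silently start joining on it as well.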

For this to work, there should be the same number of observations by state in each of the data frames.
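This requirement can be checked before merging. The sketch below compares the per-state counts of the two data frames; the names counts1, counts2 and same_groups are assumptions for illustration. It matters because merge, with its default all = FALSE, keeps only rows that have a partner in both data frames, so a mismatch would drop rows rather than raise an error.

```r
# Rebuild the question's data so the sketch runs on its own.
s1 <- data.frame(state = c("IA","IA","IA","IL","IL","IL"), value1 = c(1,2,3,4,5,6))
s2 <- data.frame(state = c("IL","IL","IL","IA","IA","IA"), value2 = c(3,4,5,6,7,8))

# Number of observations per state in each data frame.
counts1 <- table(s1$state)
counts2 <- table(s2$state)

# TRUE only if both data frames contain the same states with the same counts.
# The && short-circuits, so the lookup by name runs only when the states match.
same_groups <- setequal(names(counts1), names(counts2)) &&
  all(as.vector(counts1[names(counts2)]) == as.vector(counts2))

same_groups  # TRUE for the data in the question
```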

[Edit: commented the code for clarity.]
[Edit: Used a built-in function instead of creating a new function, as suggested by hadley.]

Answer by 话少情深, 2024-08-09 18:55:38

NOTE: Check the 5th comment on the answer above. Solution should be

s1$index <- with(s1,ave(value1,state,FUN=seq_along))
s2$index <- with(s2,ave(value2,state,FUN=seq_along))

Tested and working.
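The reason for this correction can be seen in a short sketch: seq_len expects a single non-negative length, while ave hands FUN the vector of values for one group. The vector x below is a hypothetical stand-in for one state's values.

```r
# Hypothetical values for one state, as ave() would pass them to FUN.
x <- c(4, 5, 6)

seq_along(x)        # 1 2 3, one index per element of x
seq_len(length(x))  # 1 2 3, the same result spelled with seq_len
```

So seq_len can only stand in for seq_along when it is given the length of the group, not the group itself.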
