Mixed merge in R - subscript solution?
Note: I changed the example from when I first posted. My first example was too simplified to capture the real problem.
I have two data frames which are sorted differently in one column. I want to match one column and then merge in the value from the second column. The second column needs to stay in the same order.
So I have this:
state<-c("IA","IA","IA","IL","IL","IL")
value1<-c(1,2,3,4,5,6)
s1<-data.frame(state,value1)
state<-c("IL","IL","IL","IA","IA","IA")
value2<-c(3,4,5,6,7,8)
s2<-data.frame(state,value2)
s1
s2
which returns this:
> s1
  state value1
1    IA      1
2    IA      2
3    IA      3
4    IL      4
5    IL      5
6    IL      6
> s2
  state value2
1    IL      3
2    IL      4
3    IL      5
4    IA      6
5    IA      7
6    IA      8
and I want this:
  state value1 value2
1    IA      1      6
2    IA      2      7
3    IA      3      8
4    IL      4      3
5    IL      5      4
6    IL      6      5
I'm about to drive myself silly trying to solve this. Seems like it should be a simple subscript problem.
There are several ways to do this (it is R, after all), but I think the clearest is to create an index. We need a function that creates a sequential index (starting at one and ending with the number of observations).
But we need to calculate this index within each grouping variable (state). For this we can use R's ave function. It takes a numeric as the first argument, then the grouping factors, and finally the function to be applied in each group. (Note the use of with, which tells R to search for the variables within the environment/data frame. This is better practice than using s1$value1, s2$value2, etc.)
Now we can simply merge (join) the two data frames by the variables present in both of them: state and index.
For this to work, there should be the same number of observations by state in each of the data frames.
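The steps described above can be sketched as follows. This is a minimal reconstruction from the prose, not necessarily the answer's original code: the column name index and the use of seq_along as the per-group function are assumptions (seq_along(x) is equivalent to seq_len(length(x))).

```r
# Rebuild the two data frames from the question.
s1 <- data.frame(state  = c("IA", "IA", "IA", "IL", "IL", "IL"),
                 value1 = c(1, 2, 3, 4, 5, 6))
s2 <- data.frame(state  = c("IL", "IL", "IL", "IA", "IA", "IA"),
                 value2 = c(3, 4, 5, 6, 7, 8))

# Number the rows within each state (1, 2, 3, ...).
# "index" is a column name chosen here for illustration.
s1$index <- with(s1, ave(value1, state, FUN = seq_along))
s2$index <- with(s2, ave(value2, state, FUN = seq_along))

# Join on both keys, so the k-th row of a state in s1 is paired
# with the k-th row of the same state in s2.
merged <- merge(s1, s2, by = c("state", "index"))

# Drop the helper column to get the layout asked for in the question.
result <- merged[, c("state", "value1", "value2")]
result
```

With the sample data, value2 in result should come out as 6, 7, 8, 3, 4, 5, matching the table requested in the question. If the per-state row counts differ between the two data frames, merge silently drops the unmatched rows unless all = TRUE is passed.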
[Edit: commented the code for clarity.]
[Edit: Used seq_len instead of creating a new function as suggested by hadley.]
NOTE: Check the 5th comment on the answer above. Solution should be
Tested and working.