Simple data manipulation in R
@Aniko points out that one way to view my problem is that I need to find the connected components of a graph in which the vertices are the groups, and the variables group and nominated_group indicate an edge between two groups. My goal is to create a variable parent_group which indexes the connected components. Or as I put it before:
I have a data frame with four variables: ID, group, nominated_ID, and nominated_group.
Consider sister-groups: Groups A and B are sister-groups if there is at least one case in the data where group==A and nominated_group==B, or vice versa.
I would like to create a variable parent_group which takes on a unique value for each set of sister-groups. In other words, no nominations should occur between cases in different parent_groups. Numbering the parent_groups sequentially seems like a good idea.
Many thanks for the help I already received here! I can't really contribute here, but note that I try to pay it forward at stats.exchange and on Wikipedia.
In my fake data, A and B are sister-groups. Either row 4 (ID=4, a member of B who nominates a member of A) or row 5 (ID=3, a member of A who nominates a member of B) is sufficient to make this true. Each group is also its own sister-group. The goal, the creation of parent_group, should result in one parent_group for all cases in A or B, and another parent_group for group C.
df <- data.frame(ID = c(9, 5, 2, 4, 3, 7),
                 group = c("A", "A", "B", "B", "A", "C"),
                 nominated_ID = c(9, 8, 4, 9, 2, 7))
df$nominated_group <- with(df, group[match(nominated_ID, ID)])
df
  ID group nominated_ID nominated_group
1  9     A            9               A
2  5     A            8            <NA>
3  2     B            4               B
4  4     B            9               A
5  3     A            2               B
6  7     C            7               C
Comments (1)
Consider a graph with the groups as its vertices, where an edge indicates that two groups occur for the same ID. Then I think you are looking for the connected components of this graph. The following is a quick and dirty (and probably not optimal) implementation of this idea using the graph package:
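The connected-components idea can also be sketched in base R without any package. This is only an illustrative stand-in, not the graph-package code: it uses a small union-find, it assumes that nominations whose nominated_group is unknown (NA) should simply be ignored, and label_components is a hypothetical helper name, not a function from any library.

```r
# Example data from the question
df <- data.frame(ID = c(9, 5, 2, 4, 3, 7),
                 group = c("A", "A", "B", "B", "A", "C"),
                 nominated_ID = c(9, 8, 4, 9, 2, 7),
                 stringsAsFactors = FALSE)
df$nominated_group <- with(df, group[match(nominated_ID, ID)])

# Hypothetical helper (not from any package): labels the connected
# components of the undirected graph whose edges are (from[i], to[i]).
label_components <- function(from, to) {
  keep <- !is.na(from) & !is.na(to)        # assumption: ignore edges with NA
  vertices <- sort(unique(c(from, to)))    # sort() drops NA by default
  parent <- setNames(vertices, vertices)   # every group starts as its own root

  # Follow parent links until reaching a vertex that is its own parent
  find_root <- function(v) {
    while (parent[[v]] != v) v <- parent[[v]]
    v
  }

  # Merge the two components joined by each edge
  for (i in which(keep)) {
    a <- find_root(from[i])
    b <- find_root(to[i])
    if (a != b) parent[[b]] <- a
  }

  roots <- vapply(vertices, find_root, character(1))
  # Number the components sequentially in order of first appearance
  setNames(match(roots, unique(roots)), vertices)
}

comp <- label_components(df$group, df$nominated_group)
df$parent_group <- unname(comp[df$group])
df
```

With the example data, groups A and B should end up sharing one parent_group and group C should get another. For large data a dedicated graph library would likely be faster than this loop-based version.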