Simple data manipulation in R

Posted 2024-11-05 22:46:44

@Aniko points out that one way to view my problem is that I need to find the connected components of a graph, where the vertices are the groups and the variables group and nominated_group indicate an edge between those two groups. My goal is to create a variable parent_group which indexes the connected components. Or, as I put it before:

I have a dataframe with four variables: ID, group, nominated_ID, and nominated_group.

Consider sister-groups: Groups A and B are sister-groups if there is at least one case in the data where group==A and nominated_group==B, or vice versa.

I would like to create a variable parent_group which takes on a unique value for each set of sister-groups. In other words, no nominations should occur between cases in different parent_groups. Numbering the parent_groups sequentially seems like a good idea.

Many thanks for the help I already received here! I can't really contribute here but note that I try to pay it forward at stats.exchange and on wikipedia.

In my fake data, A and B are sister-groups. Either of the cases in rows 4 and 5 (ID=4 or ID=3) is sufficient to make this true. Each group is also its own sister-group. The goal, the creation of parent_group, should result in one parent_group for all cases in A or B, and another parent_group for group C.

df <- data.frame(ID = c(9, 5, 2, 4, 3, 7),
  group = c("A", "A", "B", "B", "A", "C"),
  nominated_ID = c(9, 8, 4, 9, 2, 7))

df$nominated_group <- with(df, group[match(nominated_ID, ID)])

df

  ID group nominated_ID nominated_group
1  9     A            9               A
2  5     A            8            <NA>
3  2     B            4               B
4  4     B            9               A
5  3     A            2               B
6  7     C            7               C
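
The sister-group definition above can be read as an undirected edge list between groups. The following is only a minimal sketch under that reading, using base R; the names `edges`, `from`, and `to` are illustrative and not part of the original data. Rows whose nomination does not resolve to a known group (the `<NA>` case) are dropped, and each pair is put in a fixed order so that A-B and B-A count as the same edge.

```r
df <- data.frame(ID = c(9, 5, 2, 4, 3, 7),
                 group = c("A", "A", "B", "B", "A", "C"),
                 nominated_ID = c(9, 8, 4, 9, 2, 7),
                 stringsAsFactors = FALSE)
df$nominated_group <- with(df, group[match(nominated_ID, ID)])

# Keep only nominations that resolve to a known group
known <- df[!is.na(df$nominated_group), c("group", "nominated_group")]

# Put each pair in a fixed order so A-B and B-A collapse into one edge
edges <- unique(data.frame(
  from = pmin(known$group, known$nominated_group),
  to   = pmax(known$group, known$nominated_group),
  stringsAsFactors = FALSE
))
edges
```

With the fake data this leaves four distinct pairs: A-A, B-B, A-B, and C-C.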

Comments (1)

情话已封尘 2024-11-12 22:46:44

Consider a graph whose vertices are the groups and whose edges indicate that two groups occur for the same ID. Then I think you are looking for the connected components of this graph. The following is a quick and dirty (and probably not optimal) implementation of this idea using the graph package (from Bioconductor):

library(graph)
# make some fake data
nom <- data.frame(group = c("A","A","A","B","B","C","C"),
                  group2 = c("A","A","B","B","A","C","C"),
                  stringsAsFactors = FALSE)
# remove duplicated pairs
# this keeps A-B distinct from B-A, which could probably be fixed
nom1 <- nom[!duplicated(nom),]

# define an empty graph with one node per group
grps <- union(unique(nom$group), unique(nom$group2))
gg <- new("graphNEL", nodes=grps, edgeL=list())
# add an edge for every pair
for (i in seq_len(nrow(nom1))) gg <- addEdge(nom1$group[i], nom1$group2[i], gg, 1)

# find connected components
cc <- connComp(gg)

# assign parent by finding the component that contains each row's group
nom$parent <- apply(nom, 1,
    function(x) which(sapply(cc, function(y) x["group"] %in% y)))
nom

  group group2 parent
1     A      A      1
2     A      A      1
3     A      B      1
4     B      B      1
5     B      A      1
6     C      C      2
7     C      C      2
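
If installing the graph package is not an option, the same connected-components idea can be sketched in base R with a small union-find. This is a swapped-in technique, not the method used above, and it is applied to the question's `df` rather than to `nom`. The helper names `connected_groups` and `find_root` are hypothetical, and the sketch assumes every non-missing nominated_group also appears in group (true here, because nominated_group is looked up from group).

```r
df <- data.frame(ID = c(9, 5, 2, 4, 3, 7),
                 group = c("A", "A", "B", "B", "A", "C"),
                 nominated_ID = c(9, 8, 4, 9, 2, 7),
                 stringsAsFactors = FALSE)
df$nominated_group <- with(df, group[match(nominated_ID, ID)])

# Hypothetical helper: returns a named integer vector, group -> component index
connected_groups <- function(from, to) {
  groups <- sort(unique(from))
  # parent[[g]] points towards the representative of g's component
  parent <- setNames(groups, groups)

  find_root <- function(g) {
    while (parent[[g]] != g) g <- parent[[g]]
    g
  }

  for (i in seq_along(from)) {
    if (is.na(to[i])) next            # unresolved nomination, no edge
    ra <- find_root(from[i])
    rb <- find_root(to[i])
    if (ra != rb) parent[[rb]] <- ra  # merge the two components
  }

  roots <- vapply(groups, find_root, character(1))
  # number the components sequentially in order of first appearance
  setNames(match(roots, unique(roots)), groups)
}

comp <- connected_groups(df$group, df$nominated_group)
df$parent_group <- unname(comp[df$group])
df$parent_group
# 1 1 1 1 1 2
```

Groups A and B end up in component 1 and group C in component 2, matching the goal stated in the question. For large data, a graph library's component routine would likely be a better fit than this loop.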