Combinations in long format

Posted on 2025-01-27 03:27:09

So I have a dataset of parents and their children of the following form

Children_id   Parent_id
10            1
11            1
12            1
13            2
14            2

What I want is a dataset of each child's siblings in long format, i.e.,

id   sibling_id
10   11
10   12
11   10
11   12
12   10
12   11
13   14
14   13

What's the best way to achieve this, preferably using data.table?

Example data:

df <- data.frame("Children_id" = c(10,11,12,13,14), "Parent_id" = c(1,1,1,2,2))

没有伤那来痛 2025-02-03 03:27:09

The graph experts out there will probably have better solutions, but here is a data.table solution:

library(data.table)

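# Self-join on Parent_id, then drop rows that pair a child with itself.
# Chained with data.table's own `][` so that no pipe package needs to be loaded.
setDT(df)[df, on = .(Parent_id), allow.cartesian = TRUE][
  Children_id != i.Children_id, .(id = i.Children_id, sibling = Children_id)]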

Output:

      id sibling
   <num>   <num>
1:    10      11
2:    10      12
3:    11      10
4:    11      12
5:    12      10
6:    12      11
7:    13      14
8:    14      13
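
To make the two steps of the self-join easier to follow, here is a small sketch that splits them apart. The names `dt`, `pairs` and `siblings` are only illustrative and are not part of the original answer; the data is the example from the question.

```r
library(data.table)

# Example data from the question
dt <- data.table(Children_id = c(10, 11, 12, 13, 14),
                 Parent_id   = c(1, 1, 1, 2, 2))

# Step 1: self-join on Parent_id. Every child is paired with every child of
# the same parent, itself included. i.Children_id is the copy that comes
# from the table used as `i`.
pairs <- dt[dt, on = .(Parent_id), allow.cartesian = TRUE]
nrow(pairs)     # 3*3 + 2*2 = 13 rows

# Step 2: drop the self-pairs and rename the columns
siblings <- pairs[Children_id != i.Children_id,
                  .(id = i.Children_id, sibling = Children_id)]
nrow(siblings)  # 3*2 + 2*1 = 8 rows
```

`allow.cartesian = TRUE` is needed here because the join returns more rows than the two inputs have combined, which data.table otherwise treats as a likely mistake.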

自找没趣 2025-02-03 03:27:09

In base R, we can use expand.grid after splitting

out <- do.call(rbind, lapply(split(df$Children_id, df$Parent_id), \(x) 
     subset(expand.grid(x, x), Var1 != Var2)[2:1]))
row.names(out) <- NULL
colnames(out) <- c("id", "sibling_id")

Output:

> out
  id sibling_id
1 10         11
2 10         12
3 11         10
4 11         12
5 12         10
6 12         11
7 13         14
8 14         13

Or using data.table with CJ

library(data.table)
setDT(df)[, CJ(id = Children_id, sibling_id = Children_id),
    Parent_id][id != sibling_id, .(id, sibling_id)]

Output:

      id sibling_id
   <num>      <num>
1:    10         11
2:    10         12
3:    11         10
4:    11         12
5:    12         10
6:    12         11
7:    13         14
8:    14         13
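
The self-join idea from the other answers can also be written in base R with `merge()`. This is only a sketch and not part of the original answer; the names `m` and `out2` are illustrative, and the result is sorted explicitly rather than relying on the row order `merge()` happens to return.

```r
# Example data from the question
df <- data.frame(Children_id = c(10, 11, 12, 13, 14),
                 Parent_id   = c(1, 1, 1, 2, 2))

# Self-merge on Parent_id; the duplicated column gets .x / .y suffixes
m <- merge(df, df, by = "Parent_id")

# Keep only pairs of distinct children and rename the columns
out2 <- m[m$Children_id.x != m$Children_id.y,
          c("Children_id.x", "Children_id.y")]
names(out2) <- c("id", "sibling_id")

# Sort explicitly so the result does not depend on merge()'s row order
out2 <- out2[order(out2$id, out2$sibling_id), ]
row.names(out2) <- NULL
```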

々眼睛长脚气 2025-02-03 03:27:09

A dplyr solution with inner_join:

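library(dplyr)
# Self-join on Parent_id, then drop rows that pair a child with itself.
# Note: dplyr >= 1.1.0 warns about the many-to-many match here;
# passing relationship = "many-to-many" to inner_join() silences it.
inner_join(df, df, by = "Parent_id") %>% 
  select(id = Children_id.x, siblings = Children_id.y) %>% 
  filter(id != siblings)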

  id siblings
1 10       11
2 10       12
3 11       10
4 11       12
5 12       10
6 12       11
7 13       14
8 14       13

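Or another strategy:

library(dplyr)
library(tidyr)  # unnest() comes from tidyr, not dplyr
df %>% 
  group_by(Parent_id) %>% 
  # store the full list of children of the same parent on every row
  mutate(siblings = list(unique(Children_id))) %>% 
  unnest(siblings) %>% 
  filter(Children_id != siblings)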
