Count co-occurrences in a data frame using ids
I realize there are a lot of similar questions but they all tackle a slightly different problem and I have been stuck for a while.
I have a data frame of all unique combinations of 2 variables, as follows:
df = data.frame(id = c('c1','c2','c3','c2','c3','c1','c3'),
                groupid = c('g1','g1','g1','g2','g2','g3','g3'))
And I need the following output:
   c1 c2 c3
c1  3  1  2
c2  1  3  2
c3  2  2  3
In other words, I need to count how often each pair of customer ids occurs in the same group.
Seems like a basic question, but I can't figure it out. I tried:
- making a cross join to find all possible combinations of (cid1, groupid, cid2)
- looping through all of them and retrieving the unique groups that match cid1 and the unique groups that match cid2
- taking the length of the intersection
...but this would take forever to run, so I am looking for an efficient and preferably clean solution (using tidyr/dplyr).
Comments (1)
We may use crossprod after getting the frequency count with table on the two columns.
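
A minimal sketch of that approach in base R, using the df from the question. The variable names incidence and cooc are illustrative and not from the original answer. It assumes, as the question states, that each (id, groupid) pair appears only once, so every cell of the table is 0 or 1.

```r
# Data from the question.
df <- data.frame(id = c('c1','c2','c3','c2','c3','c1','c3'),
                 groupid = c('g1','g1','g1','g2','g2','g3','g3'))

# Group-by-id incidence table: rows are groups, columns are ids,
# and a cell is 1 when that id belongs to that group.
incidence <- table(df$groupid, df$id)

# crossprod(x) computes t(x) %*% x, so entry (i, j) is the number of
# groups that contain both id i and id j.
cooc <- crossprod(incidence)
print(cooc)
```

The off-diagonal entries are the pair counts asked for (c1/c2 = 1, c1/c3 = 2, c2/c3 = 2). The diagonal is the number of groups each id appears in (2, 2, 3), which differs from the diagonal shown in the question's expected output. Because the whole computation is a single matrix product, it avoids the explicit loop over pairs described in the question.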