R sqldf 计数、分组依据、具有 > 1
我有一个有3列的表,有几千条记录,示例如下:
df1 <- data.frame(
ID = c('V1', 'V1', 'V1', 'V3', 'V3', 'V3', 'V4', 'V5','V5','V5'),
Category = c('a', 'a', 'a', 'a', 'b', 'b', 'a', 'b', 'c', 'c'),
Amount = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1))
需要使用sqldf查询按ID和类别对数据进行分组,然后对金额进行求和,并且仅当ID有多个类别时才返回行。结果如下所示:
ID Category Amount_Sum
V3 a 1
V3 b 2
V5 b 1
V5 c 2
我尝试了以下代码,但它实际上不起作用,还有分区上的另一个 row_number ,它可以工作,但比需要许多额外查询所需的时间要长得多。
df2 <- sqldf::sqldf("
SELECT Count(*) [CNT]
[ID],
[Category],
SUM([Amount]) [amount]
FROM df1
GROUP BY [ID],
[Category]
Having Count(*) > 1")
在 R 中使用 sqldf 获取输出的最佳方法是什么?谢谢!
I have a table with 3 columns, and a few thousand records, sample is below:
df1 <- data.frame(
ID = c('V1', 'V1', 'V1', 'V3', 'V3', 'V3', 'V4', 'V5','V5','V5'),
Category = c('a', 'a', 'a', 'a', 'b', 'b', 'a', 'b', 'c', 'c'),
Amount = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1))
Need to query with sqldf to group the data by ID and Category, then sum the amount, and only return rows when the ID has more than one category. The result would look like below:
ID Category Amount_Sum
V3 a 1
V3 b 2
V5 b 1
V5 c 2
I tried the following code which doesnt really work, and also another row_number over partition which sort of works but is way longer than it needs to be needing many additional queries.
df2 <- sqldf::sqldf("
SELECT Count(*) [CNT]
[ID],
[Category],
SUM([Amount]) [amount]
FROM df1
GROUP BY [ID],
[Category]
Having Count(*) > 1")
What is the best way to get that output with sqldf in R? Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您可以添加一个联接,该联接仅限于具有多个类别的 ID:
You could add a join which restricts to only IDs having more than one category: