R 中的加入/分组
我有两个这样的数据集: 水果
ID | 苹果 | 橙子 | 梨 |
---|---|---|---|
1 | 0 | 1 | 1 |
2 | 1 | 0 | 0 |
3 | 1 | 1 | 0 |
4 | 0 | 0 | 1 |
5 | 1 | 0 | 0 |
该数据集表示具有该 ID 的人是否拥有该水果(1)( 0)。这里的ID是主键。
另一个数据集是Juice。该表表示该 ID 在给定日期制作的果汁。该数据集中没有重复项。
ID | 日期 |
---|---|
1 | 8/12/2021 |
1 | 6/9/2020 |
2 | 7/14/2020 |
2 | 3/6/2021 |
2 | 5/2/2020 |
3 | 8/31/2021 |
5 | 9/21/2020 |
我想要的输出 就是知道哪种水果被使用了多少次。如果一个 ID 有超过 1 个水果,请考虑他使用这两种水果来制作果汁。
让我们按列跟踪 - 苹果 - ID 2、ID 3 和 ID 5 有苹果。 ID 2 榨汁 3 次,ID 3 榨汁 1 次,ID 3 榨汁 1 次,所以苹果用了 5 次(3+1+1)。同样,ID 1 和 ID 3 有橙色。 ID 1 榨汁 2 次,ID 3 榨汁 1 次,所以橙子用了 3 次(2+1)。 ID 1 榨汁 2 次,ID 4 榨汁 0 次,所以梨用了 2 次。
水果 | 计数 |
---|---|
苹果 | 5 |
橙子 | 3 |
梨 | 2 |
我希望用 R、Python 或 SQL 实现此目的,但我认为 R 具有解决此问题的最佳函数。我不太确定如何解决这个问题,因为涉及两个表。任何帮助将不胜感激。
I have 2 datasets like this:
Fruits
ID | Apples | Oranges | Pears |
---|---|---|---|
1 | 0 | 1 | 1 |
2 | 1 | 0 | 0 |
3 | 1 | 1 | 0 |
4 | 0 | 0 | 1 |
5 | 1 | 0 | 0 |
This dataset represents if a person with that ID has that fruit(1) or not(0). Here ID is the primary key.
Another dataset is Juice. This table represents juice made by that ID on the given date. There are no duplicates in this dataset.
ID | Dates |
---|---|
1 | 8/12/2021 |
1 | 6/9/2020 |
2 | 7/14/2020 |
2 | 3/6/2021 |
2 | 5/2/2020 |
3 | 8/31/2021 |
5 | 9/21/2020 |
My desired output would be to know which fruit was used how many times. If an Id has more than 1 fruit, consider he used both the fruits to make the juice.
Let's follow column-wise- Apples- ID 2, ID 3 and ID 5 has apples. ID 2 made juice 3 times, ID 3 made juice 1 time and ID 3 made juice 1 time, so apple was used 5 times(3+1+1). Similarly, ID 1 and ID 3 has oranges. ID 1 made juice 2 times and ID 3 made juice 1 time, so orange was used 3 times(2+1). ID 1 made juice 2 times, and ID 4 made juice 0 times, so pear was used 2 times.
Fruit | Count |
---|---|
Apples | 5 |
Oranges | 3 |
Pears | 2 |
I want this in R, Python or SQL, though I think R has the best functions to approach this problem. I am not really sure how to approach this as there are two tables involved. Any help would be really appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
R:base
R:dplyr
R:data.table
替代方案,与上面的 dplyr 解决方案更加一致:
SQL(通过 R 的
sqldf
)此实例使用 SQLite 引擎,该引擎不支持
PIVOT
。还有其他 sqldf 引擎可能支持它,并且对其他 DBMS 执行“原始 SQL”应该允许人们在其方言中更自然地进行转换。R数据
R: base
R: dplyr
R: data.table
Alternative, more aligned with the dplyr solution above:
SQL (via R's
sqldf
)This instance is using the SQLite engine, which does not support
PIVOT
. There are othersqldf
engines that may support it, and doing "raw SQL" to other DBMSes should allow one to pivot more naturally within its dialect.R data