How to filter rows in R using conditions that compare rows with each other
I have a dataframe:
```r
UserId <- c("A", "A", "A", "B", "B", "B")
SellerId <- c("X", "X", "Y", "Y", "Z", "Z")
Product <- c("ball", "ball", "ball", "ball", "doll", "doll")
SalesDate <- c("2022-01-01", "2022-01-01", "2022-01-02", "2022-01-04", "2022-01-06", "2022-01-07")
sales <- data.frame(UserId, SellerId, Product, SalesDate)
```
And I want to find sales for which:
- the same user bought the same product twice from the same seller on the same day. Of course, I need to do this on a much larger data set, not just this example.
I've been thinking for a long time about how to apply even one of these criteria, and nothing comes to mind. In this case, the table I should be left with is:
| UserId | SellerId | Product | SalesDate |
|---|---|---|---|
| A | X | ball | 2022-01-01 |
| A | X | ball | 2022-01-01 |
The UserId, seller, product and sales date are all the same. The problem is that I'm not looking for specific users or specific products.
I would like to find all users who bought the same product twice (whatever the product is; the product list is long) on the same purchase date (the date itself doesn't matter; it just has to be the same for both purchases by that user).
Do you have any ideas on how to write even part of the code?
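The criteria above amount to one check: keep every row whose combination of `UserId`, `SellerId`, `Product` and `SalesDate` appears more than once. As a minimal base R sketch (a swapped-in alternative to the dplyr answers, assuming only these four columns define "the same purchase"), `duplicated()` can be run in both directions so that the first occurrence is flagged as well as the repeats. The name `key_cols` is illustrative only.

```r
# Sample data from the question
sales <- data.frame(
  UserId    = c("A", "A", "A", "B", "B", "B"),
  SellerId  = c("X", "X", "Y", "Y", "Z", "Z"),
  Product   = c("ball", "ball", "ball", "ball", "doll", "doll"),
  SalesDate = c("2022-01-01", "2022-01-01", "2022-01-02",
                "2022-01-04", "2022-01-06", "2022-01-07")
)

# Illustrative choice: the columns that together define "the same purchase"
key_cols <- c("UserId", "SellerId", "Product", "SalesDate")
keys <- sales[key_cols]

# duplicated() flags the 2nd, 3rd, ... occurrence of a key combination;
# scanning again with fromLast = TRUE also flags the first occurrence,
# so the OR marks every row whose combination appears more than once.
is_repeat <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

repeat_sales <- sales[is_repeat, ]
repeat_sales
```

On the sample data this keeps rows 1 and 2, matching the expected table. Editing `key_cols` changes which columns must match for two rows to count as a repeat.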
Answers
Using `add_count()` will give you the number of times each combination occurs.
```r
library(dplyr)

sales %>%
  add_count(UserId, SellerId, Product, SalesDate)
```

```
  UserId SellerId Product  SalesDate n
1      A        X    ball 2022-01-01 2
2      A        X    ball 2022-01-01 2
3      A        Y    ball 2022-01-02 1
4      B        Y    ball 2022-01-04 1
5      B        Z    doll 2022-01-06 1
6      B        Z    doll 2022-01-07 1
```
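From there you can filter on `n == 2` or `n > 1`, depending on your problem: `n > 1` keeps every combination that occurs more than once, while `n == 2` keeps only those that occur exactly twice.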
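A minimal sketch of the complete pipeline, assuming `dplyr` is installed and only the four key columns matter: count each combination, keep the repeated ones, then drop the helper column `n`.

```r
library(dplyr)

# Sample data from the question
sales <- data.frame(
  UserId    = c("A", "A", "A", "B", "B", "B"),
  SellerId  = c("X", "X", "Y", "Y", "Z", "Z"),
  Product   = c("ball", "ball", "ball", "ball", "doll", "doll"),
  SalesDate = c("2022-01-01", "2022-01-01", "2022-01-02",
                "2022-01-04", "2022-01-06", "2022-01-07")
)

repeat_sales <- sales %>%
  # n = number of rows sharing this UserId/SellerId/Product/SalesDate
  add_count(UserId, SellerId, Product, SalesDate) %>%
  # keep only combinations that occur more than once
  filter(n > 1) %>%
  # drop the helper count column
  select(-n)

repeat_sales
```

With `filter(n > 1)`, a product bought three times from the same seller on the same day is kept as well; switching to `n == 2` would keep only exact pairs.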
Using `dplyr`, you can `group_by_all()` variables and `filter` out any group that does not have more than one record.
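A sketch of this approach, assuming `dplyr` is installed. The dplyr documentation marks scoped verbs such as `group_by_all()` as superseded (in favour of `across()`/`pick()`), though they still work. Because it groups by every column, any extra column in the real data (for example a hypothetical price or order ID column, not present in the sample) would put otherwise identical purchases into different groups, so the second variant groups by the four key columns explicitly.

```r
library(dplyr)

# Sample data from the question
sales <- data.frame(
  UserId    = c("A", "A", "A", "B", "B", "B"),
  SellerId  = c("X", "X", "Y", "Y", "Z", "Z"),
  Product   = c("ball", "ball", "ball", "ball", "doll", "doll"),
  SalesDate = c("2022-01-01", "2022-01-01", "2022-01-02",
                "2022-01-04", "2022-01-06", "2022-01-07")
)

# Rows land in the same group only if they match on every column;
# keep groups with more than one row, then remove the grouping.
repeat_sales <- sales %>%
  group_by_all() %>%
  filter(n() > 1) %>%
  ungroup()

# Same idea, but only the four key columns have to match
repeat_sales_keys <- sales %>%
  group_by(UserId, SellerId, Product, SalesDate) %>%
  filter(n() > 1) %>%
  ungroup()

repeat_sales
```

On the sample data both variants return the two `A`/`X`/`ball`/`2022-01-01` rows; they differ only when the data frame has columns beyond the four keys.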