How to filter rows in R using conditions that compare rows with each other

Posted on 01-21 01:09

I have a data frame:

UserId <- c("A", "A", "A", "B", "B", "B")
SellerId <- c("X", "X", "Y", "Y", "Z", "Z")
Product <- c("ball", "ball", "ball", "ball", "doll", "doll")
SalesDate <- c("2022-01-01", "2022-01-01", "2022-01-02", "2022-01-04", "2022-01-06", "2022-01-07")

sales <- data.frame(UserId, SellerId, Product, SalesDate)

I want to find the sales where:

  • the same user bought the same product from the same seller twice on the same day.

Of course, I need to do this on a much larger data set. I have been thinking for a long time about how to apply even one of these criteria, and nothing comes to mind. For this example, the table I should be left with is:

UserId SellerId Product SalesDate
A      X        ball    2022-01-01
A      X        ball    2022-01-01

The UserId, the SellerId, the Product and the SalesDate are all the same. The problem is that I am not looking for specific users or specific products.

I would like to find every user who bought the same product twice (whatever the product is; the list is long), and likewise for the purchase date (the date itself doesn't matter, it only needs to be the same for that user).

Do you have any ideas on how to write even part of this code?

Comments (3)

谜兔 2025-01-28 01:09:28

Using dplyr, you can group by all variables with group_by_all() and keep only the groups that have more than one record.

library(dplyr)

sales %>% group_by_all() %>% filter(n() > 1)

# A tibble: 2 × 4
# Groups:   UserId, SellerId, Product, SalesDate [1]
  UserId SellerId Product SalesDate 
  <chr>  <chr>    <chr>   <chr>     
1 A      X        ball    2022-01-01
2 A      X        ball    2022-01-01
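
One caveat that may be worth sketching: grouping by all columns only matches rows that are identical in every column. If the real table has further columns, naming the four key columns explicitly is probably safer. The Price column below is a hypothetical addition for illustration and is not part of the question's data:

```r
library(dplyr)

# Example data from the question, plus an assumed Price column
sales <- data.frame(
  UserId    = c("A", "A", "A", "B", "B", "B"),
  SellerId  = c("X", "X", "Y", "Y", "Z", "Z"),
  Product   = c("ball", "ball", "ball", "ball", "doll", "doll"),
  SalesDate = c("2022-01-01", "2022-01-01", "2022-01-02",
                "2022-01-04", "2022-01-06", "2022-01-07"),
  Price     = c(5, 6, 5, 5, 9, 9)  # hypothetical, not in the original data
)

# Group only by the columns that define a "repeated purchase",
# keep groups with more than one row, then drop the grouping
repeated <- sales %>%
  group_by(UserId, SellerId, Product, SalesDate) %>%
  filter(n() > 1) %>%
  ungroup()
```

With this data the two A / X / ball rows are kept even though their prices differ, whereas grouping by every column would treat them as distinct.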

陪我终i 2025-01-28 01:09:28

Group by all columns and use filter. The difference from @benson23's answer (+1) is the use of across:

library(dplyr)
sales %>% 
  group_by(across(everything())) %>%
  filter( n() > 1 )

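or even shorter, since everything() is the default (recent dplyr releases deprecate calling across() without .cols, so the explicit form above is the safer one):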

sales %>% 
  group_by(across()) %>%
  filter( n() > 1 )
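
Since grouping by every column amounts to looking for fully duplicated rows, roughly the same result can be had without dplyr. This is a base R sketch using duplicated(), not something the answers above proposed; it assumes the sales data frame from the question:

```r
# Example data from the question
sales <- data.frame(
  UserId    = c("A", "A", "A", "B", "B", "B"),
  SellerId  = c("X", "X", "Y", "Y", "Z", "Z"),
  Product   = c("ball", "ball", "ball", "ball", "doll", "doll"),
  SalesDate = c("2022-01-01", "2022-01-01", "2022-01-02",
                "2022-01-04", "2022-01-06", "2022-01-07")
)

# duplicated() flags only the second and later copies of a row;
# scanning from the end as well also flags the first copy
dup_rows <- duplicated(sales) | duplicated(sales, fromLast = TRUE)

repeated <- sales[dup_rows, ]
```

Passing a subset of columns, for example duplicated(sales[, c("UserId", "Product")]), should apply the same idea to fewer criteria.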
有深☉意 2025-01-28 01:09:28

Using add_count() adds a column n holding the number of times each combination occurs.

sales %>%
  add_count(UserId, SellerId, Product,  SalesDate)

  UserId SellerId Product  SalesDate n
1      A        X    ball 2022-01-01 2
2      A        X    ball 2022-01-01 2
3      A        Y    ball 2022-01-02 1
4      B        Y    ball 2022-01-04 1
5      B        Z    doll 2022-01-06 1
6      B        Z    doll 2022-01-07 1

From there you can filter for n == 2 or n > 1, depending on your question.
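
Putting the counting and the filtering together might look like the sketch below. It assumes the sales data frame from the question and that n > 1 is the condition you want; the helper column is dropped at the end:

```r
library(dplyr)

# Example data from the question
sales <- data.frame(
  UserId    = c("A", "A", "A", "B", "B", "B"),
  SellerId  = c("X", "X", "Y", "Y", "Z", "Z"),
  Product   = c("ball", "ball", "ball", "ball", "doll", "doll"),
  SalesDate = c("2022-01-01", "2022-01-01", "2022-01-02",
                "2022-01-04", "2022-01-06", "2022-01-07")
)

repeated <- sales %>%
  add_count(UserId, SellerId, Product, SalesDate) %>%  # adds column n
  filter(n > 1) %>%                                    # keep repeated purchases
  select(-n)                                           # drop the helper column
```

Using n == 2 instead would keep only combinations that occur exactly twice.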
