R: counting unique clients across overlapping categories

Posted 2025-02-11 06:58:23

I have been struggling to count the number of clients who did or did not convert in each of the 3 single-device scenarios (App, Desktop, Web) and in each of the overlapping scenarios (App & Web, App & Desktop, Web & Desktop, App & Web & Desktop). Here is the sample dataset I am working with.

I could work out the single-category counts with aggregate() and group_by() in R, but I can't figure out how to handle the overlapping categories.

Many thanks to anyone who can help me with this!

df <- data.frame(list(ClientID = c("1", "1", "1", "2", "2", "3", "3" , "3" , "3" , "4" ),
                     device = c("App", "Web", "App", "Web", "Web", "App", "Desktop", "App", "App", "Web"),
                     conversion = c("0", "0", "0", "0", "1", "1", "0", "1", "0", "1")) ) 

Below is the desired outcome:

Scenario                With Conversion      Without Conversion 
App                          
Web                          
Desktop
App & Web
App & Desktop
Web & Desktop
App & Desktop & Web


Comments (2)

甲如呢乙后呢 2025-02-18 06:58:23

You can prepare a list of the device combinations (all 7 possibilities), and then use setequal() and unique() in a helper function, like this:

  1. Build the list of device combinations, using combn()
device_combinations = unlist(
  sapply(1:3, \(i) combn(c("App", "Web" ,"Desktop"), i, simplify = FALSE)),
  recursive = FALSE
)
  2. Define a helper function f that takes a vector of devices and indicates, for each device combination, whether the unique set of devices equals that combination. Because it uses unique() and setequal(), at most one combination can match.
f <- function(d) sapply(device_combinations, \(ds) setequal(ds, unique(d)))
  3. Now apply the function by conversion and ClientID, unnest, count by conversion and Scenario, and pivot to your desired wide format
library(dplyr)  # group_by, summarize, mutate, if_else
library(tidyr)  # unnest, pivot_wider
df %>% 
  group_by(conversion,ClientID) %>%
  summarize(v = f(device), .groups="drop") %>%
  unnest(v) %>% 
  mutate(
    Scenario=rep(sapply(device_combinations,paste,collapse=","),length.out=n()),
    conversion =if_else(conversion=="0", "Without Conversion", "With Conversion")
  ) %>% 
  group_by(Scenario, conversion) %>% 
  summarize(ct = sum(v), .groups="drop") %>% 
  pivot_wider(id_cols = Scenario, names_from = conversion,values_from = ct)

Output:

  Scenario        `With Conversion` `Without Conversion`
  <chr>                       <int>                <int>
1 App                             1                    0
2 App,Desktop                     0                    1
3 App,Web                         0                    1
4 App,Web,Desktop                 0                    0
5 Desktop                         0                    0
6 Web                             2                    1
7 Web,Desktop                     0                    0
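
For comparison, the same "exact device set" counting can be sketched in base R without dplyr or tidyr. This is not the answer's code but a swapped-in alternative: it labels each (ClientID, conversion) pair with its sorted device set and cross-tabulates the labels. The names scenarios, labels and counts are made up for this illustration, and devices inside a label are ordered alphabetically, so the labels read "App & Desktop" rather than "App,Desktop".

```r
# Sample data from the question
df <- data.frame(
  ClientID   = c("1", "1", "1", "2", "2", "3", "3", "3", "3", "4"),
  device     = c("App", "Web", "App", "Web", "Web", "App", "Desktop", "App", "App", "Web"),
  conversion = c("0", "0", "0", "0", "1", "1", "0", "1", "0", "1"),
  stringsAsFactors = FALSE
)

# All 7 scenarios, each written with its devices in alphabetical order
devices <- sort(unique(df$device))
scenarios <- unlist(lapply(seq_along(devices), function(k)
  combn(devices, k, FUN = paste, collapse = " & ")))

# One label per (ClientID, conversion): the exact set of devices used
labels <- aggregate(device ~ ClientID + conversion, data = df,
                    FUN = function(d) paste(sort(unique(d)), collapse = " & "))

# Cross-tabulate; the factor levels keep scenarios that no client matches
counts <- table(
  Scenario   = factor(labels$device, levels = scenarios),
  Conversion = factor(labels$conversion, levels = c("1", "0"),
                      labels = c("With Conversion", "Without Conversion"))
)
counts
```

Because every client contributes exactly one label per conversion value, each client is counted in exactly one scenario row per column, which is the same exclusive interpretation as the setequal() approach above.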
不甘平庸 2025-02-18 06:58:23

I thought I would add my answer in addition to the one that's already here.

Updated based on your comment

This isn't all that neat, but it does what you're expecting, although the other answer looks far better.

I started by collecting the count of unique clients by conversion and by device.

library(tidyverse)

dfn <- map(c("0", "1"),
           function(k) {
             with(df[df$conversion == k,], table(device, ClientID)) %>% 
               as.data.frame() %>% filter(Freq > 0) %>% select(-Freq) %>% 
               distinct() %>% group_by(device) %>% summarise(cnt = n())
             }
           )

dfa <- data.frame(device = unique(df$device)) %>% 
  left_join(., dfn[[1]]) %>% setNames(., c("device", "0")) %>% 
  left_join(., dfn[[2]])
names(dfa)[3] <- "1"
dfa[is.na(dfa)] <- 0
dfa
#    device 0 1
# 1     App 2 1
# 2     Web 2 2
# 3 Desktop 1 0 

Then I wanted the combinations. There are only three pairs here, so I could probably type them out by hand faster than generating them in code. However, I've provided a more dynamic approach.

dbls = RcppAlgos::comboGeneral(dfa$device, 2)
#      [,1]    [,2]   
# [1,] App     Desktop
# [2,] App     Web    
# [3,] Desktop Web    
# Levels: App Desktop Web 
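
If RcppAlgos is not installed, the same pairs can be produced with base R's combn(). This is a minimal sketch, not part of the original answer; the name pairs is made up for the illustration, and the device vector is typed out here rather than taken from dfa.

```r
devices <- c("App", "Web", "Desktop")

# combn() returns one combination per column, so transpose to get
# one pair per row, the same layout as the dbls matrix
pairs <- t(combn(sort(devices), 2))
pairs
```

Sorting first puts the devices in alphabetical order, so the rows come out in the same order as the comboGeneral() result shown above.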

Now I'll use data frame df to calculate the counts for the combinations determined in dbls.

# Pair every combination (row of dbls) with both conversion values
dfb <- map2_dfr(rep(1:nrow(dbls), each = 2), rep(c("0", "1"), times = nrow(dbls)), 
    function(x, y){
      gimme = with(df[df$conversion == y, ], table(device, ClientID)) %>% 
        as.data.frame() %>% 
        filter(Freq > 0, device %in% c(dbls[x, 1], dbls[x, 2])) %>% 
        select(-Freq) %>% distinct() %>% group_by(ClientID) %>% 
        mutate(cnt = n()) %>% filter(cnt == 2)
      if(nrow(gimme) == 0){
        c(device = paste0(dbls[x, 1], " & ", dbls[x, 2]),
          wh = y, cnt = 0)
      } else {
        c(device = paste0(dbls[x, 1], " & ", dbls[x, 2]),
          wh = y, cnt = length(unique(gimme$ClientID)))
      }
    }) %>% pivot_wider(names_from = "wh", values_from = "cnt")
# # A tibble: 3 × 3
#   device        `0`   `1`  
#   <chr>         <chr> <chr>
# 1 App & Desktop 1     0    
# 2 App & Web     1     0    
# 3 Desktop & Web 0     0  

Last, but not least, I combined the two frames.

I don't know which conversion value (0 or 1) means "with" and which means "without", so I just left the raw values as the column names.

rbind(dfa, dfb)
#          device 0 1
# 1           App 2 1
# 2           Web 2 2
# 3       Desktop 1 0
# 4 App & Desktop 1 0
# 5     App & Web 1 0
# 6 Desktop & Web 0 0 
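
Note that these counts are inclusive: a client who used App and Web without converting is counted under App, under Web, and under App & Web. A compact base R sketch of the same inclusive logic, extended to the three-device scenario that the table above leaves out, might look like this. The names combos, count_inclusive and res are hypothetical, introduced only for this illustration.

```r
# Sample data from the question
df <- data.frame(
  ClientID   = c("1", "1", "1", "2", "2", "3", "3", "3", "3", "4"),
  device     = c("App", "Web", "App", "Web", "Web", "App", "Desktop", "App", "App", "Web"),
  conversion = c("0", "0", "0", "0", "1", "1", "0", "1", "0", "1"),
  stringsAsFactors = FALSE
)

# All 7 device combinations as a flat list of character vectors
devices <- sort(unique(df$device))
combos <- unlist(lapply(seq_along(devices), function(k)
  combn(devices, k, simplify = FALSE)), recursive = FALSE)

# Number of clients who, within one conversion value, used every
# device in `combo` (and possibly other devices as well)
count_inclusive <- function(combo, conv) {
  sub <- df[df$conversion == conv, ]
  per_client <- split(sub$device, sub$ClientID)
  sum(vapply(per_client, function(d) all(combo %in% d), logical(1)))
}

res <- data.frame(
  device = vapply(combos, paste, character(1), collapse = " & "),
  `0` = vapply(combos, count_inclusive, integer(1), conv = "0"),
  `1` = vapply(combos, count_inclusive, integer(1), conv = "1"),
  check.names = FALSE
)
res
```

Whether inclusive or exclusive counting is wanted depends on how the scenarios in the question are meant to be read; the other answer counts each client in exactly one scenario instead.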