Looking for a function to conditionally find the mean of the top n values (not rows!) and return a number, not a data frame

Posted on 2025-01-23 17:55:00

I have a large data frame, percentage_activity:

# A tibble: 4,437 x 3
# Groups:   DATETIME [87]
   DATETIME            ID        COUNT
   <dttm>              <chr>     <int>
 1 2020-06-07 00:00:00 Bagheera     NA
 2 2020-06-07 00:00:00 Bagheera2     0
 3 2020-06-07 00:00:00 Baloo img     0
 4 2020-06-07 00:00:00 Banna        NA
 5 2020-06-07 00:00:00 Blair       158
 6 2020-06-07 00:00:00 Carol        NA

In this data frame I would like to calculate the mean of the top 5 COUNT values for a specific ID and then, in a for loop, express every COUNT value as a percentage, with the mean calculated for that ID counting as 100% for that ID.
To do that, I would much rather get the mean not as a data frame for all individuals but as a single number for the desired ID, which I can then use as a variable inside the for loop.

I'm actually trying to reconstruct a loop that worked for the same data organized with a separate column for each ID, but after melting the data into a single ID column it needs adjustments:

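max_activity <- as.integer(readline(prompt = "enter a number: "))
# Wide format: columns 2 onward each hold the counts of one ID
for (i in 2:length(percentage_activity)) {
  # Mean of the max_activity largest values in this column (sort() drops NAs)
  top_mean <- mean(sort(percentage_activity[[i]], decreasing = TRUE)[1:max_activity])
  # Rescale the column so that this mean corresponds to 100%
  percentage_activity[[i]] <- as.numeric(percentage_activity[[i]] * 100 / top_mean)
}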

I also tried this, but I'm not sure how to proceed from here:

for (i in unique(percentage_activity$ID)){
  individual <- percentage_activity$ID == i
  mean(percentage_activity[individual,"COUNT"], na.rm=TRUE)
}
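
A note on the attempt above: on a tibble, `percentage_activity[individual, "COUNT"]` is still a one-column tibble rather than a vector, so `mean()` returns NA with a warning, and the result is never stored anyway. The following is only a minimal sketch of one way the loop could keep a single number per ID. The example data and the name `top_means` are made up for illustration, and a plain data frame is used to keep it dependency-free.

```r
# Hypothetical example data in the same long format (one ID column)
percentage_activity <- data.frame(
  ID    = rep(c("Blair", "Carol"), each = 7),
  COUNT = c(158, 120, NA, 90, 60, 30, 10,
            60, 50, 40, NA, 30, 20, 10)
)

max_activity <- 5  # how many of the largest values to average

# Named numeric vector that will hold one top-n mean per ID
top_means <- numeric(0)
for (i in unique(percentage_activity$ID)) {
  individual <- percentage_activity$ID == i
  counts <- percentage_activity$COUNT[individual]  # plain numeric vector
  # sort() drops NAs by default; head() copes with fewer than n values
  top_means[i] <- mean(head(sort(counts, decreasing = TRUE), max_activity))
}

top_means[["Blair"]]  # a single number for one ID
```

Extracting the column with `$COUNT` (or `[["COUNT"]]`) behaves the same on tibbles and data frames, which is why the sketch uses it instead of `[ , "COUNT"]`.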

Comments (1)

脸赞 2025-01-30 17:55:00

Maybe this will help:

library(dplyr)

# Small example data: two observations per ID
df <- tibble(
  DATETIME = as.Date(rep("2020-06-07", 12)),
  ID = rep(c("Bagheera", "Bagheera2", "Baloo img", "Banna", "Blair", "Carol"), 2),
  COUNT = c(NA, 0, 0, NA, 158, NA, 10, 20, 30, 40, 50, 60)
)

# Mean of the top 5 COUNT values per ID (rows with NA are dropped first).
# slice_max() replaces the superseded top_n(); with_ties = FALSE keeps
# at most 5 values per ID.
mean_val <- df %>%
  filter(!is.na(COUNT)) %>%
  group_by(ID) %>%
  slice_max(COUNT, n = 5, with_ties = FALSE) %>%
  summarise(mean = mean(COUNT))

# Express every COUNT as a percentage of its ID's top-5 mean
df %>%
  left_join(mean_val, by = "ID") %>%
  mutate(percentage_activity = COUNT * 100 / mean)
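
Since the question asks for a single number rather than a data frame, the same idea can also be wrapped in a small function. This is only a sketch in base R: `top_n_mean` and `activity` are hypothetical names introduced here, not part of any package, and the data simply mirrors the example above.

```r
# Hypothetical helper: mean of the n largest non-missing values of x
top_n_mean <- function(x, n = 5) {
  x <- sort(x, decreasing = TRUE)  # sort() removes NAs by default
  if (length(x) == 0) return(NA_real_)
  mean(head(x, n))
}

# Same example values as above, as a plain data frame
activity <- data.frame(
  ID    = rep(c("Bagheera", "Bagheera2", "Baloo img", "Banna", "Blair", "Carol"), 2),
  COUNT = c(NA, 0, 0, NA, 158, NA, 10, 20, 30, 40, 50, 60)
)

# A single number for one ID, usable as a variable inside a loop
blair_mean <- top_n_mean(activity$COUNT[activity$ID == "Blair"], n = 5)
blair_mean

# The same value repeated on every row of its ID, then the percentage
activity$top_mean <- ave(activity$COUNT, activity$ID, FUN = top_n_mean)
activity$percentage_activity <- activity$COUNT * 100 / activity$top_mean
```

Using `ave()` avoids both the explicit loop and the join, at the cost of recomputing the mean once per group. An ID whose top-n mean is 0 would produce NaN or Inf in the percentage column, so that case may need separate handling.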