Summarize decreases by group only when the count does not increase again

Posted on 2025-01-19 22:21:32

I am trying to calculate the number of apples and pears lost from fruit stands (plots) at a market in a given year. Here, a loss would be defined as when the number of apples or pears decreases but remains at that number and does not increase again in that plot for that year. In other words, an apple or pear can be lost from the plot, but if another is added (e.g. re-stock), then this does not constitute a "loss". I am looking to summarize the number of apples lost and the number of pears lost by year, and also by year and plot. The date order is important here (i.e. a loss cannot happen from a date in the future to a date in the past), but I have already sorted my dataset by year so this should not be an issue.

Here is an example of the data:

table <- "date year plot apples pears
1  2021-05-26 2020   a    1      1
2  2021-05-27 2020   a    1      1
3  2021-05-28 2020   a    0      1
4  2021-05-29 2020   a    1      1
5  2021-05-30 2020   a    1      1
6  2021-05-27 2021   b    2      1
7  2021-05-28 2021   b    2      1
8  2021-05-29 2021   b    1      0
9  2021-05-30 2021   b    1      0
10 2021-05-31 2021   b    1      0
11 2021-05-27 2021   c    1      0
12 2021-05-28 2021   c    1      1
13 2021-05-29 2021   c    0      1
14 2021-05-30 2021   c    0      1
15 2021-05-31 2021   c    0      1"

Based on this example, you would expect:

In 2020, there were no apples lost and no pears lost (the number did not decrease and remain at that decreased number).
In 2021, there were two apples lost (one in plot b and one in plot c) and one pear lost (in plot b).

Which as an output would look similar to this summarized by year:

table <- "year apples.lost pears.lost
1  2020   0      0
2  2021   2      1"

Or this if also grouped by plot:

table <- "year plot apples.lost pears.lost
1  2020   a    0      0
2  2021   b    1      1
3  2021   c    1      0"

I have spent hours trying to figure out how to do this and I cannot come up with viable code. I can calculate increases/decreases in datasets, based on resources such as this, but I cannot seem to find a way to work in counting only decreases that remain at that number for the remainder of the year in that specific plot.

染年凉城似染瑾 2025-01-26 22:21:32

Using the dplyr package:

library(dplyr)

df %>%
  group_by(year, plot) %>%
  summarise(apples.lost = max(first(apples) - last(apples), 0),
            pears.lost = max(first(pears) - last(pears), 0)) %>%
  ungroup()

#> # A tibble: 3 x 4
#>    year plot  apples.lost pears.lost
#>   <dbl> <chr>       <dbl>      <dbl>
#> 1  2020 a               0          0
#> 2  2021 b               1          1
#> 3  2021 c               1          0

To get the total sum per year, you'd summarise it again:

df %>% 
  group_by(year, plot) %>%
  summarise(apples.lost = max(first(apples) - last(apples), 0),
            pears.lost = max(first(pears) - last(pears), 0)) %>%
  group_by(year) %>%
  summarise(apples.lost = sum(apples.lost),
            pears.lost = sum(pears.lost))

#> # A tibble: 2 x 3
#>    year apples.lost pears.lost
#>   <dbl>       <dbl>      <dbl>
#> 1  2020           0          0
#> 2  2021           2          1
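
The first-minus-last calculation above gives the expected result for the sample data, but it may undercount when a stand's count rises above its starting value and then drops for good (for example, 1, 2, 1 yields 0). If such a drop should count as a loss, one possible variant, sketched here in base R, measures from the value reached at the last increase instead. The helper name `lost_after_last_increase` is hypothetical, and the sketch assumes the rows are already in date order within each year and plot.

```r
# Hypothetical helper (not part of dplyr or base R): the loss is the drop from
# the value reached at the last increase (or from the first value, if the
# count never increases) down to the final value of the series.
lost_after_last_increase <- function(x) {
  ups <- which(diff(x) > 0)                    # i such that x[i + 1] > x[i]
  start <- if (length(ups) > 0) max(ups) + 1 else 1
  x[start] - x[length(x)]                      # non-negative by construction
}

# Sample data from the question; assumed sorted by date within each group.
df <- read.table(header = TRUE, text = "
date year plot apples pears
2021-05-26 2020 a 1 1
2021-05-27 2020 a 1 1
2021-05-28 2020 a 0 1
2021-05-29 2020 a 1 1
2021-05-30 2020 a 1 1
2021-05-27 2021 b 2 1
2021-05-28 2021 b 2 1
2021-05-29 2021 b 1 0
2021-05-30 2021 b 1 0
2021-05-31 2021 b 1 0
2021-05-27 2021 c 1 0
2021-05-28 2021 c 1 1
2021-05-29 2021 c 0 1
2021-05-30 2021 c 0 1
2021-05-31 2021 c 0 1
")

# Losses per year and plot: the helper is applied to each column within each group.
by_plot <- aggregate(cbind(apples, pears) ~ year + plot, data = df,
                     FUN = lost_after_last_increase)
names(by_plot)[3:4] <- c("apples.lost", "pears.lost")

# Losses per year: sum the per-plot results.
by_year <- aggregate(cbind(apples.lost, pears.lost) ~ year, data = by_plot,
                     FUN = sum)

print(by_plot)
print(by_year)
```

On the sample data this should agree with the expected tables in the question; the two approaches differ only for series that increase at some point before their final decline.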


最单纯的乌龟 2025-01-26 22:21:32

A data.table solution:

  library(data.table)
  
  dt <- fread(text = "obs date year plot apples pears
1  2021-05-26 2020   a    1      1
2  2021-05-27 2020   a    1      1
3  2021-05-28 2020   a    0      1
4  2021-05-29 2020   a    1      1
5  2021-05-30 2020   a    1      1
6  2021-05-27 2021   b    2      1
7  2021-05-28 2021   b    2      1
8  2021-05-29 2021   b    1      0
9  2021-05-30 2021   b    1      0
10 2021-05-31 2021   b    1      0
11 2021-05-27 2021   c    1      0
12 2021-05-28 2021   c    1      1
13 2021-05-29 2021   c    0      1
14 2021-05-30 2021   c    0      1
15 2021-05-31 2021   c    0      1")
  
  dt[, .(apples.lost = max(0L, first(apples) - last(apples)), pears.lost = max(0L, first(pears) - last(pears))), by = year:plot]
#>    year plot apples.lost pears.lost
#> 1: 2020    a           0          0
#> 2: 2021    b           1          1
#> 3: 2021    c           1          0
