Can we build an R function/logic over a grouping column to derive record IDs?

Posted 2025-02-13 05:47:27, 3370 characters, 1 view, 0 comments

I have the dataset below, and I want to create a "New_Record_ID" column from the "Current_Record_ID" and "Stores" columns.

Each record ID should cover at most 2 stores. If a Current_Record_ID repeats across more than 2 stores, the record ID should change after every 2 stores, and each new record ID should be 1 greater than the previous one (see the expected result).

Sample dataframe:

df <- data.frame(Stores=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,11,12,13,14,15,16,17,18,19,20),
                 Current_Record_ID=c(1, 1, 2, 3, 3, 3, 4, 4, 4, 4,4,4,4,5,5,6,7,7,7,8))

Expected Result:

Stores	Current_Record_ID	New_Record_ID
1	1	1
2	1	1
3	2	2
4	3	3
5	3	3
6	3	4
7	4	5
8	4	5
9	4	6
10	4	6
11	4	7
12	4	7
13	4	8
14	5	9
15	5	9
16	6	10
17	7	11
18	7	11
19	7	12
20	8	13

*Also, if we have a larger dataset of n stores and Current_Record_IDs, and we want to cap the number of stores per record at 100, how can we create the New_Record_ID?
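Both the pair rule and a larger cap such as 100 reduce to one idea: number the stores within each run of identical Current_Record_ID values, and start a new ID at positions 1, k + 1, 2k + 1, and so on, where k is the cap. The following is a minimal base-R sketch, not the method from the answer below. It assumes the rows are ordered so that each Current_Record_ID forms one contiguous run and that the column has no NA values; `new_record_id()` and `max_stores` are hypothetical names, not part of any package.

```r
# Hypothetical helper: give a new, consecutive ID to every block of at most
# `max_stores` consecutive rows that share the same Current_Record_ID.
# Assumes each Current_Record_ID is one contiguous run and contains no NA.
new_record_id <- function(id, max_stores = 100) {
  stopifnot(max_stores >= 1)
  if (length(id) == 0) return(integer(0))
  # Run number: increments whenever the ID differs from the previous row
  run <- cumsum(c(TRUE, id[-1] != id[-length(id)]))
  # Position of each row within its run: 1, 2, 3, ...
  pos <- ave(seq_along(id), run, FUN = seq_along)
  # A new record starts at positions 1, max_stores + 1, 2 * max_stores + 1, ...
  cumsum((pos - 1) %% max_stores == 0)
}

df <- data.frame(Stores=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,11,12,13,14,15,16,17,18,19,20),
                 Current_Record_ID=c(1, 1, 2, 3, 3, 3, 4, 4, 4, 4,4,4,4,5,5,6,7,7,7,8))

# Pairs rule from the question
df$New_Record_ID <- new_record_id(df$Current_Record_ID, max_stores = 2)
df$New_Record_ID
# expected: 1 1 2 3 3 4 5 5 6 6 7 7 8 9 9 10 11 11 12 13

# Larger made-up example: cap every new record at 100 stores
big <- data.frame(Current_Record_ID = rep(1:5, times = c(250, 40, 100, 101, 3)))
big$Stores <- seq_len(nrow(big))
big$New_Record_ID <- new_record_id(big$Current_Record_ID, max_stores = 100)
table(big$New_Record_ID)  # counts per new ID: 100 100 50 40 100 100 1 3
```

With `max_stores = 2` this reproduces the New_Record_ID column in the expected result. Because positions are counted per run rather than per ID value, an ID that reappears later in a separate run gets a fresh record instead of continuing the earlier one.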

十六岁半 2025-02-20 05:47:27

A for loop may not be strictly necessary, but this dplyr-based approach seems to work well enough:

library(dplyr)

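counter <- function(id, row) {
  # Flag the rows where a new record ID should start.
  # Relies on lag(), so rows must be sorted with each Current_Record_ID contiguous.
  count_up <- case_when(
    id != lag(id) ~ TRUE,                           # a new Current_Record_ID begins
    id == lag(id, n = 2L) & row %% 2 == 0 ~ FALSE,  # 4th, 6th, ... store: completes a pair
    id == lag(id, n = 2L) ~ TRUE,                   # 3rd, 5th, ... store: starts a new pair
    TRUE ~ FALSE                                    # first row (lag() is NA) or 2nd store of a group
  )

  n <- 1

  output <- vector(mode = "integer", length = length(id))

  # Running counter: add 1 each time a new record starts
  for (i in seq_along(count_up)) {
    if (count_up[i]) {
      n <- n + 1
    }
    output[i] <- n
  }
  output
}

df |>
  group_by(Current_Record_ID) |>
  mutate(row = row_number()) |>  # position of each store within its Current_Record_ID
  ungroup() |>
  mutate(New_Record_ID = counter(Current_Record_ID, row)) |>
  select(-row)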

# A tibble: 20 × 3
   Stores Current_Record_ID New_Record_ID
    <dbl>             <dbl>         <dbl>
 1      1                 1             1
 2      2                 1             1
 3      3                 2             2
 4      4                 3             3
 5      5                 3             3
 6      6                 3             4
 7      7                 4             5
 8      8                 4             5
 9      9                 4             6
10     10                 4             6
11     11                 4             7
12     12                 4             7
13     13                 4             8
14     14                 5             9
15     15                 5             9
16     16                 6            10
17     17                 7            11
18     18                 7            11
19     19                 7            12
20     20                 8            13
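The loop in `counter()` starts at n = 1 and adds 1 every time `count_up` is TRUE, so it computes the same values as `1 + cumsum(count_up)`. A loop-free variant is sketched below under that reading. `counter_vec()` is a hypothetical name, and like the original it encodes only the 2-stores-per-record rule and assumes the rows are sorted by Current_Record_ID.

```r
library(dplyr)

df <- data.frame(Stores=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,11,12,13,14,15,16,17,18,19,20),
                 Current_Record_ID=c(1, 1, 2, 3, 3, 3, 4, 4, 4, 4,4,4,4,5,5,6,7,7,7,8))

# Hypothetical loop-free version of counter(): same case_when() flags,
# with the running counter computed by cumsum() instead of a for loop.
counter_vec <- function(id, row) {
  count_up <- case_when(
    id != lag(id) ~ TRUE,
    id == lag(id, n = 2L) & row %% 2 == 0 ~ FALSE,
    id == lag(id, n = 2L) ~ TRUE,
    TRUE ~ FALSE
  )
  # Start at 1 and add 1 at every flagged row
  1 + cumsum(count_up)
}

out <- df |>
  group_by(Current_Record_ID) |>
  mutate(row = row_number()) |>
  ungroup() |>
  mutate(New_Record_ID = counter_vec(Current_Record_ID, row)) |>
  select(-row)

out$New_Record_ID
# expected: 1 1 2 3 3 4 5 5 6 6 7 7 8 9 9 10 11 11 12 13
```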
