Assign an ID column based on multiple columns

Posted on 2025-01-13 20:08:07

This question has since been solved in two follow-up questions: "Create ID variable: if ≥1 column duplicate then mark as duplicate" and "assign ID based on duplicate integer variable and logical variable".

I would like to create a new column with an ID code based on multiple conditions of several columns.
This is a sample of my data.

     pat     N     C    NC n1    c1   
 1     1     1     1     1 FALSE FALSE
 2     2     1     1     1 FALSE FALSE
 3     3    12    31     2 FALSE FALSE
 4     4    12    31     2 FALSE FALSE
 5     5     3    15     3 FALSE TRUE 
 6     6     4    15     4 FALSE TRUE 
 7     7     5    18     5 TRUE  FALSE
 8     8     5    20     6 TRUE  FALSE
 9     9     6    21     7 FALSE FALSE
10    10     7    21     8 FALSE FALSE
11    11     8    19     9 FALSE FALSE
12    12     9    11    10 FALSE FALSE
13    13    10    11    11 FALSE FALSE
14    14    11    14    12 FALSE FALSE

sample <- data.frame(pat = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
                     N = c(1,1,12,12,3,4,5,5,6,7,8,9,10,11),
                     C = c(1,1,31,31,15,15,18,20,21,21,19,11,11,14),
                     NC = c(1,1,2,2,3,4,5,6,7,8,9,10,11,12),
                     n1 = c("FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE"),
                     c1 = c("FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE"))

EDIT:
With some help I've now managed to create new ID columns for these two conditions (see "assign ID based on duplicate integer variable and logical variable"):

  1. column N is duplicate and column n1 is FALSE, or
  2. column C is duplicate and column c1 is FALSE.

The dataframe now looks like this:

     pat     N     C    NC n1    c1        new_ID_N   new_ID_C
 1     1     1     1     1 FALSE FALSE     1           1
 2     2     1     1     1 FALSE FALSE     1           1
 3     3    12    31     2 FALSE FALSE     2           2
 4     4    12    31     2 FALSE FALSE     2           2
 5     5     3    15     3 FALSE TRUE      3           3
 6     6     4    15     4 FALSE TRUE      4           4
 7     7     5    18     5 TRUE  FALSE     5           5
 8     8     5    20     6 TRUE  FALSE     6           6
 9     9     6    21     7 FALSE FALSE     7           7
10    10     7    21     8 FALSE FALSE     8           7
11    11     8    19     9 FALSE FALSE     9           8
12    12     9    11    10 FALSE FALSE     10          9
13    13    10    11    11 FALSE FALSE     11          9
14    14    11    14    12 FALSE FALSE     12          10

Finally, I would like to create one last new_ID column of numbers, where rows get the same (duplicate) number if:

  1. column NC is duplicate, OR
  2. new_ID_N is duplicate, OR
  3. new_ID_C is duplicate.
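
One way to read these three OR conditions: two rows belong together if they share a value in NC, new_ID_N, or new_ID_C, and the relation chains (if row A matches row B on one column and B matches C on another, all three get the same ID). The sketch below is a minimal base R illustration of that reading, not code from the thread. The helper name `group_by_any_duplicate` is hypothetical, and it assumes the key columns have equal length and contain no NA values.

```r
# Hypothetical helper: rows that share a value in ANY of the key vectors
# end up with the same ID (links are followed transitively).
# Assumes all keys have the same length and contain no NA.
group_by_any_duplicate <- function(...) {
  keys <- list(...)
  label <- seq_along(keys[[1]])   # start with one label per row
  repeat {
    previous <- label
    for (key in keys) {
      # every row takes the smallest label found among rows with the same key
      label <- ave(label, key, FUN = min)
    }
    if (all(label == previous)) break   # nothing changed: groups are stable
  }
  match(label, unique(label))           # renumber as 1, 2, 3, ... in row order
}

# Key columns copied from the table above
NC       <- c(1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
new_ID_N <- c(1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
new_ID_C <- c(1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 8, 9, 9, 10)

new_ID <- group_by_any_duplicate(NC, new_ID_N, new_ID_C)
print(new_ID)
# 1 1 2 2 3 4 5 6 7 7 8 9 9 10
```

The repeated passes keep the sketch short but may be slow when long chains of matches exist; for large data a graph-based connected-components routine would probably be a better fit.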

I've tried the script suggested in the answers

sample <- data.table::as.data.table(sample)[
  j = new_ID := base::as.numeric(base::interaction(var1, var..., varn,
                                                   drop=TRUE))
]

But this fails with the error 'cannot allocate vector of size ...', plus the warning 'In ans * length(l) : NAs produced by integer overflow'.
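
A likely cause, stated as an assumption because the full call is not shown: interaction() numbers combinations of the levels of all its arguments, so the number of potential combinations is the product of the distinct values per column, and that product can exceed R's integer range even when only a few combinations actually occur. The sketch below only estimates that product. `count_level_combinations` is a hypothetical helper name and `dat` stands in for the data frame.

```r
# Hypothetical helper: how many level combinations could interaction()
# have to enumerate for the given columns?
count_level_combinations <- function(df, cols) {
  n_levels <- vapply(df[cols], function(x) as.numeric(length(unique(x))), numeric(1))
  prod(n_levels)   # computed in double precision, so it does not overflow
}

dat <- data.frame(
  N  = c(1, 1, 12, 12, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11),
  C  = c(1, 1, 31, 31, 15, 15, 18, 20, 21, 21, 19, 11, 11, 14),
  NC = c(1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)

combos <- count_level_combinations(dat, c("N", "C", "NC"))
print(combos)                          # 11 * 9 * 12 = 1188 for the sample
print(combos > .Machine$integer.max)   # FALSE for the sample

# With, say, 50,000 distinct values in each of three columns (made-up figure):
print(50000^3 > .Machine$integer.max)  # TRUE
```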

Many thanks in advance

Comments (2)

短叹 2025-01-20 20:08:07

This code creates a unique ID from multiple variables.

sample <- data.table::as.data.table(sample)[
  j = new_ID := base::as.numeric(base::interaction(var1, var..., varn,
                                                   drop=TRUE))
]

Since your explanation is not entirely clear to me, I'll leave it to you to try this yourself. I believe you need to create a new variable/column for each condition and then pass those variables to the code above.

凡尘雨 2025-01-20 20:08:07

Here is one option:

# Sample data; n1 and c1 are logical values here, not quoted strings
sample <- data.frame(pat = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
                     N = c(1,1,12,12,3,4,5,5,6,7,8,9,10,11),
                     C = c(1,1,31,31,15,15,18,20,21,21,19,11,11,14),
                     NC = c(1,1,2,2,3,4,5,6,7,8,9,10,11,12),
                     n1 = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
                     c1 = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE))

# Sort so that the rows to be compared sit next to each other
sample <- sample[order(sample$NC, sample$N, sample$C), ]

# The first row starts group 1
id <- 1
sample[1, 'new_ID'] <- id

for (i in 2:nrow(sample)) {

  # Start a new ID only if the row differs from the previous row on NC
  # (or the previous row has n1 set) AND differs on C (or the previous
  # row has c1 set)
  if (((sample[i, 'NC'] != sample[i - 1, 'NC']) | sample[i - 1, 'n1']) &
      ((sample[i, 'C'] != sample[i - 1, 'C']) | sample[i - 1, 'c1'])) {
    id <- id + 1
  }

  sample[i, 'new_ID'] <- id

}
> sample
   pat  N  C NC    n1    c1 new_ID
1    1  1  1  1 FALSE FALSE      1
2    2  1  1  1 FALSE FALSE      1
3    3 12 31  2 FALSE FALSE      2
4    4 12 31  2 FALSE FALSE      2
5    5  3 15  3 FALSE  TRUE      3
6    6  4 15  4 FALSE  TRUE      4
7    7  5 18  5  TRUE FALSE      5
8    8  5 20  6  TRUE FALSE      6
9    9  6 21  7 FALSE FALSE      7
10  10  7 21  8 FALSE FALSE      7
11  11  8 19  9 FALSE FALSE      8
12  12  9 11 10 FALSE FALSE      9
13  13 10 11 11 FALSE FALSE      9
14  14 11 14 12 FALSE FALSE     10

If it weren't for the n1/c1 constraints, you could use as.integer(factor(...)), where ... is a paste() or interaction() call on the variables needed.
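
As an illustration of that suggestion, ignoring the n1/c1 constraints: the sketch below builds one composite key per row with paste() and turns it into an integer ID. The choice of columns (NC, N, C) and the name `dat` are assumptions made for the example. Rows get the same ID only when all listed columns match, and factor() numbers the groups in sorted-level order, so match() against unique() is shown as a way to number them in order of first appearance instead.

```r
dat <- data.frame(
  N  = c(1, 1, 12, 12, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11),
  C  = c(1, 1, 31, 31, 15, 15, 18, 20, 21, 21, 19, 11, 11, 14),
  NC = c(1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)

# One composite key per row; "|" keeps the pasted values apart
key <- paste(dat$NC, dat$N, dat$C, sep = "|")

# Variant from the answer: IDs follow the sorted order of the factor levels
id_sorted <- as.integer(factor(key))

# Variant that numbers groups in order of first appearance
id_in_order <- match(key, unique(key))
print(id_in_order)
# 1 1 2 2 3 4 5 6 7 8 9 10 11 12
```

Both variants define the same groups; only the numbers assigned to the groups differ.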

But with n1 and c1, all I could think of was a loop. The loop only compares adjacent rows, so the data must be sorted first. Also note that you had quotes around TRUE and FALSE, which I removed.
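
The same adjacent-row comparison can also be written without an explicit loop. This is a sketch that mirrors the loop's condition and assumes the data is sorted the same way; `dat`, `cur`, `prev`, and `starts_new_group` are names chosen for the example.

```r
dat <- data.frame(
  pat = 1:14,
  N  = c(1, 1, 12, 12, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11),
  C  = c(1, 1, 31, 31, 15, 15, 18, 20, 21, 21, 19, 11, 11, 14),
  NC = c(1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  n1 = c(rep(FALSE, 6), TRUE, TRUE, rep(FALSE, 6)),
  c1 = c(rep(FALSE, 4), TRUE, TRUE, rep(FALSE, 8))
)
dat <- dat[order(dat$NC, dat$N, dat$C), ]

cur  <- seq_len(nrow(dat))[-1]   # rows 2..n
prev <- cur - 1                  # rows 1..n-1

# TRUE where a row starts a new group, using the same test as the loop
starts_new_group <-
  ((dat$NC[cur] != dat$NC[prev]) | dat$n1[prev]) &
  ((dat$C[cur]  != dat$C[prev])  | dat$c1[prev])

# The first row always starts a group; the running count is the ID
dat$new_ID <- cumsum(c(TRUE, starts_new_group))
print(dat$new_ID)
# 1 1 2 2 3 4 5 6 7 7 8 9 9 10
```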
