Assign an ID column based on multiple columns
The question below has been solved here: "Create ID variable: if ≥1 column duplicate then mark as duplicate", and here: "assign ID based on duplicate integer variable and logical variable".
I would like to create a new column with an ID code based on multiple conditions of several columns.
This is a sample of my data.
   pat  N  C NC    n1    c1
1    1  1  1  1 FALSE FALSE
2    2  1  1  1 FALSE FALSE
3    3 12 31  2 FALSE FALSE
4    4 12 31  2 FALSE FALSE
5    5  3 15  3 FALSE  TRUE
6    6  4 15  4 FALSE  TRUE
7    7  5 18  5  TRUE FALSE
8    8  5 20  6  TRUE FALSE
9    9  6 21  7 FALSE FALSE
10  10  7 21  8 FALSE FALSE
11  11  8 19  9 FALSE FALSE
12  12  9 11 10 FALSE FALSE
13  13 10 11 11 FALSE FALSE
14  14 11 14 12 FALSE FALSE
sample <- data.frame(
  pat = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
  N = c(1,1,12,12,3,4,5,5,6,7,8,9,10,11),
  C = c(1,1,31,31,15,15,18,20,21,21,19,11,11,14),
  NC = c(1,1,2,2,3,4,5,6,7,8,9,10,11,12),
  n1 = c("FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "TRUE",
         "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE"),
  c1 = c("FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "FALSE",
         "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE"))
EDIT:
With some help I've now managed to create new ID columns for these two conditions (see "assign ID based on duplicate integer variable and logical variable"):
- column N is duplicated and column n1 is FALSE, or
- column C is duplicated and column c1 is FALSE.
The data frame now looks like this:
   pat  N  C NC    n1    c1 new_ID_N new_ID_C
1    1  1  1  1 FALSE FALSE        1        1
2    2  1  1  1 FALSE FALSE        1        1
3    3 12 31  2 FALSE FALSE        2        2
4    4 12 31  2 FALSE FALSE        2        2
5    5  3 15  3 FALSE  TRUE        3        3
6    6  4 15  4 FALSE  TRUE        4        4
7    7  5 18  5  TRUE FALSE        5        5
8    8  5 20  6  TRUE FALSE        6        6
9    9  6 21  7 FALSE FALSE        7        7
10  10  7 21  8 FALSE FALSE        8        7
11  11  8 19  9 FALSE FALSE        9        8
12  12  9 11 10 FALSE FALSE       10        9
13  13 10 11 11 FALSE FALSE       11        9
14  14 11 14 12 FALSE FALSE       12       10
Finally, I would like to create one last numeric column, new_ID, in which rows get the same number if:
- column NC is duplicated, OR
- new_ID_N is duplicated, OR
- new_ID_C is duplicated.
I've tried the script suggested in the answers:
sample <- data.table::as.data.table(sample)[
j = new_ID := base::as.numeric(base::interaction(var1, var..., varn,
drop=TRUE))
]
But this fails with the error "cannot allocate vector of size ...", plus the warning "In ans * length(l) : NAs produced by integer overflow".
Many thanks in advance
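One way to read the three OR conditions above is as a grouping problem: two rows belong together if they share a value in NC, new_ID_N or new_ID_C, directly or through a chain of other rows. The sketch below is not taken from either linked answer. It is a minimal base R illustration of that reading, and `link_ids` is a hypothetical helper name. It assumes the three columns already exist, as in the table above, and it does not treat NA values specially.

```r
# The three key columns, copied from the table in the question.
keys <- data.frame(
  NC       = c(1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  new_ID_N = c(1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  new_ID_C = c(1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 8, 9, 9, 10)
)

# Hypothetical helper: rows that share a value in at least one column
# (directly or through other rows) receive the same ID.
link_ids <- function(df) {
  n <- nrow(df)
  parent <- seq_len(n)              # every row starts as its own group
  find <- function(i) {             # follow parent links up to the group root
    while (parent[i] != i) i <- parent[i]
    i
  }
  for (col in names(df)) {
    first <- match(df[[col]], df[[col]])  # first row holding each value
    for (i in seq_len(n)) {
      a <- find(i)
      b <- find(first[i])
      if (a != b) parent[max(a, b)] <- min(a, b)  # merge the two groups
    }
  }
  roots <- vapply(seq_len(n), find, integer(1))
  match(roots, unique(roots))       # number groups by first appearance
}

keys$new_ID <- link_ids(keys)
keys$new_ID
# 1 1 2 2 3 4 5 6 7 7 8 9 9 10
```

On this sample the result happens to equal new_ID_C, because NC and new_ID_N add no links beyond those already in new_ID_C. The plain loops here are slow on very large tables, where a graph library that computes connected components may be the better tool.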
Comments (2)
This code makes it possible to create a unique ID based on multiple variables.
Since your explanation is not entirely clear to me, I'll let you try this yourself. I believe you have to create a new variable/column for each condition and then put those variables into the code.
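If the interaction() call runs out of memory, a likely cause is that it builds labels for every possible combination of levels before dropping the unused ones, and the combined codes can overflow an integer. A possible workaround, sketched here in base R and not part of the original answer, is to paste the columns into one key per row and number the keys with match(). The data frame `df` and the "_" separator are illustrative assumptions, and the separator must not occur in the values themselves.

```r
# Small illustrative data; column names follow the question.
df <- data.frame(
  N = c(1, 1, 12, 12, 3),
  C = c(1, 1, 31, 31, 15)
)

# One string key per row. Only combinations that actually occur in the
# data are created, so nothing grows with the product of the level counts.
key <- paste(df$N, df$C, sep = "_")

# Number the keys in order of first appearance.
df$new_ID <- match(key, unique(key))
df$new_ID
# 1 1 2 2 3
```

Using as.integer(factor(key)) instead gives the same grouping, but the IDs then follow the sorted order of the keys and not the row order.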
Here is one option:
If it weren't for the n1/c1 constraints, you should be able to use as.integer(factor(...)), where ... is a paste() or interaction() call with the variables needed. But with n1 and c1, all I could think of was a loop. This requires sorting first! And note that you had quotes around TRUE and FALSE, which I removed.
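The answer's own code is not reproduced here, so the following is only a sketch of what such a loop could look like for new_ID_N. It assumes that a row shares the previous row's ID only when both rows have the same N and n1 is FALSE on both. It also assumes that equal values of N sit next to each other, which is why sorting matters. If n1 is stored as the strings "TRUE"/"FALSE", as.logical() converts it.

```r
N  <- c(1, 1, 12, 12, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11)
n1 <- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE,
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)   # logical, no quotes

new_ID_N <- integer(length(N))
id <- 0L
for (i in seq_along(N)) {
  # Reuse the previous ID only for an adjacent duplicate of N where
  # neither row is flagged in n1; otherwise start a new ID.
  same_as_prev <- i > 1 && N[i] == N[i - 1] && !n1[i] && !n1[i - 1]
  if (!same_as_prev) id <- id + 1L
  new_ID_N[i] <- id
}
new_ID_N
# 1 1 2 2 3 4 5 6 7 8 9 10 11 12
```

This reproduces the new_ID_N column shown in the question. The same loop with C and c1 in place of N and n1 reproduces new_ID_C on the sample data.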