New ID column based on another column in R

Posted on 2025-02-09 21:34:42

I want to generate a new ID column in my df based on another column.
My df looks something like this:

> TCR <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF","CGSRLTF;CASSQEGTGVYEQYF","CAAETSGSRLTF;CASSQEGT", "CAAETSGSRLTF;CASSQEGTGVYEQYF")
> df <- data.frame(cdr3 = TCR)
> df
                          cdr3
1 CAAETSGSRLTF;CASSQEGTGVYEQYF
2      CGSRLTF;CASSQEGTGVYEQYF
3        CAAETSGSRLTF;CASSQEGT
4 CAAETSGSRLTF;CASSQEGTGVYEQYF

I want to add a new column df$ID that looks at df$cdr3 and assigns a new label to each distinct value; if a value is repeated, it should reuse the label that value was given before.
So it becomes something like this:

> df
                          cdr3 ID
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
2      CGSRLTF;CASSQEGTGVYEQYF X2
3        CAAETSGSRLTF;CASSQEGT X3
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1

Thanks a lot guys


Comments (2)

深者入戏 2025-02-16 21:34:42

We can use match in base R to look up each value of 'cdr3' among its unique values, take the resulting index, and paste it onto "X".

df$ID <- paste0("X", match(df$cdr3, unique(df$cdr3)))

-output

> df
                          cdr3 ID
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
2      CGSRLTF;CASSQEGTGVYEQYF X2
3        CAAETSGSRLTF;CASSQEGT X3
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
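
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps. The variable names lookup, idx and ids are only illustrative and do not appear in the original answer; the data is the sample from the question.

```r
# Same sample values as in the question
cdr3 <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF",
          "CGSRLTF;CASSQEGTGVYEQYF",
          "CAAETSGSRLTF;CASSQEGT",
          "CAAETSGSRLTF;CASSQEGTGVYEQYF")

# unique() keeps the distinct values in order of first appearance
lookup <- unique(cdr3)

# match() returns, for each element of cdr3, its position in lookup,
# so a repeated value always gets the same index
idx <- match(cdr3, lookup)

# Prefix the index with "X" to build the ID labels
ids <- paste0("X", idx)
print(ids)
```

Because unique() preserves first-appearance order, the IDs are numbered in the order the values first occur rather than alphabetically.
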
何处潇湘 2025-02-16 21:34:42

Here is a tidyverse solution using fct_inorder from the forcats package. fct_inorder sets the factor levels in order of first appearance, so each value's integer level code can serve as its ID.

library(tidyverse)

as_tibble(df) %>% 
  mutate(cdr3 = fct_inorder(cdr3)) %>% 
  mutate(ID = paste0("X", as.integer(cdr3)))
  cdr3                         ID   
  <fct>                        <chr>
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1   
2 CGSRLTF;CASSQEGTGVYEQYF      X2   
3 CAAETSGSRLTF;CASSQEGT        X3   
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1   
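
As a side note on why the level order matters, here is a small base R sketch (no tidyverse needed). It assumes the same sample values as the question; the names sorted_ids and inorder_ids are only illustrative. A plain factor() sorts its levels, so the numbering follows the sorted order of the strings, whereas passing levels = unique(x) numbers the values by first appearance, which is what the question asks for.

```r
# Same sample values as in the question
cdr3 <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF",
          "CGSRLTF;CASSQEGTGVYEQYF",
          "CAAETSGSRLTF;CASSQEGT",
          "CAAETSGSRLTF;CASSQEGTGVYEQYF")

# Default factor(): levels are sorted, so IDs follow sorted order
sorted_ids <- paste0("X", as.integer(factor(cdr3)))

# Levels fixed to first-appearance order: IDs follow row order
inorder_ids <- paste0("X", as.integer(factor(cdr3, levels = unique(cdr3))))

print(sorted_ids)   # numbering by sorted level order
print(inorder_ids)  # numbering by first appearance
```

With this data the two numberings differ, because the value in row 3 sorts before the value in row 1.
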