Recode an existing column in R

Posted on 2025-01-13 21:54:46

I have a data frame containing the following two columns:

      Tumor_Barcode    Sex
     MEL-JWCI-WGS-1   Male
    MEL-JWCI-WGS-11   Male
    MEL-JWCI-WGS-12 Female
    MEL-JWCI-WGS-13   Male

I want to recode the Tumor_Barcode column into a third column, Sample_ID, so that the output looks like this:

      Tumor_Barcode    Sex Sample_ID
     MEL-JWCI-WGS-1   Male     ME001
    MEL-JWCI-WGS-11   Male     ME011
    MEL-JWCI-WGS-12 Female     ME012
    MEL-JWCI-WGS-13   Male     ME013

Is there any way I can do this in R?

Data:

# note: the first barcode has a leading space
Tumor_Barcode <- c(" MEL-JWCI-WGS-1", "MEL-JWCI-WGS-11", "MEL-JWCI-WGS-12", "MEL-JWCI-WGS-13")
Sex <- c("Male", "Male", "Female", "Male")
DF1 <- data.frame(Tumor_Barcode, Sex)

Comments (3)

海之角 2025-01-20 21:54:46

Here is a base R way.

Tumor_Barcode <- c(" MEL-JWCI-WGS-1","MEL-JWCI-WGS-11","MEL-JWCI-WGS-12","MEL-JWCI-WGS-13")
Sex <- c("Male", "Male", "Female", "Male")
DF1 <- data.frame(Tumor_Barcode,Sex)

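# Delete the first run of non-digit characters (e.g. " MEL-JWCI-WGS-"), leaving the trailing number
num <- as.integer(sub("[^[:digit:]]+", "", DF1$Tumor_Barcode))
# Prefix with "ME" and zero-pad the number to three digits
DF1$Sample_ID <- sprintf("ME%03d", num)
rm(num)    # tidy up
DF1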
#>     Tumor_Barcode    Sex Sample_ID
#> 1  MEL-JWCI-WGS-1   Male     ME001
#> 2 MEL-JWCI-WGS-11   Male     ME011
#> 3 MEL-JWCI-WGS-12 Female     ME012
#> 4 MEL-JWCI-WGS-13   Male     ME013

Created on 2022-03-11 by the reprex package (v2.0.1)

The two lines that create the new column can be combined into a one-liner:

DF1$Sample_ID <- sprintf("ME%03d", as.integer(sub("[^[:digit:]]+", "", DF1$Tumor_Barcode)))
DF1
#>     Tumor_Barcode    Sex Sample_ID
#> 1  MEL-JWCI-WGS-1   Male     ME001
#> 2 MEL-JWCI-WGS-11   Male     ME011
#> 3 MEL-JWCI-WGS-12 Female     ME012
#> 4 MEL-JWCI-WGS-13   Male     ME013

Created on 2022-03-11 by the reprex package (v2.0.1)
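The `sub()` call above deletes only the first run of non-digit characters, so it assumes a barcode has no other digits before its final number. A hypothetical barcode such as `MEL2-JWCI-WGS-7` would become `"2-JWCI-WGS-7"` and then `NA`. Below is a minimal sketch that anchors on the trailing digits instead. `make_sample_id()` and its `prefix` and `width` arguments are hypothetical names used for illustration and are not part of the answer.

```r
# Hypothetical helper: build sample IDs from the number at the end of each barcode
make_sample_id <- function(barcode, prefix = "ME", width = 3) {
  # Drop surrounding whitespace, then greedily delete everything up to and
  # including the last non-digit character, leaving only the trailing digits
  digits <- sub(".*[^0-9]", "", trimws(barcode))
  # Build a format string such as "%s%03d" and zero-pad the number
  fmt <- paste0("%s%0", width, "d")
  sprintf(fmt, prefix, as.integer(digits))
}

Tumor_Barcode <- c(" MEL-JWCI-WGS-1", "MEL-JWCI-WGS-11", "MEL-JWCI-WGS-12", "MEL-JWCI-WGS-13")
make_sample_id(Tumor_Barcode)    # -> "ME001" "ME011" "ME012" "ME013"

# A digit inside the (hypothetical) prefix no longer breaks the extraction
make_sample_id("MEL2-JWCI-WGS-7")    # -> "ME007"
```

The greedy `.*[^0-9]` means the match always ends at the last non-digit, so only the final run of digits survives regardless of what precedes it.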

初心未许 2025-01-20 21:54:46

A possible solution:

library(tidyverse)

# Extract the trailing digits, left-pad them with "0" to width 3, and prefix with "ME"
DF1 %>% 
  mutate(Sample_ID = str_c("ME", str_extract(Tumor_Barcode, "\\d+$") %>% 
         str_pad(3, pad = "0")))

#>     Tumor_Barcode    Sex Sample_ID
#> 1  MEL-JWCI-WGS-1   Male     ME001
#> 2 MEL-JWCI-WGS-11   Male     ME011
#> 3 MEL-JWCI-WGS-12 Female     ME012
#> 4 MEL-JWCI-WGS-13   Male     ME013
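The pipeline above prints the modified data frame but does not change `DF1`. To keep the new column, assign the result with `DF1 <- DF1 %>% mutate(...)`. For readers without the tidyverse, here is a rough base R analogue of the same two steps. It assumes `regmatches()`/`regexpr()` can stand in for `str_extract()` and `formatC(flag = "0")` for `str_pad(pad = "0")`. Unlike `str_extract()`, `regmatches()` silently drops elements with no match, so this sketch assumes every barcode ends in digits.

```r
Tumor_Barcode <- c(" MEL-JWCI-WGS-1", "MEL-JWCI-WGS-11", "MEL-JWCI-WGS-12", "MEL-JWCI-WGS-13")
Sex <- c("Male", "Male", "Female", "Male")
DF1 <- data.frame(Tumor_Barcode, Sex, stringsAsFactors = FALSE)

# Like str_extract(Tumor_Barcode, "\\d+$"): the run of digits at the end of each string.
# Assumes every barcode ends in digits, because regmatches() drops non-matching elements.
trailing <- regmatches(DF1$Tumor_Barcode, regexpr("[0-9]+$", DF1$Tumor_Barcode))

# Like str_pad(..., 3, pad = "0"): left-pad the integer with zeros to a width of three
padded <- formatC(as.integer(trailing), width = 3, flag = "0")

DF1$Sample_ID <- paste0("ME", padded)
DF1
```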

浪推晚风 2025-01-20 21:54:46

We can use base R:

DF1$Sample_ID <- with(DF1, sprintf('%s%03d', 
   substr(trimws(Tumor_Barcode), 1, 2),                        # first two letters ("ME")
      as.integer(trimws(Tumor_Barcode, whitespace = "\\D+"))))  # trailing number

Output:

> DF1
    Tumor_Barcode    Sex Sample_ID
1  MEL-JWCI-WGS-1   Male     ME001
2 MEL-JWCI-WGS-11   Male     ME011
3 MEL-JWCI-WGS-12 Female     ME012
4 MEL-JWCI-WGS-13   Male     ME013
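To see why the `with()` call above works, it helps to evaluate its pieces separately on the question's data. This is a step-by-step sketch of the same expression, not a different method. The intermediate names `prefix`, `digits`, and `sample_id` are introduced only for illustration. The `whitespace` argument of `trimws()` requires R 3.6.0 or later.

```r
Tumor_Barcode <- c(" MEL-JWCI-WGS-1", "MEL-JWCI-WGS-11", "MEL-JWCI-WGS-12", "MEL-JWCI-WGS-13")

# Without trimming, the leading space in the first barcode would give " M"
substr(Tumor_Barcode[1], 1, 2)    # -> " M"

# Step 1: strip ordinary whitespace, then keep the first two characters
prefix <- substr(trimws(Tumor_Barcode), 1, 2)    # -> "ME" "ME" "ME" "ME"

# Step 2: treat every non-digit as "whitespace", so trimws() strips the
# letters, hyphens and spaces at both ends, leaving only the trailing number
digits <- trimws(Tumor_Barcode, whitespace = "\\D+")    # -> "1" "11" "12" "13"

# Step 3: combine, zero-padding the number to three digits
sample_id <- sprintf("%s%03d", prefix, as.integer(digits))
print(sample_id)    # -> "ME001" "ME011" "ME012" "ME013"
```

Deriving the prefix from the barcode itself, rather than hard-coding `"ME"`, only helps if every barcode's first two letters are the wanted prefix.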