How to extract the last segment of text after a forward slash

Posted on 2025-02-10 04:53:55

I have a df that looks like this:

AF     GT   Sample_name
0.001  1/1  path/to/sample/name/ID0001.vcf.gz
0.005  0/1  path/to/sample/name/ID0002.vcf.gz

What I want is to only keep the ID name in the Sample_name column:

AF     GT   Sample_name
0.001  1/1  ID0001
0.005  0/1  ID0002

I would very much appreciate any help in achieving this.

Comments (4)

夜血缘 2025-02-17 04:53:55

There are some built in file name helpers that you can use here.

  • basename()
  • tools::file_path_sans_ext()

So in this example simply do:

library(tools)

df$Sample_name <- file_path_sans_ext(basename(df$Sample_name), compression = TRUE)
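
To see what each helper contributes, the following sketch applies them one step at a time to the two example paths from the question. The variable names `paths`, `files` and `ids` are only illustrative.

```r
library(tools)

# Example paths from the question; `paths` is an illustrative name.
paths <- c("path/to/sample/name/ID0001.vcf.gz",
           "path/to/sample/name/ID0002.vcf.gz")

# basename() drops everything up to and including the last forward slash.
files <- basename(paths)
print(files)

# With compression = TRUE, a trailing .gz, .bz2 or .xz is stripped first,
# and then the remaining extension (.vcf here) is removed.
ids <- file_path_sans_ext(files, compression = TRUE)
print(ids)

# Without compression = TRUE, only the final extension (.gz) is removed.
print(file_path_sans_ext(files))
```

The last call leaves `.vcf` in place, which is why `compression = TRUE` matters for doubly suffixed files such as `.vcf.gz`.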
-小熊_ 2025-02-17 04:53:55

You can use a regex pattern with gsub():

gsub(".*(ID\\d*).*", replacement = "\\1", x = "path/to/sample/name/ID0001.vcf.gz")
#> "ID0001"

Across your dataframe:

df$sample_name2 <- gsub(".*(ID\\d*).*", replacement = "\\1", x = df$Sample_name)
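
If the names do not always start with `ID`, a pattern that keys on the last slash and the first dot may be more robust. This is only a sketch, assuming the wanted part is whatever sits between the last forward slash and the first dot of the file name; `x` and `ids` are illustrative names.

```r
# Example paths from the question; `x` is an illustrative name.
x <- c("path/to/sample/name/ID0001.vcf.gz",
       "path/to/sample/name/ID0002.vcf.gz")

# "^.*/"     greedy, so it consumes everything up to the LAST forward slash
# "([^/.]+)" captures the run of characters before the first dot
# "\\..*$"   matches the first dot and everything after it
ids <- sub("^.*/([^/.]+)\\..*$", "\\1", x)
print(ids)
```

Strings that contain no forward slash or no dot do not match the pattern and are returned unchanged, so it may be worth checking the result before overwriting the column.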
亚希 2025-02-17 04:53:55

Here is a tidyverse solution. Note that this only works if your ID string always consists of ID followed by 4 digits:

library(dplyr)
library(stringr)

df %>% 
  mutate(Sample_name=str_extract(Sample_name, 'ID\\d{4}'))
    AF  GT Sample_name
1 0.001 1/1      ID0001
2 0.005 0/1      ID0002
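
The fixed-width assumption can be relaxed by matching one or more digits instead of exactly four. Below is a minimal base R sketch of that idea. The second path, with the shorter `ID12`, is a hypothetical value added purely for illustration, and the helper variables `m` and `ids` are illustrative names as well.

```r
# The second Sample_name is a hypothetical value used only to show an ID
# with a different number of digits.
df <- data.frame(AF = c(0.001, 0.005),
                 GT = c("1/1", "0/1"),
                 Sample_name = c("path/to/sample/name/ID0001.vcf.gz",
                                 "path/to/sample/name/ID12.vcf.gz"))

# regexpr() returns the position of the first match, or -1 when there is none.
m <- regexpr("ID\\d+", df$Sample_name)

# regmatches() returns only the matched pieces, so start from NA and fill in
# the rows that matched. This keeps the result the same length as the column.
ids <- rep(NA_character_, nrow(df))
ids[m > 0] <- regmatches(df$Sample_name, m)

df$Sample_name <- ids
print(df)
```

Rows without any `ID` followed by digits end up as NA, which mirrors what `str_extract()` returns when nothing matches.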
躲猫猫 2025-02-17 04:53:55

Using sub with basename to take the sample name:

df$Sample_name <- sub('\\..*$', '', basename(df$Sample_name))
df

Output:

     AF  GT Sample_name
1 0.001 1/1      ID0001
2 0.005 0/1      ID0002

Data

df <- data.frame(AF = c(0.001, 0.005),
                 GT = c("1/1", "0/1"),
                 Sample_name = c("path/to/sample/name/ID0001.vcf.gz", "path/to/sample/name/ID0002.vcf.gz"))
