In Snakemake, what is the best way to use expand() with one unknown variable?

Posted 2025-02-11 02:29:02 · 836 characters · 1 view · 0 comments


I am currently using Snakemake for a bioinformatics project. Given a human reference genome (hg19) and a BAM file, I want to specify multiple output files that share the same name but have different extensions. Here is my code:

rule gridss_preprocess:
    input:
        ref=config['ref'],
        bam=config['bamdir'] + "{sample}.dedup.downsampled.bam",
        bai=config['bamdir'] + "{sample}.dedup.downsampled.bam.bai"
    output:
        expand(config['bamdir'] + "{sample}.dedup.downsampled.bam{ext}", ext=config['workreq'], sample="{sample}")

Currently, config['workreq'] is a list of extensions, each starting with ".".

For example, I want to be able to use expand() to declare the following files:

S1.dedup.downsampled.bam.cigar_metrics
S1.dedup.downsampled.bam.computesamtags.changes.tsv
S1.dedup.downsampled.bam.coverage.blacklist.bed
S1.dedup.downsampled.bam.idsv_metrics

I want to be able to do this for multiple sample files, S_. Currently, I do not get an error when I do a dry run. However, I am not sure whether this will run properly.

Am I doing this right?


动次打次papapa 2025-02-18 02:29:02


expand() defines a list of files. If you pass it two parameters, it uses their Cartesian product. Thus, your rule will define as output ALL files from your extension list for ALL samples. Since you define a wildcard in your input, I think what you want is all files from your extension list for ONE sample, with the rule executed once per sample.
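To make the Cartesian product concrete, here is a minimal Python sketch. `expand_sketch` is a hypothetical, simplified stand-in written for this illustration, not Snakemake's own implementation, and the sample names and extension list are assumed values.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Simplified stand-in for expand(): format the pattern once for
    every combination of the supplied values (Cartesian product)."""
    keys = list(wildcards)
    combos = product(*(wildcards[key] for key in keys))
    return [pattern.format(**dict(zip(keys, combo))) for combo in combos]


# Assumed values, for illustration only
samples = ["S1", "S2"]
exts = [".cigar_metrics", ".idsv_metrics"]

files = expand_sketch(
    "{sample}.dedup.downsampled.bam{ext}", sample=samples, ext=exts
)
for name in files:
    print(name)
```

With two samples and two extensions the sketch yields four file names, one per (sample, extension) pair. That is the behaviour you want in a target rule such as `rule all`, but not in the output of a rule that should run once per sample.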

You're mixing up wildcards and placeholders for the expand() function. You can keep a wildcard inside an expand() call by doubling the braces:

# SAMPLELIST is assumed to be a list of sample names defined earlier in the Snakefile.
rule all:
    input:
        expand(config['bamdir'] + "{sample}.dedup.downsampled.bam{ext}", ext=config['workreq'], sample=SAMPLELIST)

rule gridss_preprocess:
    input:
        ref=config['ref'],
        bam=config['bamdir'] + "{sample}.dedup.downsampled.bam",
        bai=config['bamdir'] + "{sample}.dedup.downsampled.bam.bai"
    output:
        # {{sample}} stays a wildcard; only {ext} is filled in by expand()
        expand(config['bamdir'] + "{{sample}}.dedup.downsampled.bam{ext}", ext=config['workreq'])

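This expand() call will expand to the list (each entry prefixed by config['bamdir'])

{sample}.dedup.downsampled.bam.cigar_metrics
{sample}.dedup.downsampled.bam.computesamtags.changes.tsv
{sample}.dedup.downsampled.bam.coverage.blacklist.bed
{sample}.dedup.downsampled.bam.idsv_metrics

and thus defines the wildcard sample so that it matches the files in input.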
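The effect of the doubled braces can be checked in plain Python. As far as I know, expand() relies on Python's string formatting, where `{{` and `}}` are escapes for literal braces; the sketch below uses `str.format()` directly rather than Snakemake itself, and `workreq` is an assumed stand-in for config['workreq'].

```python
# Assumed stand-in for config['workreq'], using the extensions from the question
workreq = [
    ".cigar_metrics",
    ".computesamtags.changes.tsv",
    ".coverage.blacklist.bed",
    ".idsv_metrics",
]

pattern = "{{sample}}.dedup.downsampled.bam{ext}"

# format() fills in {ext} and turns the escaped {{sample}} into a literal {sample}
outputs = [pattern.format(ext=ext) for ext in workreq]
for path in outputs:
    print(path)
```

Every resulting string still contains `{sample}`, so Snakemake can treat it as a wildcard and match it against the same wildcard in the input section.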