生成多个带有通配符的文件,然后合并为一个

发布于 2025-01-11 23:01:05 字数 1544 浏览 1 评论 0原文

我的 Snakefile 有两条规则:一条使用通配符生成多组文件,另一条将所有内容合并到一个文件中。我是这样写的:

chr = range(1,23)

rule generate:
    input:
        og_files = config["tmp"] + '/chr{chr}.bgen',
    output:
        out = multiext(config["tmp"] + '/plink/chr{{chr}}',
                       '.bed', '.bim', '.fam')
    shell:
        """
        plink \
        --bgen {input.og_files} \
        --make-bed \
        --oxford-single-chr \
        --out {config[tmp]}/plink/chr{chr}
        """
rule merge:
    input:
        plink_chr = expand(config["tmp"] + '/plink/chr{chr}.{ext}',
                           chr = chr,
                           ext = ['bed', 'bim', 'fam'])
    output:
        out = multiext(config["tmp"] + '/all',
                       '.bed', '.bim', '.fam')
    shell:
        """
        plink \
        --pmerge-list-dir {config[tmp]}/plink \
        --make-bed \
        --out {config[tmp]}/all
        """

不幸的是,这不允许我跟踪从第一条规则到第二条规则的文件:

$ snakemake -s myfile.smk -c1 -np                                                                           
Building DAG of jobs...                                                                                                                                       
MissingInputException in line 17 of myfile.smk:                            
Missing input files for rule merge: 
[list of all the files made by expand()]   

我可以使用什么来生成带有通配符 chr< 的 22 组文件/code> 在 generate 中,但能够在 merge 的输入中跟踪它们?预先感谢您的帮助

I have two rules on my Snakefile: one generates several sets of files using wildcards, the other one merges everything into a single file. This is how I wrote it:

chr = range(1,23)

rule generate:
    input:
        og_files = config["tmp"] + '/chr{chr}.bgen',
    output:
        out = multiext(config["tmp"] + '/plink/chr{{chr}}',
                       '.bed', '.bim', '.fam')
    shell:
        """
        plink \
        --bgen {input.og_files} \
        --make-bed \
        --oxford-single-chr \
        --out {config[tmp]}/plink/chr{chr}
        """
rule merge:
    input:
        plink_chr = expand(config["tmp"] + '/plink/chr{chr}.{ext}',
                           chr = chr,
                           ext = ['bed', 'bim', 'fam'])
    output:
        out = multiext(config["tmp"] + '/all',
                       '.bed', '.bim', '.fam')
    shell:
        """
        plink \
        --pmerge-list-dir {config[tmp]}/plink \
        --make-bed \
        --out {config[tmp]}/all
        """

Unfortunately, this does not allow me to track the file coming from the first rule to the 2nd rule:

$ snakemake -s myfile.smk -c1 -np                                                                           
Building DAG of jobs...                                                                                                                                       
MissingInputException in line 17 of myfile.smk:                            
Missing input files for rule merge: 
[list of all the files made by expand()]   

What can I use to be able to generate the 22 sets of files with the wildcard chr in generate, but be able to track them in the input of merge? Thank you in advance for your help

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

给妤﹃绝世温柔 2025-01-18 23:01:05

在规则 generate 中,我认为您不想转义 {chr} 通配符,否则它不会被替换。即:

        out = multiext(config["tmp"] + '/plink/chr{{chr}}',
                       '.bed', '.bim', '.fam')

应该是:

        out = multiext(config["tmp"] + '/plink/chr{chr}',
                       '.bed', '.bim', '.fam')

In rule generate I think you don't want to escape the {chr} wildcard, otherwise it doesn't get replaced. I.e.:

        out = multiext(config["tmp"] + '/plink/chr{{chr}}',
                       '.bed', '.bim', '.fam')

should be:

        out = multiext(config["tmp"] + '/plink/chr{chr}',
                       '.bed', '.bim', '.fam')
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文