Snakemake re-runs rules whose output files already exist

Posted on 2025-01-31 18:20:24 · 3321 characters · 3 views · 0 comments

I am running Snakemake (v7.6.2) and noticed that, contrary to its stated principles, it attempts to re-run steps of a pipeline whose output files already exist.

In my first run I had the following DAG:

[image: DAG of the first run]

which finished successfully, but I now want to add another rule to it (quast_first), as shown in the following 'updated' DAG:

[image: updated DAG including quast_first]

(I have done that by adding the output of quast_first as input for quast_second)

If I call a dry run, I'd be expecting the following rules to be re-executed:

  1. quast_first: output does not exist, it was not part of the previous workflow
  2. quast_second: the output exists, but the rule has a new dependency (quast_first). In this specific case the output should be exactly the same, since the output of quast_first is only a dependency of quast_second and not a real input.

However, I see that snakemake wants to re-generate the whole workflow. Below is an extract from calling a dry run with the --reason flag, as explained in this question:

rule symLinkFQ:
    input: logs/BORD1725, /nexus/Gridion/20220420Microbiology_q20/no_sample/20220405_1846_X1_FAT23098_47b43b4a/High_accuracy_basecalling/pass/barcode04
    output: symLinkFq/BORD1725
    log: /home/ngs/tempSnakemake/20220420Microbiology_q20/logs/BORD1725
    jobid: 34
    reason: Updated input files: /nexus/Gridion/20220420Microbiology_q20/no_sample/20220405_1846_X1_FAT23098_47b43b4a/High_accuracy_basecalling/pass/barcode04
    wildcards: barcode=BORD1725
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/tmp

ln -s /nexus/Gridion/20220420Microbiology_q20/no_sample/20220405_1846_X1_FAT23098_47b43b4a/High_accuracy_basecalling/pass/barcode04 symLinkFq/BORD1725

However, I can confirm that the output of the rule symLinkFQ does exist (workdir is /home/ngs/tempSnakemake/20220420Microbiology_q20),

[ngs@vngs20x ~/tempSnakemake/20220420Microbiology_q20]$ ll symLinkFq/BORD1725
lrwxrwxrwx. 1 ngs ngs 125 24. Mai 14:01 symLinkFq/BORD1725 -> /nexus/Gridion/20220420Microbiology_q20/no_sample/20220405_1846_X1_FAT23098_47b43b4a/High_accuracy_basecalling/pass/barcode04

so I don't quite understand what is meant by the reason shown above:

reason: Updated input files: /nexus/Gridion/20220420Microbiology_q20/no_sample/20220405_1846_X1_FAT23098_47b43b4a/High_accuracy_basecalling/pass/barcode04

Also, at the end of the dry run it shows again that the whole workflow will be executed if I run it:

Job stats:
job               count    min threads    max threads
--------------  -------  -------------  -------------
all                   1              1              1
cat_fastq             4              1              1
flye                  4              1              1
minimap_first         4              1              1
minimap_second        4              1              1
quast_first           4              1              1
quast_second          4              1              1
racon_first           4              1              1
racon_second          4              1              1
symLinkFQ             4              1              1
total                37              1              1

I have been using earlier versions of Snakemake (mainly v5.*), and as far as I recall this is the first time I have encountered this issue (Snakemake re-running rules whose output files already exist). Could this be version-related? For example, do I now have to pass a command-line argument telling Snakemake not to re-generate output files that already exist (although I would expect that to be the default behaviour)?
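One way to look into the "Updated input files" reason is to compare modification times by hand. The following is only a minimal sketch of such a check, not Snakemake's own logic: the helper name `inputs_newer_than` and its `follow_symlinks` parameter are hypothetical, and whether Snakemake looks at the link itself or at its target is an assumption the caller has to choose.

```python
import os


def inputs_newer_than(output, inputs, follow_symlinks=False):
    """Return the inputs whose mtime is later than the output's mtime.

    Hypothetical helper for manual debugging; it does not reproduce
    Snakemake's internal rerun logic.

    With follow_symlinks=False the output's timestamp is taken from the
    link itself (like `ls -l`); with True it is taken from the target.
    """
    out_mtime = os.stat(output, follow_symlinks=follow_symlinks).st_mtime
    newer = []
    for path in inputs:
        # Inputs are always resolved to their targets in this sketch.
        if os.stat(path).st_mtime > out_mtime:
            newer.append(path)
    return newer
```

Run from the workdir, passing `symLinkFq/BORD1725` as the output and the `barcode04` directory as the only input would list that directory if its timestamp is later than the symlink's. Calling it again with `follow_symlinks=True` compares the directory with itself and should therefore return an empty list, which helps to tell apart "the link is older than its target" from a genuinely changed input.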

Comments (1)

原来分手还会想你 2025-02-07 18:20:24

It turns out that this is indeed version-related. I removed Snakemake v7.6.2 and installed v5.8.1, and Snakemake no longer wants to repeat rules whose output files exist. The same bug is present in v7.8.0, which is the latest release version.
