Snakemake rule log/benchmark wildcards don't match output wildcards after a checkpoint

Posted on 2025-02-12 02:34:18


I am running a Snakemake workflow with a checkpoint at some point, from which I gather the previously unknown number of output files. Snakemake should then create a number of tasks for the next rule based on the file count, using some part of the gathered checkpoint files as values for that rule's wildcards. It all works fine, unless I want that rule to also create log and/or benchmark files, at which point it throws:

SyntaxError:
Not all output, log and benchmark files of rule plasmid_spades contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
  File "/path/to/Snakefile", line N, in <module>

These are the relevant parts of the workflow:

WCS = None

...

def gather_checkpoint_output(wildcards):
    ck_output = checkpoints.checkpoint_rule.get(**wildcards).output[0]
    global WCS
    WCS, = glob_wildcards(os.path.join(ck_output, "{wc}", "{wc}.file"))
    return expand(os.path.join(ck_output, "{wc}", "{wc}.file"), wc=WCS)


def gather_some_rule_after_checkpoint_out(wildcards):
    rule_output = checkpoints.checkpoint_rule.get(**wildcards).output[0]
    WCS2, = glob_wildcards(os.path.join(rule_output, "{wc}", "{wc}.file"))
    return expand(os.path.join("some", "{wc}", "path", "output.file"), wc=WCS2)

...

localrules: all
rule all:
    input:
        gather_checkpoint_output,
        gather_some_rule_after_checkpoint_out

...

rule some_rule_after_checkpoint:
    input:
        input = gather_checkpoint_output
    output:
        out_dir = directory(expand(os.path.join("some", "{wc}", "dir"), wc=WCS)),
        output = expand(os.path.join("some", "{wc}", "path", "output.file"), wc=WCS)
    log:
         os.path.join("logs", "some", "path", "{wc}_rule.log")
    benchmark:
         os.path.join("logs", "some", "path", "{wc}_rule_benchmark.tsv")
...

Is the problem that it evaluates the log/benchmark wildcard in the beginning (WCS = None), while the output will be reevaluated with the checkpoint functions? Although a rule's wildcards are based on the output's wildcards, I think. I tried lambda functions, expand(), etc., to specifically get the wildcards (from the hopefully reevaluated WCS) for the logs, but that is apparently not permitted. Am I overlooking something obvious here, or is the entire construction wrong somehow?

Answered by ゃ人海孤独症 on 2025-02-19 02:34:18


Your issue might be an avatar of the infamous and very common "wrong use of expand" general problem.

In the some_rule_after_checkpoint rule, the log and benchmark directives contain a wildcard, while the output directive doesn't: you need to be aware that expand resolves the wildcards in the output file name patterns, producing a list of fully resolved file names with no wildcards left.

This confuses Snakemake: What value of the wildcard should it use to determine the names of the log and benchmark files if there is no wildcard in the output file names? Wildcard values in a rule are determined by matching this rule's output file name patterns with fully-resolved input file names in a downstream rule.
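As a concrete illustration of this matching, using the placeholder paths from the question (the sample name "sample_A" is made up): if rule all requests a fully resolved file, Snakemake matches it against the non-expanded output pattern, infers the wildcard value, and that value then also fills in the log and benchmark names:

    output pattern:  some/{wc}/path/output.file
    requested file:  some/sample_A/path/output.file   # from rule all's input
    # => wc = "sample_A"
    log:             logs/some/path/sample_A_rule.log
    benchmark:       logs/some/path/sample_A_rule_benchmark.tsv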

You should likely not use expand in the outputs of some_rule_after_checkpoint, since the expanding is already done in the input of the all rule.

With non-expanded output file name patterns in some_rule_after_checkpoint, each different file in the expanded input of rule all will trigger one instance of the some_rule_after_checkpoint rule; the value of the wildcard for each instance will be determined by pattern matching between the output file patterns in some_rule_after_checkpoint and the desired, fully resolved input of rule all. This wildcard then enables Snakemake to generate the corresponding log and benchmark files.
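Concretely, the rule from the question could be rewritten without expand in its output, keeping {wc} as a genuine wildcard shared by output, log and benchmark. This is only a sketch using the placeholder paths from the question; the aggregation over all values of wc stays in rule all via the gather functions, and the per-instance input looks up the checkpoint's output directory with a lambda (assuming checkpoint_rule itself takes no wildcards, so get() is called without arguments):

    rule some_rule_after_checkpoint:
        input:
            # one checkpoint file per job instance; the checkpoint output
            # directory is resolved per job instead of gathering all files
            lambda wc: os.path.join(
                checkpoints.checkpoint_rule.get().output[0],
                wc.wc, f"{wc.wc}.file")
        output:
            out_dir = directory(os.path.join("some", "{wc}", "dir")),
            output = os.path.join("some", "{wc}", "path", "output.file")
        log:
            os.path.join("logs", "some", "path", "{wc}_rule.log")
        benchmark:
            os.path.join("logs", "some", "path", "{wc}_rule_benchmark.tsv")
    ...

Now output, log and benchmark all carry the same wildcard {wc}, which resolves the SyntaxError, and each instance writes to its own distinct files.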
