InputFunctionException when using functions to define input files
Thank you very much for your help on a Snakemake workflow. I have been using functions to define input files for my first Snakemake rule, identifying paired fastq files from a samples dataframe. This works well.
# fastq1 input function definition
def fq1_from_sample(wildcards):
    return samples_df.loc[wildcards.sample, "fastq_1"]

# fastq2 input function definition
def fq2_from_sample(wildcards):
    return samples_df.loc[wildcards.sample, "fastq_2"]

# Define config file. Stores sample names and other things.
configfile: "config/config.yaml"

# Define a rule for running the complete pipeline.
rule all:
    input:
        trim = expand(['results/{batch}/{samp}/trim/{samp}_trim_1.fq.gz'], zip, samp=sample_names, batch=batch_names)
        ...)

# Trim reads for quality.
rule trim_reads:
    input:
        p1=fq1_from_sample,
        p2=fq2_from_sample
    output:
        trim1=temp('results/{batch}/{sample}/trim/{sample}_trim_1.fq.gz'),
        trim2=temp('results/{batch}/{sample}/trim/{sample}_trim_2.fq.gz')
    log:
        'results/{batch}/{sample}/trim/{sample}_trim_reads.log'
    shell:
        '{config[scripts_dir]}trim_reads.sh {input.p1} {input.p2} {output.trim1} {output.trim2} &>> {log}'
However, when I use the same input function in a new rule that counts the reads in these paired fastq files, as here:
rule reads_output:
    input:
        p1=fq1_from_sample,
        trim1='results/{batch}/{samp}/trim/{samp}_trim_1.fq.gz',
        kr1='results/{batch}/{samp}/kraken/{samp}_trim_kr_1.fq.gz'
    output:
        reads_stats='results/{batch}/{samp}/stats/{samp}_read_counts.txt'
    log:
        'results/{batch}/{samp}/stats/{samp}_read_counts.log'
    shell:
        '''
        {config[scripts_dir]}reads_output.sh {input.p1} {input.p2} {input.kr1} {output.reads_stats} &>> {log}
        '''
I run into the following error:
InputFunctionException in line 91 of /oak/stanford/scg/lab_jandr/walter/tb/mtb_tgen/workflow/Snakefile:
AttributeError: 'Wildcards' object has no attribute 'sample'
Wildcards:
batch=MT01_MtB_Baits-2021-09-17
samp=10561-Food
Do you have any suggestions on how to use the same function to define inputs throughout a Snakemake pipeline?
Thank you again and best!
From the new rule definition, it seems you are using samp as the wildcard keyword in rule reads_output, while the function expects sample. Possible solutions include:

- changing the wildcard name in rule reads_output from samp to sample (this is the preferred solution, as consistent wildcard names improve readability);
- allowing the function to accept a second argument that identifies which key in the wildcards to use;
- a more dangerous approach: redefining the function with a try/except around the wildcard lookup.
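The second and third options can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the answerer's original code: SAMPLES is a hypothetical dictionary standing in for samples_df, the raw/... paths are made up for illustration, fq_from_sample with its column and key parameters is an illustrative name, and SimpleNamespace only imitates attribute access on Snakemake's Wildcards object.

```python
from types import SimpleNamespace

# Hypothetical stand-in for samples_df; in the real workflow the lookup
# would be samples_df.loc[sample_id, column]. The paths are invented.
SAMPLES = {
    "10561-Food": {
        "fastq_1": "raw/10561-Food_R1.fq.gz",
        "fastq_2": "raw/10561-Food_R2.fq.gz",
    },
}


def fq_from_sample(wildcards, column, key="sample"):
    """Second option: the caller names the wildcard that holds the sample ID."""
    sample_id = getattr(wildcards, key)
    return SAMPLES[sample_id][column]


def fq1_from_sample(wildcards):
    """Third option: try `sample` first, then fall back to `samp`."""
    try:
        sample_id = wildcards.sample
    except AttributeError:
        sample_id = wildcards.samp
    return SAMPLES[sample_id]["fastq_1"]


# Stand-ins for the wildcards that trim_reads and reads_output would receive.
wc_trim = SimpleNamespace(batch="MT01_MtB_Baits-2021-09-17", sample="10561-Food")
wc_stats = SimpleNamespace(batch="MT01_MtB_Baits-2021-09-17", samp="10561-Food")

p2_trim = fq_from_sample(wc_trim, "fastq_2")                 # default key
p2_stats = fq_from_sample(wc_stats, "fastq_2", key="samp")   # explicit key
p1_stats = fq1_from_sample(wc_stats)                         # fallback path
```

Because Snakemake calls an input function with the wildcards object, the two-argument version would need a small wrapper in the rule, for example p1=lambda wc: fq_from_sample(wc, "fastq_1", key="samp"). The try/except version needs no change in the rules, but it silently hides inconsistent wildcard names, which is probably why the answer calls it more dangerous.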