InputFunctionException when using a function to define input files

Posted on 2025-01-15 10:47:19

Thank you very much for your help on a Snakemake workflow. I have been using functions to define input files for my first Snakemake rule, identifying paired fastq files from a samples dataframe. This works well.

# fastq1 input function definition
def fq1_from_sample(wildcards):
  return samples_df.loc[wildcards.sample, "fastq_1"]

# fastq2 input function definition
def fq2_from_sample(wildcards):
  return samples_df.loc[wildcards.sample, "fastq_2"]

# Define config file. Stores sample names and other things.
configfile: "config/config.yaml"
  
# Define a rule for running the complete pipeline. 
rule all:
  input:
    trim = expand(['results/{batch}/{samp}/trim/{samp}_trim_1.fq.gz'], zip, samp=sample_names,batch=batch_names)
...)
               
# Trim reads for quality. 
rule trim_reads:  
  input: 
    p1=fq1_from_sample,
    p2=fq2_from_sample
  output:     
    trim1=temp('results/{batch}/{sample}/trim/{sample}_trim_1.fq.gz'),
    trim2=temp('results/{batch}/{sample}/trim/{sample}_trim_2.fq.gz')
  log: 
    'results/{batch}/{sample}/trim/{sample}_trim_reads.log'
  shell:
    '{config[scripts_dir]}trim_reads.sh {input.p1} {input.p2} {output.trim1} {output.trim2} &>> {log}'

However, when I use the same input function in a new rule that counts the reads in these paired fastq files:

rule reads_output:
  input:
    p1=fq1_from_sample,
    trim1='results/{batch}/{samp}/trim/{samp}_trim_1.fq.gz',
    kr1='results/{batch}/{samp}/kraken/{samp}_trim_kr_1.fq.gz'
  output:
    reads_stats='results/{batch}/{samp}/stats/{samp}_read_counts.txt'
  log:
    'results/{batch}/{samp}/stats/{samp}_read_counts.log'
  shell:    
    '''
    {config[scripts_dir]}reads_output.sh {input.p1} {input.p2} {input.kr1} {output.reads_stats} &>> {log}
    '''

I run into the following error:

InputFunctionException in line 91 of /oak/stanford/scg/lab_jandr/walter/tb/mtb_tgen/workflow/Snakefile:
AttributeError: 'Wildcards' object has no attribute 'sample'
Wildcards:
batch=MT01_MtB_Baits-2021-09-17
samp=10561-Food

I'm wondering if you have any suggestions on how to use the same function to define inputs throughout a Snakemake pipeline?

Thank you again and best!


甜中书 2025-01-22 10:47:19

Judging from the new rule definition, rule reads_output uses samp as the wildcard name, while the input function expects sample. Possible solutions include:

change the wildcard name in rule reads_output from samp to sample (this is the preferred solution, as consistent wildcard names improve readability);

let the function accept a second argument that identifies which wildcard key to use. The rough idea is below:

def fq1_from_sample(wildcards, argument='sample'):
     condition = wildcards.get(argument)
     return samples_df.loc[condition, "fastq_1"]

# note that the relevant rule should pass the second argument
# for example
#     input: lambda wildcards: fq1_from_sample(wildcards, argument='samp')

a riskier approach is to redefine the function with a try/except:

# THIS IS A BAD IDEA, BUT MIGHT BE USEFUL FOR DEBUGGING
def fq1_from_sample(wildcards):
    try:
        return samples_df.loc[wildcards.sample, "fastq_1"]
    # Catch only the missing-wildcard case, so other errors still surface
    except AttributeError:
        return samples_df.loc[wildcards.samp, "fastq_1"]
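
If renaming the wildcard is not practical, the second option can also be written as a small factory that builds one input function per column and wildcard key. The following is only a sketch under stated assumptions: fastq_lookup, SAMPLES and the raw/... paths are hypothetical names invented for illustration, and a plain dict stands in for samples_df so that the snippet runs without Snakemake or pandas. A SimpleNamespace mimics the wildcards shown in the error message.

```python
from types import SimpleNamespace

# Hypothetical stand-in for samples_df: sample name -> column -> path.
SAMPLES = {
    "10561-Food": {
        "fastq_1": "raw/10561-Food_R1.fq.gz",
        "fastq_2": "raw/10561-Food_R2.fq.gz",
    },
}


def fastq_lookup(column, key="sample", table=SAMPLES):
    """Build an input function that reads the sample name from wildcard `key`."""
    def input_function(wildcards):
        # getattr raises AttributeError when the rule does not define `key`;
        # this is the failure Snakemake reports as InputFunctionException.
        sample_name = getattr(wildcards, key)
        return table[sample_name][column]
    return input_function


# Mimic the wildcards of rule reads_output, which defines `samp`, not `sample`.
wildcards = SimpleNamespace(batch="MT01_MtB_Baits-2021-09-17", samp="10561-Food")

# Passing key="samp" makes the lookup work for this rule.
fq1_path = fastq_lookup("fastq_1", key="samp")(wildcards)

# The default key ("sample") reproduces the original error for this rule.
try:
    fastq_lookup("fastq_1")(wildcards)
    default_key_failed = False
except AttributeError:
    default_key_failed = True
```

In a real Snakefile the inner return line would presumably be samples_df.loc[sample_name, column], and a rule that uses samp could declare p1=fastq_lookup("fastq_1", key="samp") directly, without a lambda. Using consistent wildcard names across rules remains the simpler fix.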
