InputFunctionException when using functions to define file inputs

Posted 2025-01-15 10:47:19

Thank you very much for your help on a Snakemake workflow. I have been using functions to define input files for my first Snakemake rule, identifying paired fastq files from a samples dataframe. This works well.

# fastq1 input function definition
def fq1_from_sample(wildcards):
  return samples_df.loc[wildcards.sample, "fastq_1"]

# fastq2 input function definition
def fq2_from_sample(wildcards):
  return samples_df.loc[wildcards.sample, "fastq_2"]

# Define config file. Stores sample names and other things.
configfile: "config/config.yaml"
  
# Define a rule for running the complete pipeline. 
rule all:
  input:
    trim = expand(['results/{batch}/{samp}/trim/{samp}_trim_1.fq.gz'], zip, samp=sample_names,batch=batch_names)
...)
               
# Trim reads for quality. 
rule trim_reads:  
  input: 
    p1=fq1_from_sample,
    p2=fq2_from_sample
  output:     
    trim1=temp('results/{batch}/{sample}/trim/{sample}_trim_1.fq.gz'),
    trim2=temp('results/{batch}/{sample}/trim/{sample}_trim_2.fq.gz')
  log: 
    'results/{batch}/{sample}/trim/{sample}_trim_reads.log'
  shell:
    '{config[scripts_dir]}trim_reads.sh {input.p1} {input.p2} {output.trim1} {output.trim2} &>> {log}'

However, when I use the same input function in a new rule that counts the reads in these paired fastq files:

rule reads_output:
  input:
    p1=fq1_from_sample,
    trim1='results/{batch}/{samp}/trim/{samp}_trim_1.fq.gz',
    kr1='results/{batch}/{samp}/kraken/{samp}_trim_kr_1.fq.gz'
  output:
    reads_stats='results/{batch}/{samp}/stats/{samp}_read_counts.txt'
  log:
    'results/{batch}/{samp}/stats/{samp}_read_counts.log'
  shell:    
    '''
    {config[scripts_dir]}reads_output.sh {input.p1} {input.p2} {input.kr1} {output.reads_stats} &>> {log}
    ''' 

I run into the following error:

InputFunctionException in line 91 of /oak/stanford/scg/lab_jandr/walter/tb/mtb_tgen/workflow/Snakefile:
AttributeError: 'Wildcards' object has no attribute 'sample'
Wildcards:
batch=MT01_MtB_Baits-2021-09-17
samp=10561-Food

I'm wondering if you have any suggestions on how to use the same function to define inputs throughout a Snakemake pipeline?

Thank you again and best!

Comments (1)

甜中书 2025-01-22 10:47:19

From the new rule definition, it seems you are using samp as the wildcard keyword in rule reads_output, while the function expects sample. The possible solutions include:

  • change the wildcard name in the rule reads_output from samp to sample (this is the preferred solution, as consistent wildcard names improve readability);

  • let the function accept a second argument that identifies which wildcard key to use. The rough idea is below:

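    def fq1_from_sample(wildcards, argument='sample'):
        # look up the wildcard named by `argument` (e.g. 'sample' or 'samp');
        # getattr raises AttributeError if that wildcard does not exist
        condition = getattr(wildcards, argument)
        return samples_df.loc[condition, "fastq_1"]
    
    # note that the relevant rule should pass the second argument
    # for example
    #     input: lambda wildcards: fq1_from_sample(wildcards, argument='samp')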
    
  • a riskier approach is to redefine the function with a try/except fallback:

    # THIS IS A BAD IDEA, BUT MIGHT BE USEFUL FOR DEBUGGING
    def fq1_from_sample(wildcards):
        try:
            return samples_df.loc[wildcards.sample, "fastq_1"]
        except AttributeError:
            # the rule uses {samp} rather than {sample}
            return samples_df.loc[wildcards.samp, "fastq_1"]
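
If renaming the wildcard is not practical, the second option can be generalized into a small factory that builds one input function per column and wildcard key. The following is only a sketch under stated assumptions: `make_fq_getter`, `SAMPLES`, and the file paths are hypothetical names introduced for illustration, a plain dict stands in for `samples_df`, and `SimpleNamespace` stands in for Snakemake's `Wildcards` object so the snippet runs outside a workflow.

```python
from types import SimpleNamespace

# Stand-in for samples_df (assumption): a dict keyed by sample name.
# The paths are made up for illustration.
SAMPLES = {
    "10561-Food": {
        "fastq_1": "raw/10561-Food_R1.fq.gz",
        "fastq_2": "raw/10561-Food_R2.fq.gz",
    },
}


def make_fq_getter(column, key="sample"):
    """Build an input function that reads `column` for the wildcard named `key`."""
    def getter(wildcards):
        # getattr raises AttributeError if the rule has no wildcard called `key`
        sample_name = getattr(wildcards, key)
        # with the real dataframe this would be samples_df.loc[sample_name, column]
        return SAMPLES[sample_name][column]
    return getter


fq1_from_sample = make_fq_getter("fastq_1")            # for rules that use {sample}
fq1_from_samp = make_fq_getter("fastq_1", key="samp")  # for rules that use {samp}

# Stand-in for the wildcards shown in the error message
wildcards = SimpleNamespace(batch="MT01_MtB_Baits-2021-09-17", samp="10561-Food")
print(fq1_from_samp(wildcards))

# In the Snakefile, rule reads_output would then declare, for example:
#     input: p1=make_fq_getter("fastq_1", key="samp")
```

Each rule names the wildcard it actually uses, so a mismatch fails with an AttributeError instead of being silently papered over. Separately, the shell command in reads_output references {input.p2}, which that rule does not declare as an input, so it would need a matching entry once the wildcard issue is resolved.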
    