What is the best way to use expand() with one unknown variable in Snakemake?
I am currently using Snakemake for a bioinformatics project. Given a human reference genome (hg19) and a BAM file, I want to specify multiple output files that share the same name but have different extensions. Here is my code:
rule gridss_preprocess:
    input:
        ref=config['ref'],
        bam=config['bamdir'] + "{sample}.dedup.downsampled.bam",
        bai=config['bamdir'] + "{sample}.dedup.downsampled.bam.bai"
    output:
        expand(config['bamdir'] + "{sample}.dedup.downsampled.bam{ext}", ext=config['workreq'], sample="{sample}")
Currently, config['workreq'] is a list of extensions, each starting with ".".
For example, I want to be able to use expand() to indicate the following files:
S1.dedup.downsampled.bam.cigar_metrics
S1.dedup.downsampled.bam.computesamtags.changes.tsv
S1.dedup.downsampled.bam.coverage.blacklist.bed
S1.dedup.downsampled.bam.idsv_metrics
I want to be able to do this for multiple sample files, S_. Currently I am not getting an error when I try to do a dry run. However, I am not sure if this will run properly.
Am I doing this right?
Comments (1)
expand() defines a list of files. If you pass two parameters, their Cartesian product is used. Thus, your rule would declare as output ALL files with your extension list for ALL samples. Since you define a wildcard in your input, I think what you want is all files with your extensions for ONE sample, with the rule executed as many times as there are samples.
You're mixing up wildcards and placeholders for the expand() function. You can define a wildcard inside an expand() by doubling the braces, that is, writing {{sample}} instead of {sample}. The expand() call will then expand to the list
{sample}.dedup.downsampled.bam.cigar_metrics
{sample}.dedup.downsampled.bam.computesamtags.changes.tsv
{sample}.dedup.downsampled.bam.coverage.blacklist.bed
{sample}.dedup.downsampled.bam.idsv_metrics
and thus define the wildcard sample to match the files in the input.
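The code line that accompanied this answer did not survive on this page. Below is a minimal sketch of what the brace doubling does, written in plain Python so it runs without Snakemake. The helper expand_sketch is a hypothetical, simplified stand-in for Snakemake's expand() (it only combines str.format with a Cartesian product), and the bamdir and workreq values are assumed placeholders for config['bamdir'] and config['workreq'].

```python
from itertools import product


def expand_sketch(pattern, **params):
    """Simplified stand-in for Snakemake's expand(): format the pattern
    with every combination (Cartesian product) of the given values."""
    keys = list(params)
    # Treat a bare string as a single value rather than iterating its characters.
    values = [[v] if isinstance(v, str) else list(v) for v in params.values()]
    return [pattern.format(**dict(zip(keys, combo))) for combo in product(*values)]


# Assumed placeholder values; the real ones come from the workflow config.
bamdir = "bam/"
workreq = [
    ".cigar_metrics",
    ".computesamtags.changes.tsv",
    ".coverage.blacklist.bed",
    ".idsv_metrics",
]

# Doubled braces are an escape: {{sample}} comes out as the literal {sample},
# which Snakemake then treats as a wildcard, while {ext} is filled in.
per_sample = expand_sketch(bamdir + "{{sample}}.dedup.downsampled.bam{ext}", ext=workreq)

# Two real parameters give the Cartesian product: every extension for every sample.
all_samples = expand_sketch(
    bamdir + "{sample}.dedup.downsampled.bam{ext}", sample=["S1", "S2"], ext=workreq
)

for path in per_sample:
    print(path)
```

Under the same assumptions, the output section of the rule would read expand(config['bamdir'] + "{{sample}}.dedup.downsampled.bam{ext}", ext=config['workreq']), so each job declares the four files for a single sample.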