How to limit the disk space used by a Snakemake pipeline?
I work with 8 paired-end FASTQ files of 150 GB each, which need to be processed by a pipeline with space-demanding sub-tasks. I tried several options, but I am still running out of disk space:
- used temp() to delete output files when they are no longer needed,
- used disk_mb resources to limit the number of parallel jobs.
I use the following invocation to limit my disk usage to 500 GB, but apparently this is not guaranteed, and usage exceeds 500 GB. How can I limit disk usage to a fixed value to avoid running out of disk space?
snakemake --resources disk_mb=500000 --use-conda --cores 16 -p
rule merge:
    input:
        fw="{sample}_1.fq.gz",
        rv="{sample}_2.fq.gz",
    output:
        temp("{sample}.assembled.fastq")
    resources:
        disk_mb=100000
    threads: 16
    shell:
        """
        merger-tool -f {input.fw} -r {input.rv} -o {output}
        """

rule filter:
    input:
        "{sample}.assembled.fastq"
    output:
        temp("{sample}.assembled.filtered.fastq")
    resources:
        disk_mb=100000
    shell:
        """
        filter-tool {input} {output}
        """

rule mapping:
    input:
        "{sample}.assembled.filtered.fastq"
    output:
        "{sample}_mapping_table.txt"
    resources:
        disk_mb=100000
    shell:
        """
        mapping-tool {input} {output}
        """
1 Answer
Snakemake does not have the functionality to constrain resources; it can only schedule jobs in a way that respects the given resource constraints. Right now, snakemake uses resources to limit the number of concurrent jobs, while your problem has a cumulative aspect to it: with --resources disk_mb=500000 and each job claiming disk_mb=100000, at most five jobs run concurrently, but a finished job's temp() output keeps occupying disk until the downstream job that consumes it has run, so total usage can grow well past the declared budget. Taking a look at this answer, one way to resolve this is to introduce priority, so that downstream tasks have the highest priority. In your particular file, it seems that adding priority to the mapping rule should be sufficient (see the sketch below). You might also want to be careful about launching the rules initially, to avoid filling up the disk space with results of merge.
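A minimal sketch of that change, keeping the mapping rule exactly as posted and only adding a priority directive (the value 50 is arbitrary; it only needs to exceed the default priority of 0 that merge and filter keep):

rule mapping:
    input:
        "{sample}.assembled.filtered.fastq"
    output:
        "{sample}_mapping_table.txt"
    # Rules without a priority directive default to 0, so any positive
    # value makes mapping jobs win whenever several jobs are ready.
    priority: 50
    resources:
        disk_mb=100000
    shell:
        """
        mapping-tool {input} {output}
        """

With this in place, whenever a mapping job and a merge job are both runnable, the scheduler starts mapping first, so each temp() intermediate is consumed, and then deleted, before new intermediates pile up.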
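As for the initial launch: nothing above stops Snakemake from starting five merge jobs at once before any mapping job exists. One possible way to throttle that is a user-defined resource; the name merge_slots below is my own invention, not something from the question:

rule merge:
    input:
        fw="{sample}_1.fq.gz",
        rv="{sample}_2.fq.gz",
    output:
        temp("{sample}.assembled.fastq")
    resources:
        disk_mb=100000,
        # Hypothetical user-defined resource; it only constrains
        # scheduling when a matching limit is passed on the command line.
        merge_slots=1
    threads: 16
    shell:
        """
        merger-tool -f {input.fw} -r {input.rv} -o {output}
        """

Invoked as, for example, snakemake --resources disk_mb=500000 merge_slots=2 --use-conda --cores 16 -p, at most two merge jobs would run concurrently, regardless of how much of the disk_mb budget is still free.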