Executing downstream Snakemake rules when some outputs of an earlier rule fail
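In rule script, a few samples fail, but the majority pass. I would like Snakemake to tolerate these failures and still run the downstream rule build_script_table. I am not really sure how to do this. Any help would be much appreciated. Currently I have a crude .py script that handles this, but I want to automate it if possible.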
import pandas

rule script:
    input: input_files
    output:
        'script_out/{sampleID}/{sampleID}.out.tsv'
    threads: 8
    params:
        Toys = config['Toys_dir'],
        db = config['Toys_db'],
    run:
        shell('export PATH={params.Toys}/samtools-0.1.19:$PATH; \
               rm -r script_out/{wildcards.sampleID}; \
               {params.Toys}/Toys.pl \
               -name {wildcards.sampleID} \
               -o script_out/{wildcards.sampleID} \
               -db {params.db} \
               -p {threads} \
               {input}')

rule script_copy:
    input: rules.script.output
    output: 'script_calls/{sampleID}_out_filtered.tsv'
    run:
        shell('cp {input} {output}')

rule build_script_table:
    input: expand('script_calls/{sampleID}_out_filtered.tsv', sampleID=sampleIDs)
    output: 'tables/all_script.txt'
    params:
        span = config['length'],
    run:
        dfs = []
        for fname in input:
            df = pandas.read_csv(fname, sep='\t')
            if len(df) > 0:
                df['sampleID'] = fname.split('/')[-1].split('_')[0]
                df['Toyscript'] = 1
                df['Match'] = df.apply(lambda row: sorted_Match(row['ToyName1'], row['ToyName2']), axis=1)
                df['supporting_prices'] = df.spanningdates
                df['total_price'] = df['supporting_prices'].groupby(df['Match']).transform('sum')  # combine fusions that are A|B and B|A
                df.drop_duplicates('Match', inplace=True)  # only keep the first row of each fusion now that support reads are summed
                df = df[df['total_price'] >= params.span]  # remove fusions with too few supporting reads (the param is named span, not length)
                scores = list(range(1, len(df) + 1))
                scores.reverse()  # you want the fusions with the most reads getting the highest score
                df.sort_values(by=['total_price'], ascending=False, inplace=True)
                df['script_rank'] = scores
                df['script_score'] = df['script_rank'].apply(lambda x: float(x)/len(df))  # percent scores for each fusion with 1 being top fusion
                dfs.append(df)
        dfsc = pandas.concat(dfs)
        dfsc.to_csv(output[0], sep='\t', index=False)
Comments (2)
Perhaps not any more elegant, but I think you could split your workflow in half and invoke snakemake twice.
To run, you execute the partial snakefile with the --keep-going flag and once that is done you execute the rest:
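The commands that originally followed are not preserved here, so this is only a rough sketch of what a two-pass driver might look like in Python. The snakefile names passed in are placeholders, and the split into two files is an assumption. `--snakefile`, `--cores` and `--keep-going` are standard snakemake options.

```python
import subprocess


def two_pass_commands(first_snakefile, second_snakefile, cores=8):
    """Build the two snakemake command lines (the file names are placeholders)."""
    first = ["snakemake", "--snakefile", first_snakefile,
             "--cores", str(cores), "--keep-going"]
    second = ["snakemake", "--snakefile", second_snakefile,
              "--cores", str(cores)]
    return first, second


def run_two_pass(first_snakefile, second_snakefile, cores=8):
    """Run the failure-tolerant half first, then the downstream half."""
    first, second = two_pass_commands(first_snakefile, second_snakefile, cores)
    # With --keep-going, independent jobs continue after a failure. The run can
    # still end with a non-zero exit code, so it is deliberately not checked.
    subprocess.run(first)
    # The downstream half is expected to succeed; a failure raises CalledProcessError.
    subprocess.run(second, check=True)
```

Note that the second half would have to collect only the per-sample outputs that actually exist (the `expand(...)` over all `sampleIDs` in the question would still ask for the failed ones), which this sketch does not address.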
I think the right way to go about it is to catch the "fail out" happening in the script rule and, instead of throwing an exit 1, create an empty file that you handle separately in the build_table rule.
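As a minimal sketch of that idea, a small helper could run the tool and, on failure, leave a zero-byte file where the real output would have been, so the rule itself still "succeeds" from Snakemake's point of view. The helper name `run_or_placeholder` is hypothetical, and treating every non-zero exit as an acceptable failure is an assumption you may want to narrow.

```python
import subprocess
import sys
from pathlib import Path


def run_or_placeholder(cmd, output_path):
    """Run cmd; if it fails or leaves no output, create an empty placeholder.

    Returns True when the command succeeded and produced output_path.
    """
    out = Path(output_path)
    result = subprocess.run(cmd)
    if result.returncode == 0 and out.exists():
        return True
    # Acceptable failure: make sure the directory exists and write a
    # zero-byte file (this also truncates any partial output left behind).
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("")
    return False
```

Inside the `run:` block of rule script this could be called as `run_or_placeholder([...], output[0])`, with the Toys.pl command line as the list argument.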

See if this helps. As I say in comments, write a dummy file in case of acceptable failure and use it downstream to decide what to do:
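The code from this answer is not preserved here. On the downstream side, one way to "decide what to do" might be to separate the dummy files from the real results before reading anything. This is a sketch under the assumption that a dummy file is simply a zero-byte file; `split_inputs` is a hypothetical helper name.

```python
import os


def split_inputs(paths):
    """Separate real result files from zero-byte placeholder files."""
    usable, failed = [], []
    for path in paths:
        if os.path.getsize(path) > 0:
            usable.append(path)
        else:
            failed.append(path)
    return usable, failed
```

In build_script_table you could then loop over `usable` only. Filtering first matters because `pandas.read_csv` raises `pandas.errors.EmptyDataError` on a zero-byte file, and `pandas.concat` raises `ValueError` when given an empty list, so the case where every sample failed needs its own branch.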