Nextflow: transform input from fromFilePairs into a tuple (map, list_pair_1, list_pair_2)
In my Nextflow workflow, I need to process files similar to the below example.
a.vcf.gz
a.vcf.gz.tbi
b.vcf.gz
b.vcf.gz.tbi
c.vcf.gz
c.vcf.gz.tbi
In particular, I need to create a channel which will output them with this structure:
[
["id": "test"],
["a.vcf.gz", "b.vcf.gz", "c.vcf.gz"],
["a.vcf.gz.tbi", "b.vcf.gz.tbi", "c.vcf.gz.tbi"]
]
This means a tuple of a single map, one tuple of *.vcf.gz files and one tuple of *.vcf.gz.tbi files.
My problem is that, from my reading of the documentation, it's not evident how to create it from a channel that emits items sequentially in groups of three.
For simplicity, I collect the files from pairs using Channel.fromFilePairs:
ch_input = Channel
    .fromFilePairs("*{.vcf.gz,.vcf.gz.tbi}")
This is where I got stuck. The closest I've got was by scrapping fromFilePairs and using groupTuple:
ch_input = Channel
    .fromPath("*.vcf.gz*")
    .map { file ->
        def fmeta = ["id": "test"]
        value = file.extension == "gz" ? "vcf" : "tbi"
        [value, file]
    }
    .groupTuple()
println ch_input.view()
Which gives:
[tbi, [/Users/einar/Coding/a.vcf.gz.tbi, /Users/einar/Coding/c.vcf.gz.tbi, /Users/einar/Coding/einar/b.vcf.gz.tbi]]
[vcf, [/Users/einar/Coding/b.vcf.gz, /Users/einar/Coding/a.vcf.gz, /Users/einar/Coding/c.vcf.gz]]
This is still far from what I'd like, and it is more fragile because it relies on file extensions. The multiMap operator is close to what I want; however, it generates multiple channels, whereas I need a single channel.
How can this be done properly?
EDIT:
This is another attempt, which gets what I want, however it looks kind of hacky and fragile to me:
ch_input = Channel
    .fromPath("*.vcf*")
    .map { file ->
        [file.extension, file]
    }
    .groupTuple()
    .map { it ->
        def fmeta = ["id": "test"]
        [fmeta, it[1].flatten()]
    }
    .groupTuple()
    .map { it ->
        [it[0], it[1][0], it[1][1]]
    }
println ch_input.view()
Comments (1)
To get what you want, you'd need the collect operator which gives you a value channel:
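The snippet that originally followed is not preserved on this page, so what follows is only a minimal sketch of the approach described, not the answerer's own code. It assumes the glob from the question and the placeholder metadata map `[id: 'test']`; `collect(flat: false)` is used so that each `[key, files]` pair stays intact rather than being flattened, and the files are picked by suffix rather than by position within the pair.

```nextflow
workflow {
    Channel
        .fromFilePairs('*{.vcf.gz,.vcf.gz.tbi}')   // each item: [ key, [ files of the pair ] ]
        .collect(flat: false)                       // one item: the list of all [ key, files ] tuples
        .map { pairs ->
            def fmeta = [id: 'test']                // placeholder metadata taken from the question
            // Select by file name suffix so the result does not depend on ordering within a pair
            def vcfs = pairs.collect { pair -> pair[1].find { it.name.endsWith('.vcf.gz') } }
            def tbis = pairs.collect { pair -> pair[1].find { it.name.endsWith('.vcf.gz.tbi') } }
            [ fmeta, vcfs, tbis ]
        }
        .view()
}
```

Because both lists are built from the same collected list, the index file at a given position belongs to the VCF at the same position. The order of the pairs themselves simply follows the order in which they were emitted.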
It's difficult to say without the details, but usually you don't need to separate out the index files from the actual VCF files. If this channel is to be used directly as process input, my preference would be to alter the input declaration so that I could use something like this instead:
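The example that followed is likewise missing here. One way this could look, as a sketch under assumptions: the process name `LIST_VCFS` and its `ls` script are hypothetical placeholders, and the glob and metadata map are taken from the question. A single `path` input receives the VCF files together with their indexes, so they are staged side by side in the task directory.

```nextflow
process LIST_VCFS {                            // hypothetical process, for illustration only
    input:
    tuple val(fmeta), path(indexed_vcfs)       // VCFs and their .tbi indexes staged together

    output:
    stdout

    script:
    """
    ls -1 *.vcf.gz
    """
}

workflow {
    ch_input = Channel
        .fromFilePairs('*{.vcf.gz,.vcf.gz.tbi}')
        .flatMap { pair -> pair[1] }           // drop the grouping key, emit each file on its own
        .collect()                             // one list holding every VCF and index file
        .map { files -> [ [id: 'test'], files ] }

    LIST_VCFS(ch_input)
    LIST_VCFS.out.view()
}
```

With this layout, tools that expect the index next to the VCF find it without any extra bookkeeping in the channel.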