Joining columns based on the number of fields

Posted 2025-01-14 05:10:24

I have a large workflow that gets tripped up by uncharacterized chromosomes: a process produces a count matrix that has n fields for canonical chromosomes, but lines with uncharacterized chromosomes have n + 1 or n + 2 fields. This is a headache when reading the matrix with read.table() downstream.

My approach is to first identify what n is, and use this to isolate the n + 1 and n + 2 lines containing these uncharacterized chromosomes:

awk -v nf="$canon" 'NF!=nf{print}{}' matrix.txt | head

chr22   KI270733v1  random  123189  123362  +   6   4   8   0   0   10
chrUn   GL000220v1  105951  106963  -   0   0   0   0   10  0

The goal is for these lines to have n fields as well, by joining the 1st and 2nd columns where the line has n + 1 fields, and the 1st, 2nd and 3rd columns where it has n + 2 fields, to produce:

chrUn-GL000220v1    105951  106963  -   0   0   0   0   10  0
chr22-KI270733v1-random 123189  123362  +   6   4   8   0   0   10

Attempt

I could subset the matrix and split it into 3 files, one each for NF==n, NF==n+1 and NF==n+2, and join the columns:

awk -v n="$canon" 'NF==n{print}{}' matrix.txt | head

chr1    15534236    15536814    -   0   10  0   0   0   3

(^ no action needed)

awk -v n="$canon" 'NF==n+1{print}{}' matrix.txt | awk -v OFS="\t" '{print $1"-"$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' | head

chrUn-GL000220v1    105992  107309  -   0   0   0   0   0   4

and

awk -v n="$canon" 'NF==n+2{print}{}' matrix.txt | awk -v OFS="\t" '{print $1"-"$2"-"$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' | head

chr22-KI270733v1-random 123189  123362  +   6   4   8   0   0   10

Unfortunately, this solution is not dynamic - I have to specify the range of columns. The workflow could contain any number of columns after the first four detailing Chr, Start, Stop, Strand.

Hopefully I have defined the problem well. Any suggestions would be greatly appreciated.

似最初 2025-01-21 05:10:24

Try:

awk -v n=13 '{ for (i = 2; i <= NF - n + 1; ++i) { $1 = $1"-"$i; $i=""; } } 1'

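This accumulates the surplus fields into $1 and blanks each of them with $i="". The blanked fields stay in the record as empty strings, so the rebuilt line contains consecutive separators where they used to be.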

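You could also shift the values to the left instead of blanking them: when NF != n, save d = NF - n, run for (i = 2; i <= n; ++i) $i = $(i + d), and then set NF = n to drop the leftover trailing fields. Truncating the record by assigning NF works in gawk and mawk, but some other awk versions may not rebuild $0 when NF is decreased.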
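As a rough sketch of the same idea that avoids both the empty fields and the assignment to NF, the line can be rebuilt by string concatenation. The function name join_extra_fields is hypothetical, and tab-separated output is an assumption carried over from the OFS="\t" used in the question. Lines with n fields or fewer are passed through with only the separator normalized.

```shell
# Hypothetical helper: $1 is the canonical field count n, the matrix is read from stdin.
join_extra_fields() {
    awk -v n="$1" 'BEGIN { OFS = "\t" }   # assumption: tab-separated output is wanted
    NF > n {
        d = NF - n                        # number of surplus chromosome-name fields
        line = $1
        for (i = 2; i <= d + 1; ++i)      # fold the surplus fields into the name
            line = line "-" $i
        for (i = d + 2; i <= NF; ++i)     # append the remaining fields unchanged
            line = line OFS $i
        print line
        next
    }
    { $1 = $1; print }                    # other lines: rebuild with OFS only
    '
}
```

It could be called as join_extra_fields "$canon" < matrix.txt > matrix.fixed.txt, where matrix.fixed.txt is just a placeholder output name. Because the number of joined fields is derived from NF - n, the command does not depend on how many count columns follow Chr, Start, Stop and Strand.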

About the author: 晌融