Kind of transpose needed for a file with an inconsistent number of columns per row

Posted on 2024-11-16 08:01:31

I have a tab-delimited file (in which the number of columns in each row is not fixed) that looks like this:

chr1 92536437 92537640 NM_024813 NM_053274

I want to produce a file from it in the following layout (the first three columns are identifiers that I need repeated on every row when splitting):

chr1 92536437 92537640 NM_024813
chr1 92536437 92537640 NM_053274

Any suggestions for a shell script?

韶华倾负 2024-11-23 08:01:31

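#!/bin/bash
# Reads tab-separated lines from stdin and prints the first three
# fields once for every field after the third.
IFS=$'\t'    # split on tabs only
while read -r a b c rest
do
    for fld in $rest    # unquoted on purpose: split $rest on IFS (tab)
    do
        printf '%s\t%s\t%s\t%s\n' "$a" "$b" "$c" "$fld"
    done
done

Note that IFS must be a real tab. In bash, $'\t' produces one without having to type the character itself.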
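The tab can also be generated with printf instead of typed, which sidesteps editors that silently turn tabs into spaces. The following is only a sketch of that idea: the function name explode_rows is hypothetical, and the body runs in a subshell so the IFS and globbing changes do not leak into the calling shell.

```shell
# explode_rows: hypothetical helper name used only for this sketch.
# Reads tab-separated rows on stdin and prints columns 1-3 once for
# every column from the fourth onward.
explode_rows() (
    IFS=$(printf '\t')   # a literal tab, generated rather than typed
    set -f               # keep unquoted $rest from being glob-expanded
    while read -r a b c rest; do
        for fld in $rest; do   # unquoted: split on IFS, i.e. on tabs
            printf '%s\t%s\t%s\t%s\n' "$a" "$b" "$c" "$fld"
        done
    done
)

# Sample row from the question, written with explicit tabs.
printf 'chr1\t92536437\t92537640\tNM_024813\tNM_053274\n' | explode_rows
```

A row that has only the three identifier columns produces no output at all, since there is nothing to pair them with.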

I also thought I should do a Perl version:

#!/usr/bin/perl -n
chomp; ($a,$b,$c,@r) = split /\t/; print "$a\t$b\t$c\t$_\n" for @r

To do it all from the command line, reading from in.txt and writing to out.txt:

perl -ne 'chomp; ($a,$b,$c,@r) = split /\t/; print "$a\t$b\t$c\t$_\n" for @r' in.txt > out.txt

Of course, if you save the Perl script (say, as script.pl):

perl script.pl in.txt > out.txt

If you also make the script file executable (chmod +x script.pl):

./script.pl in.txt > out.txt

HTH

心舞飞扬 2024-11-23 08:01:31

Not shell, and the other answer is perfectly fine, but I one-lined it in Perl:

perl -F'/\t/' -lane '$,="\t"; print @F,$_ for splice @F,3' "$FILE"

Edit: New (even more unreadable ;) version, inspired by the other answers. It abuses Perl's command-line switches and special variables for autosplitting and line-ending handling.

Meaning: for each field after the first three (for splice @F,3), print the first three fields followed by that field (print @F,$_).

-F sets the field separator to \t for the -a autosplit into @F.

-l turns on line-ending handling, and -n runs the -e code for each line of the input.

$, is the output field separator.
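The same split-then-repeat idea can be written in awk, where -F and OFS play the roles of Perl's -F and $,. This is only a sketch, not part of the original answer, and the wrapper name explode_awk is hypothetical.

```shell
# explode_awk: hypothetical wrapper name used only for this sketch.
# -F'\t' splits input on tabs, OFS='\t' joins output fields with tabs.
explode_awk() {
    awk -F'\t' -v OFS='\t' '{
        # For every field from the fourth on, print it after fields 1-3.
        for (i = 4; i <= NF; i++) print $1, $2, $3, $i
    }' "$@"
}

# Sample row from the question, written with explicit tabs.
printf 'chr1\t92536437\t92537640\tNM_024813\tNM_053274\n' | explode_awk
```

Passing file names works as well, for example explode_awk in.txt > out.txt.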

囚我心虐我身 2024-11-23 08:01:31

[Edited]

So you want to duplicate the first three columns for each remaining item?

$ while IFS= read -r X
      do PRE=$(printf '%s\n' "$X" | cut -f1-3)        # cut splits on tabs by default
      for Y in $(printf '%s\n' "$X" | cut -s -f4-)    # -s: skip lines with no tab at all
          do printf '%s\t%s\n' "$PRE" "$Y"
      done
  done < File >> OutputFilename

Returns:

chr 786 789 NM
chr 786 789 NR
chr 786 789 NT
chr 123 345 NR

This cuts the first three tab-delimited columns as a prefix, and then abuses the fact that a for loop steps through a whitespace-delimited list to call printf once per remaining field.
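The word splitting that the loop relies on can be seen in isolation. A minimal sketch, assuming the default IFS (space, tab, newline); the function name list_fields is hypothetical.

```shell
# list_fields: hypothetical helper name used only for this sketch.
# Leaving $1 unquoted lets the shell split it on IFS, so the loop body
# runs once per whitespace-separated word.
list_fields() {
    for y in $1; do
        printf '[%s]\n' "$y"
    done
}

list_fields 'NM NR NT'
```

Because the expansion is unquoted, a field containing glob characters such as * would also be expanded against file names, which is worth keeping in mind for less regular data.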

Enjoy.

顾冷 2024-11-23 08:01:31

This is just a subset of your "data comparison in two files" question.

Extracting my slightly hacky solution from there:

for i in 4 5 6 7; do join -e _ -j $i f f -o 1.1,1.2,1.3,0; done | sed '/_$/d'
