Transposition needed for a file with a varying number of columns per row

Posted 2024-11-16 08:01:31

I have a tab-delimited file in which the number of columns per row is not fixed. It looks like this:

chr1 92536437 92537640 NM_024813 NM_053274

From it I want a file with one line per extra column. The first three columns are identifiers that must be repeated on every line produced by the split:

chr1 92536437 92537640 NM_024813
chr1 92536437 92537640 NM_053274

Suggestions for a shell script are welcome.

Comments (4)

韶华倾负 2024-11-23 08:01:31

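#!/bin/bash
# Reads tab-separated lines from standard input and prints one line per extra field.
IFS=$'\t'                      # split on tabs only
while read -r a b c rest
do
    for fld in $rest           # unquoted on purpose: splits the remaining fields on tabs
    do
        printf '%s\t%s\t%s\t%s\n' "$a" "$b" "$c" "$fld"
    done
done

Note that IFS must hold a real tab character. In bash, $'\t' produces one, so there is no need to type a literal tab into the script.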
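As a quick check, the loop can be wrapped in a function and fed the sample line from the question. This is only a minimal sketch: the function name `split_rows` is hypothetical, and it assumes bash (for `$'\t'` and `local`) and fields that contain no glob characters such as `*`.

```shell
#!/bin/bash
# split_rows is a hypothetical name; it wraps the tab-splitting loop in a function.
split_rows() {
    # Keep IFS local so the caller's word splitting is left untouched.
    local IFS=$'\t' a b c rest fld
    while read -r a b c rest; do
        # $rest is deliberately unquoted so it is split on tabs.
        for fld in $rest; do
            printf '%s\t%s\t%s\t%s\n' "$a" "$b" "$c" "$fld"
        done
    done
}

# Feed the sample row from the question (tab-separated) through the function.
printf 'chr1\t92536437\t92537640\tNM_024813\tNM_053274\n' | split_rows
```

Rows with only three fields produce no output, since there is nothing left to iterate over.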

I also thought I should do a Perl version:

#!/bin/perl -n
chomp; ($a,$b,$c,@r)=split /\t/; print "$a\t$b\t$c\t$_\n" for @r

To do it all from the command line, reading from in.txt and writing to out.txt:

perl -ne 'chomp; ($a,$b,$c,@r)=split /\t/; print "$a\t$b\t$c\t$_\n" for @r' in.txt > out.txt

Of course, if you save the Perl script (say, as script.pl):

perl script.pl in.txt > out.txt

If you also make the script file executable (chmod +x script.pl):

./script.pl in.txt > out.txt

HTH

心舞飞扬 2024-11-23 08:01:31

Not shell, and the other answer is perfectly fine, but I one-lined it in Perl:

perl -F'\t' -lane '$,="\t"; print @F,$_ for splice @F,3' "$FILE"

Edit: New (even more unreadable ;) version, inspired by the other answers. It abuses Perl's command-line switches and special variables for autosplitting and line-ending handling.

Meaning: for each field after the first three (for splice @F,3), print the first three fields followed by that field (print @F,$_). The splice removes the extra fields from @F, so @F holds only the three identifiers when it is printed.

-F sets the field separator to a tab for -a, which autosplits each line into @F.

-l turns on line-ending handling (strip the newline on input, add it back on print) for -n, which runs the -e code once per input line.

$, is the output field separator.
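The same per-field logic (repeat the first three fields for every field from the fourth onward) can also be sketched with awk for readers who prefer to stay with standard shell tools. This is an illustrative alternative, not part of the answer above; the wrapper name `expand_fields` is hypothetical.

```shell
#!/bin/sh
# expand_fields is a hypothetical wrapper name around a one-line awk program.
expand_fields() {
    # -F'\t' splits input on tabs; OFS makes print join its arguments with tabs.
    # For each field from the 4th onward, print the three identifiers plus that field.
    awk -F'\t' -v OFS='\t' '{ for (i = 4; i <= NF; i++) print $1, $2, $3, $i }' "$@"
}

# With no file arguments awk reads standard input.
printf 'chr1\t92536437\t92537640\tNM_024813\tNM_053274\n' | expand_fields
```

File names can be passed as arguments instead, for example `expand_fields in.txt > out.txt`.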

囚我心虐我身 2024-11-23 08:01:31

[Edited]

So you want to duplicate the first three columns for each remaining item?

$ while read -r X
      do PRE=$(echo "$X" | cut -f1-3 -d ' ')
      for Y in $(echo "$X" | cut -f4- -d ' ')
          do echo "$PRE $Y"
      done
  done < File >> OutputFilename

Returns:

chr 786 789 NM
chr 786 789 NR
chr 786 789 NT
chr 123 345 NR

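This cuts the first three space-delimited columns as a prefix, then relies on the fact that a for loop steps through a whitespace-delimited list, calling echo once per remaining item. For tab-delimited input, as in the question, drop the -d ' ' option, since cut splits on tabs by default.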

Enjoy.
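For tab-delimited input, a variant of the same cut-based idea might look like the sketch below. It is an assumption-laden illustration rather than the answerer's code: the function name `split_with_cut` is hypothetical, and it relies on cut's default tab delimiter and on `-s` to skip lines that contain no tab at all.

```shell
#!/bin/sh
# split_with_cut is a hypothetical name for a tab-aware variant of the cut loop.
split_with_cut() {
    while IFS= read -r line; do
        # The first three tab-separated fields form the repeated prefix.
        pre=$(printf '%s\n' "$line" | cut -f1-3)
        # Fields 4 onward, one per line; -s drops lines that contain no tab.
        printf '%s\n' "$line" | cut -s -f4- | tr '\t' '\n' |
        while IFS= read -r y; do
            # Skip the empty line produced when a row has only three fields.
            [ -n "$y" ] && printf '%s\t%s\n' "$pre" "$y"
        done
    done
}

# Sample row from the question, tab-separated.
printf 'chr1\t92536437\t92537640\tNM_024813\tNM_053274\n' | split_with_cut
```

Spawning cut and tr for every line is slow on large files, so this form is mainly useful for small inputs.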

顾冷 2024-11-23 08:01:31

This is just a subset of your "data comparison in two files" question.

Here is my slightly hacky solution, extracted from there:

for i in 4 5 6 7; do join -e _ -j $i f f -o 1.1,1.2,1.3,0; done | sed '/_$/d'