Kind of transpose needed for a file with a varying number of columns per row
I have a tab-delimited file (the number of columns in each row is not fixed) that looks like this:
chr1 92536437 92537640 NM_024813 NM_053274
I want to produce a file from it in the following order (the first three columns are identifiers that I need to keep on every line when splitting):
chr1 92536437 92537640 NM_024813
chr1 92536437 92537640 NM_053274
Any suggestions for a shell script?
Note that you should enter a real tab character there (in IFS).
I also thought I should do a Perl version:
To do it all from the command line, reading from in.txt and writing to out.txt:
Of course, if you save the Perl script (say, as script.pl):
If you also make the script file executable (chmod +x script.pl):
HTH
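Here is a minimal POSIX sh sketch of the "real tab in IFS" approach this answer describes. It is not the answer's own script. The function name expand_rows is hypothetical, and the sketch assumes tab-separated input on stdin.

```shell
#!/bin/sh
# expand_rows (hypothetical name): for every input line, print the first
# three tab-separated columns followed by each remaining column, one per line.
# The body runs in a subshell "( ... )" so the IFS and set -f changes
# do not leak into the caller.
expand_rows() (
  IFS=$(printf '\t')   # a literal tab: split on tabs only
  set -f               # no globbing when $rest is word-split below
  # c1..c3 are the identifiers; read puts everything after them in "rest".
  # Note: a final line without a trailing newline is skipped by read.
  while read -r c1 c2 c3 rest; do
    for col in $rest; do   # unquoted on purpose: split rest on tabs
      printf '%s\t%s\t%s\t%s\n' "$c1" "$c2" "$c3" "$col"
    done
  done
)
```

Usage mirrors the in.txt/out.txt example: `expand_rows < in.txt > out.txt`. A row with only the three identifier columns produces no output.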
Not shell, and the other answer is perfectly fine, but I turned it into a Perl one-liner:
Edit: a new (even more unreadable ;) version, inspired by the other answers. It abuses Perl's command-line switches and special variables for autosplitting and line-ending handling.
What it means: for each field after the first three (for splice @F,3), print the first three fields followed by that field (print @F,$_). splice @F,3 removes everything from the fourth field onward and returns it, so @F is left holding just the three identifiers. -F sets the field separator to \s (it should be \t) for -a, which autosplits each line into @F. -l turns on line-ending handling for -n, which runs the -e code once for each line of input. $, is the output field separator.
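The same structure maps fairly directly onto awk, which can help where Perl is not wanted. This is an awk analogue of the idea, not a verified translation of the answer's one-liner. The function name expand_rows_awk is hypothetical. awk's implicit per-line loop plays the role of -n, its automatic splitting into $1..$NF plays -a, OFS plays $,, and the newline (ORS) that print appends plays -l.

```shell
#!/bin/sh
# expand_rows_awk (hypothetical name): awk analogue of the autosplit approach.
#   -F'\t'      split each line on tabs (like perl -F, using \t rather than \s)
#   -v OFS='\t' output field separator (like perl's $,)
# Reads the files given as arguments, or stdin if there are none.
expand_rows_awk() {
  awk -F'\t' -v OFS='\t' '{
    # One output line per field after the first three (like: for splice @F,3)
    for (i = 4; i <= NF; i++) print $1, $2, $3, $i
  }' "$@"
}
```

For example, `expand_rows_awk in.txt > out.txt`.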
[Edited]
So you want to duplicate the first three columns for each remaining item?
Returns:
This cuts the first three space-delimited columns off as a prefix, and then abuses the fact that a for loop steps through a space-delimited list, calling echo once per item.
Enjoy.
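Here is a sketch of the cut-plus-for-loop idea, assuming tab-separated input and POSIX sh. It is not the answer's original code. The name expand_rows_cut is hypothetical, and it uses printf instead of echo so the tab in the output is portable.

```shell
#!/bin/sh
# expand_rows_cut (hypothetical name): cut off the first three columns as a
# prefix, then let a for loop step through the remaining columns.
expand_rows_cut() (
  set -f                        # keep $rest from being glob-expanded
  while IFS= read -r line; do
    # The first three tab-separated columns become the prefix...
    prefix=$(printf '%s\n' "$line" | cut -f1-3)
    # ...and columns 4 onward are the list to step through
    # (-s: a line containing no tab at all yields nothing here).
    rest=$(printf '%s\n' "$line" | cut -s -f4-)
    # Unquoted $rest is split on the default IFS (space, tab, newline),
    # so the loop body runs once per remaining column.
    for item in $rest; do
      printf '%s\t%s\n' "$prefix" "$item"
    done
  done
)
```

Because $rest is split on spaces as well as tabs, a column that itself contains spaces would be broken into several items. Running cut twice per line also makes this slower than the awk or Perl versions on large files.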
This is just a subset of your "data comparison in two files" question.
Extracting my slightly hacky solution from there: