Find, replace and increment on each occurrence of a string

Posted on 2024-11-15 02:14:44

I'm relatively new to scripting, and I apologize in advance for this painfully simple problem. I believe I've searched pretty thoroughly, but apparently no other answers or cookbooks have been explicit enough for me to understand (like here; I still couldn't get it).

I have a file made up of strings of letters (DNA, if you care), one string per line. Above each string I've inserted another line to identify it. For the bioinformaticians out there: I'm trying to make up a test data set in FASTA format, so maybe you have tools for this? Anyway, I've put a distinct word, "num", after each ">", intending to use a bash incrementer and sed to give each string a unique number in its header. For example, in data.txt, I have...

>num, blah, blah, blah

ATCGACTGAATCGA

>num, blah, blah, blah

ATCGATCGATCGATCG

>num, blah, blah, blah

ATCGATCGATCGATCG

I would like it to be...

>0, blah, blah, blah

ATCGACTGAATCGA

>1, blah, blah, blah

ATCGATCGATCGATCG

>2, blah, blah, blah

ATCGATCGATCGATCG

The solution can be in any language as long as it's complete && gets the job done. I have a little experience with sed, awk, bash, and C++ (little == slightly more than no experience). I know, I know, I need to learn Perl, but I've only just started. The question is this: how do I replace "num" with a number that increments on each replacement? It doesn't matter if the underlying string is identical to another one somewhere else. Thanks in advance for your help!

粉红×色少女 2024-11-22 02:14:44
perl -ple 's/num/$n++/e' filename

Do a dry run first; if the output is what you want, run it on the file for real.
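In the Perl one-liner, -p loops over the input and prints every line, -l takes care of the newlines, and the /e flag evaluates $n++ as Perl code. As a result, the first "num" on any line becomes 0, then 1, then 2, and so on. Once the dry run looks right, Perl's -i switch (for example perl -i.bak -ple 's/num/$n++/e' data.txt) rewrites the file in place and keeps a backup. For comparison, here is a minimal sketch of the same idea in awk, which the asker mentioned knowing a little. It only touches lines that start with ">num,". That prefix is an assumption based on the sample data. The function name number_headers and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: number_headers is a hypothetical helper name.
# Reads FASTA-like text on stdin and replaces the ">num" placeholder at the
# start of each header line with ">0", ">1", ">2", and so on, in order.
number_headers() {
  awk '
    /^>num,/ {                  # header lines only (assumes ">num," as in the sample)
      sub(/^>num/, ">" (n++))   # n starts out empty, which awk treats as 0
    }
    { print }                   # print every line, changed or not
  '
}

# Assumed usage: dry run to the terminal, then write a new file.
#   number_headers < data.txt
#   number_headers < data.txt > numbered.txt
```

Unlike the Perl version, a "num" that appears anywhere other than the start of a header line is left alone.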

心是晴朗的。 2024-11-22 02:14:44

This uses process substitution, <(cat test.txt), which bash supports but plain POSIX sh does not, so it may not be available on your system.

jcomeau@intrepid:/tmp$ exec 3< <(cat test.txt)
jcomeau@intrepid:/tmp$ i=0
jcomeau@intrepid:/tmp$ while read -u 3 first_word the_rest; do
 if [ "$first_word" == ">num," ]; then
 echo ">$i," $the_rest; i=$((i + 1)); else
 echo $first_word $the_rest; fi; done
>0, blah, blah, blah

ATCGACTGAATCGA

>1, blah, blah, blah

ATCGATCGATCGATCG

>2, blah, blah, blah

ATCGATCGATCGATCG
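The loop above works for the sample data. However, read runs without -r, and $the_rest is echoed unquoted. As a result, runs of spaces and backslashes in a header can come out changed. Below is a minimal sketch of the same loop that reads the file through a plain redirect, so process substitution and the extra file descriptor are not needed. It also prints each line as read, apart from the replaced prefix. The function name number_headers_bash and the file names are hypothetical, and the ">num," header prefix is assumed from the sample.

```shell
#!/usr/bin/env bash
# Sketch: number_headers_bash is a hypothetical helper name.
# Same job as the interactive loop in this answer, but it reads stdin
# directly (no process substitution or extra file descriptor) and keeps
# each line as written apart from the replaced prefix.
number_headers_bash() {
  local i=0 line prefix='>num'
  # IFS= keeps leading/trailing spaces, -r keeps backslashes; the || test
  # also processes a final line that has no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == "$prefix,"* ]]; then
      # Drop the literal ">num" and print ">$i" in its place.
      printf '>%d%s\n' "$i" "${line#"$prefix"}"
      i=$((i + 1))
    else
      printf '%s\n' "$line"
    fi
  done
}

# Assumed usage:
#   number_headers_bash < test.txt > numbered.txt
```

Using printf instead of echo avoids surprises when a line starts with "-" or contains backslashes.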