What do $f, tr, \t, and \n mean when linearizing FASTA with awk?

Posted on 2025-02-13 15:43:25

I am trying to linearize a FASTA file using awk. I am totally new to it. I have a script:

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}'  < $f | tr "\t" "\n" > ${f/.fasta/_lin.fasta}

I don't understand anything in the < $f | tr "\t" "\n" > ${f/.fasta/_lin.fasta} part. What is $f? What are tr, \t, and \n? Where exactly am I supposed to give the input file? Can someone please elaborate?

Comments (3)

玩世 2025-02-20 15:43:25

Let's step through that code piece by piece. First, I'll add some white space to make it more legible:

awk '
  /^>/ {
    printf("%s%s\t", (N>0?"\n":""), $0);
    N++;
    next;
  }
  {
    printf("%s",$0);
  }
  END {
    printf("\n");
  }
' < $f \
  | tr "\t" "\n" \
  > ${f/.fasta/_lin.fasta}

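Okay. First, $f is a shell variable that holds the name of your input file, so that is where you supply it: set it beforehand (for example f=myfile.fasta) or run the command inside a loop that sets f. The code's author expects the name to contain .fasta, presumably at the end, like myfile.fasta. The < operator redirects that file to awk's standard input. It is redundant in this particular case, since awk could take the filename as an argument instead (unless the filename contains an equals sign, which awk may interpret as a variable assignment).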
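As a rough sketch of where the input file goes, the following runs the pipeline from the question end to end. It assumes bash (the ${f/.../...} expansion is not plain POSIX sh), and the file name myfile.fasta and its contents are made up for illustration.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Hypothetical two-record input file with wrapped sequence lines.
printf '>seq1\nACGT\nTTGA\n>seq2\nGGCC\nAATT\n' > myfile.fasta

# This is where the input file is supplied: f holds its name.
f=myfile.fasta

# The pipeline from the question, with the variable quoted for safety.
awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < "$f" \
  | tr "\t" "\n" > "${f/.fasta/_lin.fasta}"

# Show the result: each header followed by one joined sequence line.
cat myfile_lin.fasta
```

Quoting "$f" is an addition here; it keeps the command working if the file name contains spaces.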

AWK then comes in and matches lines that start with > (the FASTA header lines). On those lines, it prints a newline (if N > 0) or else nothing, followed by the contents of the line and a tab. It then increments N, and next skips the remaining rules for that line. All other lines (the sequence lines) are printed without a trailing newline, which joins each record's sequence into one long line. After reading all of the lines of $f, a final newline is printed.
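To see what the awk stage produces on its own, here is a small sketch on made-up input. So that the tab is visible, it swaps each tab for a | character instead of the newline used in the real pipeline; the variable name stage1 is only for illustration.

```shell
# Run only the awk stage on a hypothetical two-record input,
# then make the tab separator visible as "|".
stage1=$(printf '>seq1\nACGT\nTTGA\n>seq2\nGGCC\n' \
  | awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' \
  | tr '\t' '|')

# Each record is now one line: header, separator, joined sequence.
printf '%s\n' "$stage1"
```

Each record ends up on a single line, with the header and the joined sequence separated by what was the tab.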

This awk code is not very legible. It could be rewritten like this, which prints each header on its own line directly and so no longer needs the tr step:

awk '
  /^>/ && N++ {
    printf "\n";
  }
  /^>/ {
    print;
    next;
  }
  {
    printf "%s", $0;
  }
  END {
    printf "\n";
  }
'

The only tricky piece here is that N is initially zero, so when you say N++ the first time, it returns the value before incrementing (zero = false) and therefore that condition does not trigger. When you say it the second time, it returns the value before the next increment (one = true) and therefore that condition triggers, ending the previous record's sequence line. Anything that is not an empty string or a zero evaluates as true.

On one line, and more golfed, that could be awk '/^>/&&N++{printf"\n"}/^>/{print;next}{printf"%s",$0}END{printf"\n"}'.
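The post-increment behaviour of N++ can be checked in isolation. This is a minimal sketch with made-up input lines; the variable name fired is only for illustration.

```shell
# N++ used as a pattern: it yields the old value of N, then increments.
# On the first line the old value is 0 (false), so the action is skipped.
fired=$(printf 'first\nsecond\nthird\n' \
  | awk 'N++ { print "condition true on:", $0 }')

printf '%s\n' "$fired"
```

Only the second and third lines are reported, because the first evaluation of N++ returns zero.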

After awk, the output is passed to tr to translate all tabs (\t) into newlines (\n), which puts each header back on its own line above its sequence. Then the output is redirected using the > operator to a file named by the shell parameter expansion ${f/.fasta/_lin.fasta} (a bash feature), which replaces the first instance of .fasta in $f with _lin.fasta, so our example input file myfile.fasta is transformed to output file myfile_lin.fasta.
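Both pieces can be tried separately. The sketch below assumes bash for the pattern substitution; the variable names split, out, and g and the file names are hypothetical.

```shell
# tr replaces every tab character with a newline.
split=$(printf 'header\tSEQUENCE\n' | tr "\t" "\n")
printf '%s\n' "$split"

# Bash pattern substitution: ${var/pattern/replacement}.
f=myfile.fasta
out=${f/.fasta/_lin.fasta}
printf '%s\n' "$out"

# Only the first match is replaced, so a name containing .fasta
# twice keeps its second occurrence.
g=a.fasta.fasta
printf '%s\n' "${g/.fasta/_lin.fasta}"
```

The last case is why the answer says the author presumably expects .fasta only at the end of the name.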

萧瑟寒风 2025-02-20 15:43:25

Where exactly I am supposed to give the input file?

You should probably read and understand a tool's user manual before attempting to use it, unless you have already accepted the risk of potential damage.

tr "\t" "\n"

This is the tr command with two arguments. Most Linux commands come with a manual, which you can access like so:

man tr

Popular ones also have online versions, for example the tr manpage, where the meaning of \t and \n can be found:

\n
    new line
\t
    horizontal tab
清秋悲枫 2025-02-20 15:43:25

I'm guessing OP is trying to do something like this, combining both the awk and tr commands:

test
123
>456
mnq
>yesthis
nothis
789

{m,g}awk '{ 
       printf("%.*s%s%.*s",
             (!__<+(___ +=_= !__ <  NF))*_,
              RS, $+__, _*(_ !=___),RS)
} END { 
       print RS }' FS='^>' ORS= 

test123>456mnq
>yesthis
nothis789