Extract the first position of a regex match with grep

Posted on 2025-01-13 11:42:06

Good morning everyone,

I have a text file containing multiple lines. I want to find a regex pattern in it and print the position of each match using grep.

For example:

ARTGHFRHOPLIT
GFRTLOPLATHLG
TGHLKTGVARTHG

I want to find L[any_letter]T in the file and print the position of the L together with the three-letter code. In this case the result would be:

11 LIT
8 LAT
4 LKT

I wrote a grep command, but it doesn't return what I need. The command is:

grep -E -boe "L.T" file.txt

It returns:

11:LIT
21:LAT
30:LKT

Any help would be appreciated!!
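
For context, the numbers from -b keep growing because grep counts byte offsets from the start of the whole input rather than from the start of each line. Below is a minimal sketch of a grep-only workaround that feeds one line at a time to grep and converts the offset to a 1-based column. It assumes GNU grep (where -b combined with -o reports the 0-based byte offset of each match) and single-byte characters; grep_positions is a hypothetical helper name, not part of grep.

```shell
# Hypothetical helper: read lines from stdin and print "column match"
# for every non-overlapping L<letter>T, with columns counted per line.
grep_positions() {
  while IFS= read -r line; do
    # Feeding a single line means the byte offset restarts at 0 each time.
    printf '%s\n' "$line" | grep -bo 'L[[:alpha:]]T' |
      while IFS=: read -r off hit; do
        # Convert the 0-based offset to a 1-based column.
        echo "$((off + 1)) $hit"
      done
  done
}

# Sample data from the question.
printf 'ARTGHFRHOPLIT\nGFRTLOPLATHLG\nTGHLKTGVARTHG\n' | grep_positions
```

For a real file this would be called as grep_positions < file.txt. Spawning one grep per line is slow on large inputs, which is one reason the awk answers below are usually the better fit.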

鸵鸟症 2025-01-20 11:42:06

Awk suits this better:

awk 'match($0, /L[[:alpha:]]T/) {
   print RSTART, substr($0, RSTART, RLENGTH)
}' file

11 LIT
8 LAT
4 LKT

This assumes only one such match per line.

If there can be multiple, possibly overlapping, matches per line, use:

awk '{
   n = 0
   while (match($0, /L[[:alpha:]]T/)) {
      n += RSTART
      print n, substr($0, RSTART, RLENGTH)
      $0 = substr($0, RSTART + 1)
   }
}' file
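
To see what "overlapping" means here, the following sketch compares two ways of advancing after a match: restarting one character after the start of the match (as in the loop above) versus skipping the whole match. It is only an illustration under stated assumptions: find_lxt and its mode argument are hypothetical names, and the pattern uses [A-Za-z] instead of [[:alpha:]] purely so that it also runs on older awk versions without POSIX character classes.

```shell
# Hypothetical helper: scan stdin for L<letter>T and print "position match".
# $1 = "overlap" restarts one character after each match start;
# any other value skips past the whole match.
find_lxt() {
  awk -v mode="$1" '{
    offset = 0                 # characters already cut from the front of the line
    rest = $0
    while (match(rest, /L[A-Za-z]T/)) {
      print offset + RSTART, substr(rest, RSTART, RLENGTH)
      # Number of characters to drop before searching again.
      step = (mode == "overlap") ? RSTART : RSTART + RLENGTH - 1
      offset += step
      rest = substr(rest, step + 1)
    }
  }'
}

printf 'LLTT\n' | find_lxt overlap   # reports LLT at 1 and LTT at 2
printf 'LLTT\n' | find_lxt skip      # reports only LLT at 1
```

In LLTT the second match starts inside the first one, so it is only found when the search restarts one character after the previous match start.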
城歌 2025-01-20 11:42:06

With your shown samples, please try the following awk code. It uses only the standard index and substr functions, so it should work in any awk.

awk '
{
  prev=0
  while(ind=index($0,"L")){
    if(substr($0,ind+2,1)=="T" && substr($0,ind+1,1) ~ /[a-zA-Z]/){
      print prev+ind,substr($0,ind,3)
    }
    $0=substr($0,ind+1)
    prev+=ind
  }
}'  Input_file

Explanation: the same code with a detailed comment on each line.

awk '                                                     ##Start of the awk program.
{
  prev=0                                                  ##prev counts the characters already cut from the front of the current line.
  while(ind=index($0,"L")){                               ##Loop while an L is found in the remaining text; its index is stored in ind.
    if(substr($0,ind+2,1)=="T" && substr($0,ind+1,1) ~ /[a-zA-Z]/){      ##Check that the character two positions after the L is T and the character right after the L is a letter.
      print prev+ind,substr($0,ind,3)                     ##Print the position of the L in the original line, followed by the 3 characters starting at the L, e.g. LIT.
    }
    $0=substr($0,ind+1)                                   ##Keep only the text after this L, so the next search starts one character later.
    prev+=ind                                             ##Add the number of characters just cut to prev.
  }
}'  Input_file                                            ##Name of the input file.
夏雨凉 2025-01-20 11:42:06

Building on the answer of @anubhava, you can also advance by RSTART + RLENGTH and use that as the start for substr, which finds multiple non-overlapping matches per line and per word.

The while loop takes the current line, and on every iteration it replaces the line with the part that follows the last match, up to the end of the string.

Note that a . in a regex matches any character, not only letters, which is why a bracket expression is used here.

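awk '{
  pos = 0
  while (match($0, /L[a-zA-Z]T/)) {
    pos += RSTART                        # position of this match in the original line
    print pos, substr($0, RSTART, RLENGTH)
    $0 = substr($0, RSTART + RLENGTH)    # continue right after the end of this match
    pos += RLENGTH - 1                   # count the rest of the match that was cut off
  }
}' file

If file contains

ARTGHFRHOPLIT
GFRTLOPLATHLG
TGHLKTGVARTHG
ARTGHFRHOPLITLOT LATTELET
LUT

The output is

11 LIT
8 LAT
4 LKT
11 LIT
14 LOT
18 LAT
23 LET
1 LUT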
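
The note above about . can be checked with a short sketch. It passes the pattern to awk as a dynamic regex string so the same one-liner can be run with both patterns; first_match is a hypothetical helper name, and the input line is made up for illustration.

```shell
# Hypothetical helper: print "position match" for the first match of the
# regex given in $1 on each line of stdin.
first_match() {
  awk -v re="$1" 'match($0, re) { print RSTART, substr($0, RSTART, RLENGTH) }'
}

# The input contains "L T" (with a space) before a real L<letter>T.
printf 'XL TLAT\n' | first_match 'L.T'          # the dot also accepts the space
printf 'XL TLAT\n' | first_match 'L[a-zA-Z]T'   # only a letter is accepted
```

With L.T the first hit is the three characters "L T" at position 2, while the bracket expression skips it and reports LAT at position 5.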