Extract the first position of a regex match with grep

Posted 2025-01-13 11:42:06

Good morning everyone,

I have a text file containing multiple lines. I want to find a regex pattern in it and print its position using grep.

For example:

ARTGHFRHOPLIT
GFRTLOPLATHLG
TGHLKTGVARTHG

I want to find L[any_letter]T in the file and print the position of the L within its line, followed by the three-letter code. In this case the result would be:

11 LIT
8 LAT
4 LKT

I wrote a grep command, but it doesn't return what I need. The command is:

grep -E -boe "L.T" file.txt

It returns:

11:LIT
21:LAT
30:LKT

Any help would be appreciated!
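The numbers grep prints are byte offsets. With -o, GNU grep's -b reports the 0-based offset of each match counted from the start of the whole file, not the column within the line, so the values keep growing from line to line. Below is a minimal sketch that keeps grep and converts those offsets into 1-based columns. It assumes GNU grep (where -n -b -o prints LINE:OFFSET:MATCH), single-byte (ASCII) text and Unix (LF) line endings. The function name grep_columns is hypothetical.

```shell
#!/bin/sh
# grep_columns FILE: print "column match" for every L<letter>T in FILE.
# Assumptions: GNU grep output LINE:OFFSET:MATCH, ASCII text, LF line endings.
grep_columns() {
    grep -nbo 'L[[:alpha:]]T' "$1" |
    awk -F: '
        # First input (the file itself): remember the byte offset where each line starts.
        NR == FNR { start[FNR] = off; off += length($0) + 1; next }
        # Second input (grep output on stdin): offset minus line start, made 1-based.
        { print $2 - start[$1] + 1, $3 }
    ' "$1" -
}
```

For the sample file, grep_columns file.txt should print 11 LIT, 8 LAT and 4 LKT on separate lines.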

鸵鸟症 2025-01-20 11:42:06

Awk suits this better:

awk 'match($0, /L[[:alpha:]]T/) {
   print RSTART, substr($0, RSTART, RLENGTH)
}' file

11 LIT
8 LAT
4 LKT

This assumes there is at most one such match per line. Only the first match on each line is printed.

If a line can contain multiple, possibly overlapping, matches, use:

awk '{
   n = 0
   while (match($0, /L[[:alpha:]]T/)) {
      n += RSTART
      print n, substr($0, RSTART, RLENGTH)
      $0 = substr($0, RSTART + 1)
   }
}' file
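To see the difference between the two versions, the sketch below wraps each one in a shell function and runs both on the line LLTT, where the matches LLT and LTT overlap. The function names first_match and all_matches are hypothetical.

```shell
#!/bin/sh
# first_match: only the first L<letter>T on each line.
first_match() {
    awk 'match($0, /L[[:alpha:]]T/) { print RSTART, substr($0, RSTART, RLENGTH) }'
}

# all_matches: every match, including overlapping ones, because the search
# restarts one character after the start of the previous match.
all_matches() {
    awk '{
        n = 0
        while (match($0, /L[[:alpha:]]T/)) {
            n += RSTART
            print n, substr($0, RSTART, RLENGTH)
            $0 = substr($0, RSTART + 1)
        }
    }'
}

printf 'LLTT\n' | first_match   # prints: 1 LLT
printf 'LLTT\n' | all_matches   # prints: 1 LLT, then 2 LTT
```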

城歌 2025-01-20 11:42:06

With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any awk.

awk '
{
  prev=0
  while(ind=index($0,"L")){
    if(substr($0,ind+2,1)=="T" && substr($0,ind+1,1) ~ /[a-zA-Z]/){
      print prev+ind,substr($0,ind,3)
    }
    $0=substr($0,ind+1)
    prev+=ind
  }
}'  Input_file

Explanation: here is the same code with a detailed explanation.

awk '                                                     ##Starting awk program from here.
{
  prev=0                                                  ##prev counts how many characters have been cut off the front of the line.
  while(ind=index($0,"L")){                               ##Loop while the rest of the line still contains an L; its 1-based index is stored in ind.
    if(substr($0,ind+2,1)=="T" && substr($0,ind+1,1) ~ /[a-zA-Z]/){      ##Check that the character two positions after L is T AND the character right after L is a letter.
      print prev+ind,substr($0,ind,3)                     ##Print the position of L in the original line along with the 3 letters starting at L, e.g. LIT.
    }
    $0=substr($0,ind+1)                                   ##Keep only the part after this L, so an L that follows right away (as in LLIT) is not skipped.
    prev+=ind                                             ##ind characters were just removed, so add them to prev.
  }
}'  Input_file                                            ##Mentioning Input_file name here.

夏雨凉 2025-01-20 11:42:06

Building on @anubhava's answer, you can also use RSTART + RLENGTH as the start position for substr, which gives multiple non-overlapping matches per line and per word.

The while loop works on the current line. On every iteration it replaces $0 with the part of the line right after the last match, up to the end of the string.
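The sketch below makes this visible by printing, after each match, what is left of $0 for the next iteration to search. The function name trace_rest and the rest=[...] column exist only for illustration.

```shell
#!/bin/sh
# trace_rest: print each match and the remainder of the line that the
# next iteration will search (everything after the current match).
trace_rest() {
    awk '{
        while (match($0, /L[a-zA-Z]T/)) {
            m = substr($0, RSTART, RLENGTH)
            $0 = substr($0, RSTART + RLENGTH)
            print m, "rest=[" $0 "]"
        }
    }'
}
```

On the line ARTGHFRHOPLITLOT LATTELET it prints LIT rest=[LOT LATTELET], then LOT rest=[ LATTELET], LAT rest=[TELET] and LET rest=[]. Each rest starts right after the previous match, so any position reported inside the loop has to add back every character that was cut off.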

Note that . in a regex matches any character, including digits and spaces, which is why this pattern uses [a-zA-Z] for the middle character.
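A quick way to see the difference is to run grep -o on a made-up line that has a digit between L and T. The function names dot_matches and letter_matches are hypothetical.

```shell
#!/bin/sh
# With ".", the middle character can be anything, including a digit.
dot_matches() { printf 'L1T LAT\n' | grep -o 'L.T'; }
# With a bracket expression, only a letter is accepted in the middle.
letter_matches() { printf 'L1T LAT\n' | grep -o 'L[a-zA-Z]T'; }
```

dot_matches prints L1T and LAT, while letter_matches prints only LAT.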

awk '{
  pos = 0                           # characters already cut off the front of the line
  while (match($0, /L[a-zA-Z]T/)) {
    print pos + RSTART, substr($0, RSTART, RLENGTH)
    pos += RSTART + RLENGTH - 1     # the substr() below drops this many characters
    $0 = substr($0, RSTART + RLENGTH)
  }
}' file

If file contains

ARTGHFRHOPLIT
GFRTLOPLATHLG
TGHLKTGVARTHG
ARTGHFRHOPLITLOT LATTELET
LUT

The output is

11 LIT
8 LAT
4 LKT
11 LIT
14 LOT
18 LAT
23 LET
1 LUT
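To cross-check the column numbers independently, here is a brute-force sketch that tests every three-character window of each line. The function name all_windows is hypothetical. It also reports overlapping matches, and it costs one substr() call per column, which is fine for short lines like these.

```shell
#!/bin/sh
# all_windows: print "column match" for every 3-character window
# that matches L<letter>T, scanning each line from left to right.
all_windows() {
    awk '{
        for (i = 1; i <= length($0) - 2; i++) {
            w = substr($0, i, 3)
            if (w ~ /^L[a-zA-Z]T$/)
                print i, w
        }
    }' "$@"
}
```

On the five-line sample file it prints 11 LIT, 8 LAT, 4 LKT, 11 LIT, 14 LOT, 18 LAT, 23 LET and 1 LUT, with columns counted from 1 within each line.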

About the author: ╄→承喏