How do I extract the word after a pattern when that word is on the next line?

Posted on 2025-01-10 20:31:34

I want to extract every word that comes after a pattern. However, I can only extract the word when it is on the same line as the pattern; if the word comes right after a line break, I can't get it. For example,

Gary is a college student.
Steve and John are college
teachers.

I want to extract "student" and "teachers", but I only get "student" back.
My attempt so far is

grep -oP '(?<=college )[\w+]*' | sort | uniq

何止钟意 2025-01-17 20:31:34

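Tools like grep are fundamentally line oriented. GNU grep has a -z option, though, that uses NUL (zero) bytes as delimiters instead of newlines, which lets you treat the input file as a single big 'line'. With -z, each match printed by -o is also terminated by a NUL byte, which is why the output is piped through tr: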

$ grep -Pzo 'college\s+\K\w+' input.txt | tr '\0' '\n'
student
teachers
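
As a rough sketch of how this could slot into the asker's original pipeline, the command can be wrapped in a small function and followed by sort -u to drop duplicates. The function name college_words is made up for illustration, and the sketch assumes GNU grep built with PCRE support (-P):

```shell
# Hypothetical helper (name is an assumption): list each distinct word that
# follows "college", reading from the given files or from standard input.
college_words() {
  # -z makes records NUL-delimited, so \s+ can match the newline after "college".
  # tr converts the NUL terminators that -o emits under -z back into newlines.
  grep -Pzo 'college\s+\K\w+' "$@" | tr '\0' '\n' | sort -u
}

# Sample input taken from the question.
printf '%s\n' 'Gary is a college student.' 'Steve and John are college' 'teachers.' |
  college_words
```

Because \w+ stops at the first non-word character, the trailing periods are not part of the output.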


第几種人 2025-01-17 20:31:34

grep (or really, most Unix text processing tools in general) examines one line at a time and can't match across line boundaries. A simple Awk script might work instead:

awk 'n { print $1; n=0 }   # previous line ended with "college"
{ for(i=1; i<NF; ++i)
    if ($i=="college") print $(i+1) }
$NF=="college" { n=1 }' file

You can easily refactor this to count the number of hits in Awk, too, and avoid the pipe to sort | uniq (or, better, sort -u), but I left that as an exercise. Learning enough Awk to write simple scripts like this yourself is time well spent.
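
One possible take on that exercise, as a sketch rather than the answerer's own solution: the version below removes duplicates inside Awk with an associative array, so no sort or uniq is needed. It also strips punctuation so that "student." is reported as "student", which is an added assumption about what the asker wants. The names words_after_college, emit, seen, and pending are made up for illustration.

```shell
# Hypothetical helper (name is an assumption): print each distinct word that
# follows "college", including when that word starts the next line.
words_after_college() {
  awk '
    function emit(w) {
      gsub(/[^A-Za-z0-9_]/, "", w)        # drop punctuation such as a trailing "."
      if (w != "" && !seen[w]++) print w  # print each word only the first time
    }
    pending && NF { emit($1); pending = 0 }  # previous line ended with "college"
    { for (i = 1; i < NF; ++i) if ($i == "college") emit($(i + 1)) }
    $NF == "college" { pending = 1 }         # the wanted word starts a later line
  ' "$@"
}

# Sample input taken from the question.
printf '%s\n' 'Gary is a college student.' 'Steve and John are college' 'teachers.' |
  words_after_college
```

The rule that handles a carried-over match comes first on purpose: Awk runs its rules in order for every line, so placing it after the rule that sets the flag would make it fire on the same line that ends in "college". Words are printed in order of first appearance rather than sorted.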
