How do I extract the word after a pattern when that word is on the next line?

Posted on 2025-01-10 20:31:34

I want to extract every word that comes after the pattern. However, I can only extract a word when it is on the same line as the pattern; if the word comes right after a line break, I can't get it. For example:

Gary is a college student.
Steve and John are college
teachers.

I want to extract "student" and "teachers", but I only get "student" back.
My attempt is:

grep -oP '(?<=college )[\w+]*' | sort | uniq

Comments (2)

何止钟意 2025-01-17 20:31:34

Tools like grep are fundamentally line-oriented. GNU grep, though, has a -z option that uses NUL (zero) bytes as record delimiters instead of newlines, which lets you treat the input file as a single big 'line':

$ grep -Pzo 'college\s+\K\w+' input.txt | tr '\0' '\n'
student
teachers
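
To make the pieces of that command easier to follow, here is a minimal self-contained sketch. It assumes GNU grep built with PCRE support; the `input` variable and the `extract_after_college` function name are made up for illustration and are not part of the answer above. It also folds in `sort -u` in place of the question's `sort | uniq`.

```shell
# Hypothetical sample input reproducing the text from the question.
input='Gary is a college student.
Steve and John are college
teachers.'

# -z  read NUL-terminated records, so the whole input is one record
#     and \s+ is free to match the newline after "college"
# -P  use PCRE, which provides \K (drop everything matched so far)
# -o  print only the matched part; with -z each match ends in a NUL
extract_after_college() {
  grep -Pzo 'college\s+\K\w+' | tr '\0' '\n' | sort -u
}

result=$(printf '%s\n' "$input" | extract_after_college)
printf '%s\n' "$result"
```

Because `\w+` stops at the first non-word character, the trailing periods are not part of the output.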
第几種人 2025-01-17 20:31:34

grep (or really, most Unix text processing tools in general) examines one line at a time and can't match across line boundaries. A simple Awk script might work instead:

awk 'n { print $1; n=0 }   # previous line ended with "college"
{ for(i=1; i<NF; ++i)
    if ($i=="college") print $(i+1) }
$NF=="college" { n=1 }     # the wanted word starts the next line
' file

You can easily refactor this to count the number of hits in Awk, too, and avoid the pipe to sort | uniq (or, better, sort -u), but I left that as an exercise. Learning enough Awk to write simple scripts like this yourself is time well spent.
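
One possible take on that exercise is sketched below. It is not the answerer's code: the `hit` helper, the `count` array, the `count_after_college` wrapper, and the punctuation stripping are all assumptions added for illustration. The rule that handles a carried-over match is placed first so that it only fires on the line after one ending in "college".

```shell
# Hypothetical sample input reproducing the text from the question.
input='Gary is a college student.
Steve and John are college
teachers.'

count_after_college() {
  awk '
    # Strip anything that is not a letter, digit, or underscore, then tally.
    function hit(w) { gsub(/[^A-Za-z0-9_]/, "", w); if (w != "") count[w]++ }
    n { hit($1); n = 0 }                      # previous line ended with "college"
    { for (i = 1; i < NF; ++i)
        if ($i == "college") hit($(i + 1)) }  # word follows on the same line
    $NF == "college" { n = 1 }                # word will start the next line
    END { for (k in count) print count[k], k }
  ' | sort -k2   # awk array order is unspecified; sort only for stable output
}

counts=$(printf '%s\n' "$input" | count_after_college)
printf '%s\n' "$counts"
```

The associative array already deduplicates, so `sort` is kept only to give a predictable order; drop it if order does not matter.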
