How to extract the word after a pattern when the word is on the next line?
I want to extract every word that comes after the pattern. However, I can only extract the word when it is on the same line as the pattern; if the word comes right after a line break, I'm not able to get it. For example,
Gary is a college student.
Steve and John are college
teachers.
I want to extract "student" and "teachers", but I only got "student" back.
My solution is:
grep -oP '(?<=college )\w+' | sort | uniq
Answers (2)
Tools like grep are fundamentally line oriented. GNU grep has a -z option to use zero (NUL) bytes as delimiters instead of newlines, though, which will let you treat the input file as a single big 'line':
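A minimal sketch of how -z could be combined with the question's -oP approach. It assumes GNU grep built with PCRE support (-P), replaces the literal space in the lookbehind with \s so that a newline after "college" also qualifies, and uses a hypothetical helper name, words_after_college_z. Because -z also makes grep terminate each output match with a NUL byte, tr turns those back into newlines before sort -u removes duplicates.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct word that follows "college",
# even when a newline separates the two. Assumes GNU grep with -P (PCRE).
words_after_college_z() {
  # -z: records are NUL-terminated, so a normal text file becomes one "line"
  #     and \s in the lookbehind can match the newline after "college".
  # -o: print only the matched word; -P: enable the lookbehind (?<=...).
  # With -z the matches are NUL-terminated on output too, so tr converts
  # them back to newlines before de-duplicating with sort -u.
  grep -zoP '(?<=college\s)\w+' "$@" | tr '\0' '\n' | sort -u
}

# Usage with the sample text from the question (reads stdin when no file is given):
printf 'Gary is a college student.\nSteve and John are college\nteachers.\n' |
  words_after_college_z
# Expected output:
# student
# teachers
```

Slurping the whole file into one record is fine for small inputs, but the entire file is held in memory as a single "line".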
grep (or really, most Unix text processing tools in general) examines a single line at a time and can't straddle a match across line boundaries. A simple Awk script might work instead:

You can easily refactor this to count the number of hits in Awk, too, and avoid the pipe to sort | uniq (or, better, sort -u), but I left that as an exercise. Learning enough Awk to write simple scripts like this yourself is time well spent.
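A minimal sketch of the Awk approach, which also takes on the counting refactor described above; it is one possible take, not the answerer's own script. The key idea is that an Awk variable (here prev) keeps its value from one input line to the next, so a "college" at the end of one line is still remembered when the first field of the next line is read. The helper name count_words_after_college is hypothetical, and stripping every character outside A-Za-z0-9_ is an assumption about what counts as punctuation.

```shell
#!/bin/sh
# Hypothetical helper: count the words that follow "college" in Awk itself,
# matching across line breaks and without piping to sort | uniq.
count_words_after_college() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if (prev == "college") {
          word = $i
          gsub(/[^A-Za-z0-9_]/, "", word)          # drop punctuation such as "."
          if (!(word in count)) order[++n] = word  # remember first-seen order
          count[word]++
        }
        prev = $i  # not reset per line, so "college" at a line end still counts
      }
    }
    END {
      # print each word once, in first-seen order, with its number of hits
      for (k = 1; k <= n; k++) print order[k], count[order[k]]
    }
  ' "$@"
}

# Usage with the sample text from the question (reads stdin when no file is given):
printf 'Gary is a college student.\nSteve and John are college\nteachers.\n' |
  count_words_after_college
# Expected output:
# student 1
# teachers 1
```

Unlike the grep -z variant, this reads the input one line at a time, so memory use does not grow with file size beyond the table of distinct words.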