Using grep to extract HTML container tags
I have a page with many posts by different authors, and I want only the posts by User A from that page.
How can I set up grep to look at each post's HTML block in the page for the author, then print the content of the post to a file? The post structure is something like:
<!--Begin Msg Number #####-->
[useless junk i'm not interested in here]
<span class="author vcard"><a class="url fn" href='url here'>User A</a> </span>
[more junk]
<div class='post entry-content '>
<!--cached-some date string--> Here's the text I want to extract
</div>
[more junk]
<hr />
I think the structure is something like
grep /pattern/ output file
but do I need to explicitly tell it to hunt only between the
<!-- begin msg ... -->
and
<hr />
tags that bound the post, or is grep smart enough to do that automatically? I'm worried that when grep finds the pattern of User A, it will print all the post contents to a file instead of just that particular one.
Comments (1)
If all the post text is on one line, then try
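A minimal sketch for that case, using awk rather than grep alone: grep matches line by line and has no notion of where a post begins or ends, so it will not limit itself to one post's block on its own. The script below tracks that state explicitly. It assumes the markers look exactly like the sample in the question (`<!--Begin Msg Number`, `class="author vcard"`, `class='post entry-content '`, `<hr />`) and that the post body contains no nested `</div>`. The file names `posts.html` and `user_a_posts.txt` and the sample posts are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input, modeled on the structure shown in the question.
cat > posts.html <<'EOF'
<!--Begin Msg Number 1-->
<span class="author vcard"><a class="url fn" href='url here'>User A</a> </span>
<div class='post entry-content '>
<!--cached-2011-01-01--> First post by A
</div>
<hr />
<!--Begin Msg Number 2-->
<span class="author vcard"><a class="url fn" href='url here'>User B</a> </span>
<div class='post entry-content '>
<!--cached-2011-01-02--> Post by B
</div>
<hr />
<!--Begin Msg Number 3-->
<span class="author vcard"><a class="url fn" href='url here'>User A</a> </span>
<div class='post entry-content '>
<!--cached-2011-01-03--> Second post by A
</div>
<hr />
EOF

awk -v author='User A' '
  # A new post starts: forget everything about the previous one.
  /<!--Begin Msg Number/   { wanted = 0; in_body = 0 }
  # Author line: remember whether this post belongs to the wanted author.
  /class="author vcard"/   { wanted = (index($0, ">" author "<") > 0) }
  # End of the post.
  /<hr \/>/                { wanted = 0; in_body = 0 }
  # Closing tag of the content div ends the body before it can be printed.
  in_body && /<\/div>/     { in_body = 0 }
  # Inside the body of a wanted post: drop the cache comment, print the text.
  wanted && in_body        { sub(/^[ \t]*<!--cached-[^>]*-->[ \t]*/, ""); print }
  # Opening tag of the content div: the body starts on the next line.
  /class=.post entry-content/ { in_body = 1 }
' posts.html > user_a_posts.txt

cat user_a_posts.txt
# First post by A
# Second post by A
```

Because the body flag stays set until `</div>`, the same script also prints posts whose text spans several lines. For anything less regular than this sample, a real HTML parser is likely to be more reliable than pattern matching.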