Yahoo Pipes: How to parse child DIVs

Posted on 2024-11-06 08:34:24 · 583 words · 3 views · 0 comments

For a page that has multiple DIVs, how can I fetch content only from the DIVs that contain useful text and skip the DIVs that are for ads, etc.?

For example, a page structure like this:

...

<div id="articlecopy">

  <div class="advertising 1">Ads I do not want to fetch.</div>

  <p>Useful texts go here</p>

  <div class="advertising 2">Ads I do not want to fetch.</div>

  <div class="related_articles_list">I do not want to read related articles so parse this part too</div>

</div>

...

In this fictional example, I want to get rid of the two DIVs for advertising and the DIV for related articles. All I want is to fetch the useful content in the <p> inside the parent DIV.

Can Pipe do this?

Thank you.


零度℉ 2024-11-13 08:34:24

Try the YQL module with xpath. Something along these lines:

SELECT * from html where url="http://MyWebPageWithAds.com" and xpath='//div/p'

The query above retrieves the HTML inside every <p> tag that is a direct child of a <div> tag. You can get fancy with XPath if your DIVs have attributes.
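To check what a path like this selects before putting it into a query, you can try it locally. The following is a minimal sketch, not part of YQL or Pipes: it uses Python's standard `xml.etree.ElementTree` (which supports a limited XPath subset) on the markup from the question, and narrows the match to the `articlecopy` DIV by its `id`. The `PAGE` string and the `useful_paragraphs` helper are names made up for this illustration, and the sample is assumed to be well-formed XML, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Markup modelled on the question; wrapped in <html><body> so there is one root.
PAGE = """
<html><body>
<div id="articlecopy">
  <div class="advertising 1">Ads I do not want to fetch.</div>
  <p>Useful texts go here</p>
  <div class="advertising 2">Ads I do not want to fetch.</div>
  <div class="related_articles_list">Related articles</div>
</div>
</body></html>
"""


def useful_paragraphs(markup):
    """Return the text of each <p> that is a direct child of div#articlecopy."""
    root = ET.fromstring(markup.strip())
    # Same idea as the XPath //div[@id="articlecopy"]/p, written in the
    # subset ElementTree understands.
    paragraphs = root.findall(".//div[@id='articlecopy']/p")
    return [p.text or "" for p in paragraphs]


if __name__ == "__main__":
    print(useful_paragraphs(PAGE))
```

Because the path only asks for `<p>` children, the advertising and related-articles DIVs are never selected, so nothing has to be removed afterwards.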

Say for example you had a page with several DIVs, but the one you wanted looked like this:

<div>
    <div>Stuff I don't want</div>
    <div class="main_content">Stuff I want to add to my feed</div>
    <div>Other stuff I don't want</div> 
</div>

You would change the YQL string above to this:

SELECT * from html where url="http://MyWebPageWithAds.com" 
and xpath='//div/div[contains(@class,"main_content")]'
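As a rough local check of what `//div/div[contains(@class,"main_content")]` would match, here is a small Python sketch. Python's standard `xml.etree.ElementTree` does not implement the XPath `contains()` function, so the substring test is emulated in plain Python; `SNIPPET` and `divs_with_class_containing` are hypothetical names used only for this example, and the input is assumed to be well-formed.

```python
import xml.etree.ElementTree as ET

# The example markup from the answer.
SNIPPET = """
<div>
    <div>Stuff I don't want</div>
    <div class="main_content">Stuff I want to add to my feed</div>
    <div>Other stuff I don't want</div>
</div>
"""


def divs_with_class_containing(markup, needle):
    """Emulate //div/div[contains(@class, needle)] and return the matches' text."""
    root = ET.fromstring(markup.strip())
    matches = []
    # Every <div> in the tree (including the root) is a candidate parent.
    for parent in root.iter("div"):
        # findall("div") returns only direct <div> children of that parent.
        for child in parent.findall("div"):
            # contains(@class, needle) is a plain substring test on the attribute.
            if needle in child.get("class", ""):
                matches.append(child.text or "")
    return matches


if __name__ == "__main__":
    print(divs_with_class_containing(SNIPPET, "main_content"))
```

Note that `contains()` is a substring match, so a class such as `main_content_wide` would also be selected; an exact comparison like `[@class="main_content"]` is stricter.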

I've only recently discovered YQL myself and am fairly new to XPath, but it has worked for me so far.


About the author: 泅渡
