How to use multiple XPath selectors in a YQL query

Posted on 2024-09-27 11:12:08

Hey, I'd like to scrape some data from my blog using YQL:

SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']"

How can I use more than one XPath expression in my query? For example, can I do something like this:

SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']" AND xpath ="//div[@class='title']"

assuming I want to get both the post and the title? I could fetch all of the HTML, but I'd rather take only what I need, since speed is an issue here.

Once I have the HTML, I want to extract the text from the markup. Is it OK to use PHP regular expressions for this?

I also understand that YQL supports CSS selector syntax. If you have experience with this and could show me how to write a query similar to the one above using CSS rather than XPath, I'd be grateful!

Thanks.


Comments (3)

耀眼的星火 2024-10-04 11:12:08

Regarding CSS:

See the YQL website itself for this. Search Google for "YQL" and "CSS" (I can only post one link here, and the second one is more useful).

The example they have there no longer works, but you can try this example, which scrapes the questions from the front page of Stack Overflow.

YQL example

Multiple selects with one XPath:

You can do this directly with XPath syntax by joining the expressions with the union operator (|). For example:

SELECT * FROM html WHERE url="www.asscompact.de" and xpath="//head/meta[@name='title']|//head/meta[@name='description']|//head/meta[@name='keywords']"
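
Applied to the query from the question, the union approach could be sketched as follows. This is only a minimal Python sketch that assembles the query string; the helper name build_union_query is hypothetical and not part of YQL, and nothing is sent to any service.

```python
def build_union_query(url, xpaths):
    """Build one YQL statement whose xpath value is the union of several XPath expressions."""
    if not xpaths:
        raise ValueError("at least one XPath expression is required")
    # '|' is the XPath union operator: the result contains nodes matched by any branch.
    combined = "|".join(xpaths)
    # The YQL string literal is delimited by double quotes, so reject embedded ones.
    if '"' in url or '"' in combined:
        raise ValueError("double quotes are not allowed inside the url or xpath value")
    return f'SELECT * FROM html WHERE url="{url}" AND xpath="{combined}"'


# The two expressions from the question, combined into a single query.
query = build_union_query(
    "http://site.com/blog",
    ["//div[@class='post']", "//div[@class='title']"],
)
print(query)
```

Each branch of the union is a complete XPath expression, so this form also works when the branches select entirely different elements.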
缱倦旧时光 2024-10-04 11:12:08

You can also select several alternatives by combining the conditions with "or" inside a single predicate:

SELECT * FROM html WHERE url="www.asscompact.de" and xpath="//head/meta[@name='title' or @name='description']"
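
The "or" form only applies when every alternative is the same element path and differs in one attribute value. A minimal Python sketch of generating such a predicate, assuming a hypothetical helper named build_or_predicate (not a YQL or XPath library function):

```python
def build_or_predicate(element_path, attribute, values):
    """Return an XPath that matches element_path when the attribute equals any of the values."""
    if not values:
        raise ValueError("at least one attribute value is required")
    # Each test compares the whole attribute value, e.g. @name='title'.
    tests = " or ".join(f"@{attribute}='{value}'" for value in values)
    return f"{element_path}[{tests}]"


xpath = build_or_predicate("//head/meta", "name", ["title", "description"])
print(xpath)
```

The resulting string is the xpath value used in the query above. When the alternatives are different elements, the union operator (|) is the form to use instead.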
ヤ经典坏疍 2024-10-04 11:12:08

It is not possible. You need to execute the query twice: once for the first XPath and once for the second. Of course, you could also write your own open table declaration that supports this kind of query.
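
The two-query approach described here could be sketched like this in Python. The run_query callable is an assumption standing in for whatever actually sends a statement to YQL, and fake_run_query is a placeholder that only records the statements, so the sketch makes no network calls.

```python
from typing import Callable, Dict, List


def run_separately(url: str, xpaths: List[str],
                   run_query: Callable[[str], list]) -> Dict[str, list]:
    """Issue one YQL statement per XPath expression and collect the results by expression."""
    results = {}
    for xpath in xpaths:
        query = f'SELECT * FROM html WHERE url="{url}" AND xpath="{xpath}"'
        results[xpath] = run_query(query)
    return results


# Hypothetical executor: records each statement and returns an empty result list.
issued = []


def fake_run_query(query: str) -> list:
    issued.append(query)
    return []


results = run_separately(
    "http://site.com/blog",
    ["//div[@class='post']", "//div[@class='title']"],
    fake_run_query,
)
print(len(issued))
```

This costs one round trip per expression, which matters if speed is the main concern.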
