How to use multiple XPath selectors in a YQL query
Hey, I'd like to scrape some data from my blog using YQL:
SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']"
How can I use different bits of xpath in my query? E.g. can I do something like:
SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']" AND xpath ="//div[@class='title']"
assuming I want to get both the post and the title? I guess I could pull in all the HTML, but I'd rather fetch only what I need, since speed is an issue here.
Once I have the HTML, I want to extract the text from the markup. Is it OK to use PHP regular expressions for this?
I also understand you can use CSS syntax. If you have experience using it with YQL and could show me how to write a query similar to the one above but with CSS rather than XPath, I'd be grateful!
Thanks.
Answers (3)
Regarding CSS:
See the YQL website itself for this; search Google for "YQL CSS". (I can only post one link here, and the second one is more useful.)
The example on that page no longer works, but you can try this example, which scrapes the questions from the Stack Overflow front page.
YQL example
Multiple selects with one XPath:
You CAN do this directly with XPath syntax, e.g.:
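A sketch of what a single-expression selection returns, assuming the standard XPath 1.0 union operator `|` is what goes inside the one `xpath` string (for example `//div[@class='post'] | //div[@class='title']`). Python's standard-library ElementTree does not support `|`, so the sketch emulates the union by hand; the sample markup, class names, and the `select_union` helper are hypothetical and only illustrate the semantics.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (ElementTree needs valid XML).
SAMPLE = """<html><body>
  <div class="title">First post</div>
  <div class="post">Body of the first post</div>
  <div class="title">Second post</div>
  <div class="post">Body of the second post</div>
  <div class="sidebar">Not wanted</div>
</body></html>"""

def select_union(root, classes):
    """Emulate //div[@class='post'] | //div[@class='title']:
    every matching div, each at most once, in document order."""
    return [div for div in root.iter("div") if div.get("class") in classes]

root = ET.fromstring(SAMPLE)
for div in select_union(root, {"post", "title"}):
    print(div.get("class"), "->", div.text)
```

A union yields each matching node once, in document order, so titles and posts come back interleaved exactly as they appear on the page, from a single request.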
You can also write multiple XPath selects like this:
It is not possible. You need to execute the query twice: once with the first xpath and once with the second. Alternatively, you can write your own Open Data Table definition that supports this kind of query.
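A minimal sketch of the two-query approach, using ElementTree's limited XPath support (`findall` with `.//tag[@attr='value']`) as a local stand-in for the YQL html table. The sample markup and the `run_query` helper are hypothetical, and pairing the results assumes every post has exactly one title in the same page order.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page.
SAMPLE = """<html><body>
  <div class="title">First post</div>
  <div class="post">Body of the first post</div>
  <div class="title">Second post</div>
  <div class="post">Body of the second post</div>
</body></html>"""

def run_query(root, xpath):
    """One 'query' per expression, like issuing the YQL query once per xpath.
    ElementTree requires a relative path (.//) instead of an absolute //."""
    return [el.text for el in root.findall(xpath)]

root = ET.fromstring(SAMPLE)
titles = run_query(root, ".//div[@class='title']")  # first query
posts = run_query(root, ".//div[@class='post']")    # second query

# Merge the two result sets, assuming one title per post in matching order.
for title, body in zip(titles, posts):
    print(f"{title}: {body}")
```

The cost of this approach is two round trips instead of one, which matters if speed is the concern.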