YQL: scraping an entire website/domain
I'm trying to scrape a set of links and content from a domain.
The equivalent query in Google would be
"site:www.newswebsite.com search_term"
I've seen a few approaches that come close, but I can't quite get a search to run across a whole website and then filter the results by the search term.
Is this possible without a custom data table?
I got to the bottom of it in the end.
This searches three sites and orders the results by date, newest first. There is an alternate way to reverse the sort, but this seems to work for now. I think it's descending=true within the sort: (field='date',descending='true')
Very useful, even if I do say so myself.
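The query itself is not reproduced in this answer, so what follows is only a rough sketch of how a statement of that shape could be assembled, not the author's actual query. It assumes a YQL search table named `search.web` that accepts `query` and `sites` keys, and the historical public endpoint `query.yahooapis.com/v1/public/yql`; the helper names `build_yql_site_search` and `build_yql_url` are hypothetical. Only the `sort(field='date',descending='true')` part comes from the text above.

```python
from urllib.parse import urlencode


def build_yql_site_search(term, sites, newest_first=True):
    """Build a YQL statement restricted to `sites`, sorted by date.

    Assumption: a `search.web` table with `query` and `sites` keys.
    The table and key names are illustrative, not taken from the answer.
    """
    escaped_term = term.replace('"', '\\"')  # keep the quoted literal intact
    site_list = ",".join(sites)              # comma-separated domain list
    descending = "true" if newest_first else "false"
    return (
        'select * from search.web '
        f'where query="{escaped_term}" and sites="{site_list}" '
        f'| sort(field="date", descending="{descending}")'
    )


def build_yql_url(statement,
                  endpoint="https://query.yahooapis.com/v1/public/yql"):
    """URL-encode the statement for a GET request (endpoint is an assumption)."""
    return endpoint + "?" + urlencode({"q": statement, "format": "json"})


if __name__ == "__main__":
    statement = build_yql_site_search("search_term", ["www.newswebsite.com"])
    print(statement)
    print(build_yql_url(statement))
```

Passing `newest_first=False` flips the `descending` flag, which is probably the alternate way of reversing the sort mentioned above.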
Christian Heilmann just published a fairly nice writeup on the 24ways website about YQL and getting information back from an HTML data source.