How to follow the "Next Page" link with Scrubyt

Posted 2024-07-07 05:57:07

I'm trying to use Scrubyt to get the details from this page: http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?section=events. I've managed to get the titles and detail URLs from the list, but I can't get next_page to make the scraper go to the next page. I assume that's because I'm not using the correct pattern for the next page link. I tried the string "Next Page", and I've also tried the XPath. Any other ideas?

The code is below:

require 'rubygems'
require 'scrubyt'

nuffield_data = Scrubyt::Extractor.define do
  fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?section=events'

  event do
    title 'The Coast of Mayo'
    #url "href", :type => :attribute
    link_url
  end

  next_page "Next Page", :limit => 2
end

nuffield_data.to_xml.write($stdout, 1)

柒七 answered on 2024-07-14 05:57:07

Try this with a slightly different URL:

fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'

Scrubyt seems to have trouble with the "?section=events" query string on the end of the URL.

When it looks for the next page, it tries to request this URL, which has lost the script name:

http://www.nuffieldtheatre.co.uk/cn/events/?pageNum_rsSearch=1&totalRows_rsSearch=39&section=events

instead of:

http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?pageNum_rsSearch=1&totalRows_rsSearch=39&section=events

Removing the query string from the end of the URL seems to fix this. You might want to file it as a bug.
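To check which address the paginated request should hit, you can rebuild the expected URL by hand and compare it with what the scraper actually requests. This is only a minimal sketch using Ruby's standard `uri` library. The helper name `next_page_url` is hypothetical and not part of Scrubyt, and the parameter names `pageNum_rsSearch` and `totalRows_rsSearch` are simply copied from the URLs above.

```ruby
require 'uri'

# Hypothetical helper (not a Scrubyt API): build the URL of a listing page
# by setting the paging parameters while keeping the script name
# (event_listings.php) and any other query parameters intact.
def next_page_url(listing_url, page_num, total_rows)
  uri = URI.parse(listing_url)

  # Existing query parameters as [key, value] pairs, e.g. [["section", "events"]].
  params = URI.decode_www_form(uri.query || '')

  # Drop stale paging parameters, then put the new ones first.
  params.reject! { |key, _| %w[pageNum_rsSearch totalRows_rsSearch].include?(key) }
  params.unshift(['totalRows_rsSearch', total_rows.to_s])
  params.unshift(['pageNum_rsSearch', page_num.to_s])

  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts next_page_url(
  'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?section=events',
  1,
  39
)
# prints http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?pageNum_rsSearch=1&totalRows_rsSearch=39&section=events
```

If the URL the scraper requests is missing event_listings.php in its path, as in the first URL above, the relative next-page link was resolved against the wrong base.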
