Scrubyt gives a 404 error when clicking a link using the _details method
This might be similar to my two earlier questions (see here and here), but I'm trying to use the _detail command to automatically follow each link so I can scrape the details page for each individual event.
The code I'm using is:
    require 'rubygems'
    require 'scrubyt'

    nuffield_data = Scrubyt::Extractor.define do
      fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'
      event do
        title 'The Coast of Mayo'
        link_url
        event_detail do
          dates "1-4 October"
          times "7:30pm"
        end
      end
      next_page "Next Page", :limit => 20
    end

    nuffield_data.to_xml.write($stdout,1)
Is there any way to print out the URL that event_detail is trying to access? The error doesn't seem to include the URL that returned the 404.
Update: I think the link may be a relative link - could this be causing problems? Any ideas how to deal with that?
Comments (4)
I had the same issue with relative links and fixed it like this: you have to set the :resolve param to the correct base URL.
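To see why the base URL matters, here is a minimal sketch using only Ruby's standard URI library, not Scrubyt itself. The relative href below is a made-up placeholder (the real link on the listing page may look different); the point is only that the same relative link resolves to different absolute URLs depending on the base it is joined to.

```ruby
require 'uri'

# The listing page from the question; relative hrefs on it resolve against this URL.
listing_page = 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'

# Hypothetical relative href, standing in for whatever the page really contains.
relative_href = 'event_details.php?id=1'

# Resolved against the listing page, the link stays under /cn/events/.
correct = URI.join(listing_page, relative_href).to_s

# Resolved against the site root instead, the /cn/events/ prefix is lost,
# which is the kind of URL that could come back as a 404.
wrong = URI.join('http://www.nuffieldtheatre.co.uk/', relative_href).to_s

puts correct
puts wrong
```

If the two printed URLs differ for the real links on the page, that would be consistent with the relative-link explanation.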
Now run the script again and you should be dropped into a debugger when the exception is raised. Just try typing this at the debug prompt to see what the offending URL is:
You can also add a debugger statement anywhere in that method if you want to check what is going on. For example, you may want to add one between lines 51 and 52 of this method to check how the URL being called changes, and why.
This is basically how I figured out the answer to your previous questions.
Good luck.
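If stepping through a debugger is not convenient, a similar effect can be sketched in plain Ruby by logging the URL before each request and attaching it to any error that is raised. Everything here is hypothetical and independent of Scrubyt: fetch_with_url_logging and FetchError are illustrative names, and the block passed in stands for whatever code actually performs the request.

```ruby
# Hypothetical error class that carries the failing URL in its message.
class FetchError < StandardError; end

# Hypothetical wrapper: logs the URL, runs the given block with it,
# and re-raises any failure with the URL included in the message.
def fetch_with_url_logging(url, log: $stderr)
  log.puts "Fetching: #{url}"
  yield url
rescue StandardError => e
  raise FetchError, "#{e.class} while fetching #{url}: #{e.message}"
end

# Usage: the block simulates a request that fails with a 404.
begin
  fetch_with_url_logging('http://example.com/missing') do |_url|
    raise '404 Not Found'
  end
rescue FetchError => e
  puts e.message
end
```

The design choice is simply to record the URL at the one place every request passes through, so the error no longer hides which page failed.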
Sorry, I have no idea why this would be nil. Every time I have run this it returns a URL. The method self.fetch requires a URL, which you should be able to access as the local variable doc_url. If this also returns nil, maybe you should post the code where you have included the debugger call.
I've tried to access doc_url but that seems to also return nil. When I have access to my server (later in the day) I'll post the code with the debugging bit in it.