Using OpenUri, how can I get the contents of a redirecting page?
I want to get data from this page:
http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=0656887000494793
But that page forwards to:
http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?execution=eXs1
So when I use open from OpenUri to try to fetch the data, it throws a RuntimeError saying "HTTP redirection loop".
I'm not really sure how to get that data after it redirects and throws that error.
Comments (3)
You need a tool like Mechanize. From its description:
which is exactly what you need. So,
then
and you're ready to rock 'n' roll.
The site seems to handle part of its redirection logic with sessions. If you don't send back the session cookies it sets on the first request, you will end up in a redirect loop. IMHO it's a crappy implementation on their part.
However, I tried passing the cookies back and didn't get it to work, so I can't be completely sure that's all that's going on here.
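For anyone who wants to try the cookie idea without an extra gem, here is a minimal sketch using only Net::HTTP from the standard library. It follows redirects by hand and sends back whatever cookies the server has set so far. The method names (store_cookies, cookie_header, fetch_with_cookies) are made up for this example, the cookie handling ignores attributes such as Path and Domain, and, as noted above, cookies alone may not be enough for this particular site.

```ruby
require 'net/http'
require 'uri'

# Merge the name=value pair of each Set-Cookie header into the jar (a Hash).
# Simplification: attributes such as Path, Domain and Expires are ignored.
def store_cookies(jar, set_cookie_headers)
  Array(set_cookie_headers).each do |header|
    pair = header.to_s.split(';', 2).first.to_s.strip
    name, value = pair.split('=', 2)
    jar[name] = value.to_s unless name.nil? || name.empty?
  end
  jar
end

# Build the value of the Cookie request header from the jar.
def cookie_header(jar)
  jar.map { |name, value| "#{name}=#{value}" }.join('; ')
end

# GET url, following at most `limit` redirects and replaying stored cookies
# on every hop. Returns the body of the first non-redirect response.
def fetch_with_cookies(url, limit: 10)
  uri = URI.parse(url)
  jar = {}

  limit.times do
    request = Net::HTTP::Get.new(uri)
    request['Cookie'] = cookie_header(jar) unless jar.empty?

    response = Net::HTTP.start(uri.host, uri.port,
                               use_ssl: uri.scheme == 'https') do |http|
      http.request(request)
    end

    store_cookies(jar, response.get_fields('Set-Cookie'))

    location = response['Location']
    unless response.is_a?(Net::HTTPRedirection) && location
      return response.body
    end

    # Location may be relative, so resolve it against the current URI.
    uri = uri.merge(location)
  end

  raise "Too many redirects (limit: #{limit})"
end
```

Calling fetch_with_cookies with the tracking URL from the question would then return the body of the final page, or raise once the hop limit is reached if the server keeps redirecting anyway.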
While Mechanize is a wonderful tool, I prefer to "cook" my own thing.
If you are serious about parsing, you can take a look at this code. It is used to crawl thousands of sites internationally every day, and as far as I have researched and tweaked, there isn't a more stable approach that also lets you customize it heavily for your needs later on.